-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
Add support of 'decimal' option to Series.to_csv and Dataframe.to_csv #8448
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
if decimal != '.': | ||
imask = (~mask).ravel() | ||
values.flat[imask] = np.array( | ||
[val.replace('.',',',1) for val in values.ravel()[imask]]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you write this in a way that avoids redundant code? e.g., define formatter = lambda x: x.replace('.', decimal, 1)
depending on the desired formatting and then use the function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New proposal submited
|
||
|
||
if float_format and decimal != '.': | ||
formater = lambda v : (float_format % v).replace('.',decimal,1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you actually want to create a new formatter, but NOT in the lambda, e.g.
if float_format and decimal != '.':
f = float_format.replace('.',decimal,1)
formatter = lambda v: f % v
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand you point.
If the float_format specified is something like '%.3f' just replacing the '.' by a ',' will lead to an uncorrect formater.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you construt the integer/decimal portion separatly then you can do it (you might have to 'parse' the formatter a bit), somethign like %d%s%d
(but then you'll have to do some int truncation and such).
nvm. your original is prob ok (though will be slow)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest to keep this first version as such and I will submit another pull request if I'm able to find a better solution (ideally we should also handle the case where the float_format is a class accepting the '%' operator).
The slowness is real but will be close to what people are currently doing in (iterating over the final result to replace all '.' by ',')
along those lines, would be nice to add a vbench for this as well (and then you'll know!) |
@jreback Can this be revisited? Is there something that needs to be changed by @bertrandhaut (apart from adding a vbench) |
I think this needs a test? |
As suggested, I've added a test for this new option. |
@@ -1126,6 +1126,8 @@ def to_csv(self, path_or_buf=None, sep=",", na_rep='', float_format=None, | |||
date_format : string, default None | |||
Format string for datetime objects | |||
cols : kwarg only alias of columns [deprecated] | |||
decimal: string, default '.' | |||
Character recognized as decimal separator. E.g. use ‘,’ for European data |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you use standard single quotes here? '
instead of ‘
Joris' comments taken into account |
@jreback Is this good to go? (not really familiar with this) @bertrandhaut Before merging, we ask to squash the commits into one. Can you do that (https://github.com/pydata/pandas/wiki/Using-Git#fetch-and-then-rebase-interactively-to-squash-reword-fix-otherwise-change-some-commits, if you have questions, just ask!) |
|
||
|
||
def test_to_csv_decimal(self): | ||
df = DataFrame({'col1' : [1], 'col2' : ['a'], 'col3' : [10.1] }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add the issue number as a comment here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've tried to squash the commits as described but I faced the following error:
$git rebase pandas/master
fatal: Needed a single revision
invalid upstream pandas/master
Any idea ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Try this:
git checkout master
git pull pandas/master master
git checkout your-branch
git rebase master
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There should be something special that I do to create my local version...
When I do the command "git checkout master" in my to level directory I get
error: pathspec 'master' did not match any file(s) known to git.
If I do it in the "pandas" directory (the one with the setup.py file) I get:
Already on 'master'
Your branch is up-to-date with 'origin/master'.
but then the second command git pull pandas/master master lead to
fatal: 'pandas/master' does not appear to be a git repository
fatal : Could not read from remote repository
By the way, in my repository, I don't think I've made any branch. I've done
everything in master. Maybe was that not a good idea ?
On Tue, Mar 3, 2015 at 9:31 AM, Stephan Hoyer [email protected]
wrote:
In pandas/tests/test_format.py
#8448 (comment):@@ -2343,7 +2343,21 @@ def test_csv_to_string(self):
df = DataFrame({'col' : [1,2]})
expected = ',col\n0,1\n1,2\n'self.assertEqual(df.to_csv(), expected)
- def test_to_csv_decimal(self):
df = DataFrame({'col1' : [1], 'col2' : ['a'], 'col3' : [10.1] })
Try this:
git checkout master
git pull pandas/master master
git checkout your-branch
git rebase master—
Reply to this email directly or view it on GitHub
https://github.com/pydata/pandas/pull/8448/files#r25670074.
Bertrand Haut
@bertrandhaut can you do a vbench run on the csv's to make sure that perf is unchanged, see here:https://github.com/pydata/pandas/wiki/Performance-Testing |
For the vbench I tried but without success. For information I've only python3 on my computer and it seems that vbench is still not ported to python3. |
About the vbench, I think that is correct that is not yet ported to python 3. |
@bertrandhaut can you rebase and squash to a single commit. looks ok otherwise to me. |
@bertrandhaut You were indeed working on master. In the future, better to always first make a branch if you are going to work on a certain feature/fix. But for this PR, it is OK. Given you are working on master, this should work to rebase/squash:
|
merged via 671c4b3 thanks! |
closes #781
The 'decimal' option exists for read_csv method but not yet in 'to_csv' methods.
The lack of this option is particulary painful when we have to work with Excel with European regional settings.
This modification add this option to both Series.to_csv and Dataframe.to_csv.