Add support of 'decimal' option to Series.to_csv and Dataframe.to_csv #8448

bertrandhaut · 2014-10-03T08:08:26Z

closes #781

The 'decimal' option exists for the read_csv method but not yet for the 'to_csv' methods.
The lack of this option is particularly painful when working with Excel under European regional settings.

This modification adds the option to both Series.to_csv and DataFrame.to_csv.
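For readers unfamiliar with the convention, here is a minimal stdlib-only sketch of the output a European-locale Excel expects: ';' as the field separator and ',' as the decimal mark. It is not the pandas implementation; the helper name `rows_to_european_csv` and its parameters are hypothetical and exist only for illustration.

```python
import csv
import io


def rows_to_european_csv(header, rows, sep=';', decimal=','):
    """Write rows as CSV text, rendering floats with `decimal` as the decimal mark.

    Hypothetical helper for illustration; it is not part of pandas.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=sep, lineterminator='\n')
    writer.writerow(header)
    for row in rows:
        writer.writerow([
            # Only floats carry a decimal mark; other values pass through unchanged.
            repr(v).replace('.', decimal, 1) if isinstance(v, float) else v
            for v in row
        ])
    return buf.getvalue()


text = rows_to_european_csv(['col1', 'col2', 'col3'], [[1, 'a', 10.1]])
print(text)
```

The sample row mirrors the data used later in the PR's test; the separator has to differ from the decimal mark, otherwise every float field would need quoting.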

shoyer · 2014-10-08T04:38:26Z

pandas/core/internals.py

+        if decimal != '.':
+            imask = (~mask).ravel()
+            values.flat[imask] = np.array(
+                [val.replace('.',',',1) for val in values.ravel()[imask]])


can you write this in a way that avoids redundant code? e.g., define formatter = lambda x: x.replace('.', decimal, 1) depending on the desired formatting and then use the function

New proposal submitted.
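A rough sketch of the suggestion above, assuming plain Python lists instead of the NumPy arrays used in internals.py: the formatter is chosen once from the `decimal` setting and then applied to every non-missing value. The function name `format_values` and the `na_rep` handling are assumptions made for the example, not the code that was merged.

```python
def format_values(values, na_rep='', decimal='.'):
    """Stringify values, swapping the decimal mark when one is requested.

    Illustrative only; the real change operates on NumPy arrays with a mask.
    """
    # Choose the formatter once, up front, instead of duplicating the loop.
    if decimal != '.':
        formatter = lambda s: s.replace('.', decimal, 1)
    else:
        formatter = lambda s: s

    out = []
    for v in values:
        if v is None or v != v:  # None or NaN (NaN never equals itself)
            out.append(na_rep)
        else:
            out.append(formatter(str(v)))
    return out


print(format_values([1.5, float('nan'), 10.1], decimal=','))
```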

jreback · 2014-10-09T17:18:56Z

pandas/core/internals.py

+
+
+        if float_format and decimal != '.':
+            formater = lambda v : (float_format % v).replace('.',decimal,1)


you actually want to create a new formatter, but NOT in the lambda, e.g.

    if float_format and decimal != '.':
        f = float_format.replace('.',decimal,1)
        formatter = lambda v: f % v

I don't understand your point.

If the float_format specified is something like '%.3f', just replacing the '.' by a ',' will lead to an incorrect formatter.

if you construct the integer/decimal portions separately then you can do it (you might have to 'parse' the formatter a bit), something like %d%s%d (but then you'll have to do some int truncation and such).

nvm. your original is prob ok (though will be slow)

I would suggest to keep this first version as such and I will submit another pull request if I'm able to find a better solution (ideally we should also handle the case where the float_format is a class accepting the '%' operator).

The slowness is real, but the cost will be close to what people are currently doing anyway (iterating over the final result to replace every '.' with ',').
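The disagreement above can be checked with a few lines of plain Python. This is only a sketch using the '%.3f' format mentioned in the thread: formatting first and then swapping the decimal mark works, whereas rewriting the format string itself produces '%,3f', which the '%' operator rejects.

```python
float_format = '%.3f'
decimal = ','

# Approach from the diff: format the number, then replace the decimal mark.
format_then_replace = lambda v: (float_format % v).replace('.', decimal, 1)
result = format_then_replace(10.1)
print(result)

# Rewriting the format string instead gives '%,3f', which is not a valid
# %-style format specification.
broken_format = float_format.replace('.', decimal, 1)
try:
    broken_format % 10.1
    rewrite_failed = False
except ValueError:
    rewrite_failed = True
print(broken_format, 'fails:', rewrite_failed)
```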

jreback · 2014-10-09T18:54:42Z

along those lines, would be nice to add a vbench for this as well (and then you'll know!)
see here
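As a rough stand-in for a vbench (this uses the stdlib `timeit` module, not vbench, and times plain list formatting rather than `to_csv` itself), the extra cost of the replace step could be estimated like this. The data and function names are made up for the example.

```python
import timeit

# Arbitrary sample data; not taken from the pandas benchmark suite.
values = [i / 7.0 for i in range(10000)]
float_format = '%.3f'


def plain():
    """Format every value with the default '.' decimal mark."""
    return [float_format % v for v in values]


def with_decimal(decimal=','):
    """Format every value, then swap the decimal mark."""
    return [(float_format % v).replace('.', decimal, 1) for v in values]


# Take the best of three runs to reduce noise from other processes.
t_plain = min(timeit.repeat(plain, number=10, repeat=3))
t_decimal = min(timeit.repeat(with_decimal, number=10, repeat=3))
print('plain: %.4fs  with decimal: %.4fs' % (t_plain, t_decimal))
```

The absolute numbers depend on the machine; only the ratio between the two timings is of interest.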

jorisvandenbossche · 2015-02-09T10:06:42Z

@jreback Can this be revisited?

Is there something that needs to be changed by @bertrandhaut (apart from adding a vbench)?

shoyer · 2015-02-09T10:21:49Z

I think this needs a test?

bertrandhaut · 2015-02-23T08:20:40Z

As suggested, I've added a test for this new option.

jorisvandenbossche · 2015-02-23T09:22:11Z

pandas/core/frame.py

@@ -1126,6 +1126,8 @@ def to_csv(self, path_or_buf=None, sep=",", na_rep='', float_format=None,
        date_format : string, default None
            Format string for datetime objects
        cols : kwarg only alias of columns [deprecated]
+        decimal: string, default '.'
+            Character recognized as decimal separator. E.g. use ‘,’ for European data


Can you use standard single quotes here? ' instead of ‘

bertrandhaut · 2015-02-23T10:07:04Z

Joris' comments taken into account

jorisvandenbossche · 2015-03-01T13:46:46Z

@jreback Is this good to go? (not really familiar with this)

@bertrandhaut Before merging, we ask to squash the commits into one. Can you do that (https://github.com/pydata/pandas/wiki/Using-Git#fetch-and-then-rebase-interactively-to-squash-reword-fix-otherwise-change-some-commits, if you have questions, just ask!)

jreback · 2015-03-01T15:46:42Z

pandas/tests/test_format.py

-
+
+    def test_to_csv_decimal(self):
+        df = DataFrame({'col1' : [1], 'col2' : ['a'], 'col3' : [10.1] })


add the issue number as a comment here

I've tried to squash the commits as described, but I ran into the following error:
$ git rebase pandas/master
fatal: Needed a single revision
invalid upstream pandas/master

Any idea?

Try this:

git checkout master
git pull pandas/master master
git checkout your-branch
git rebase master

There must be something unusual in the way I created my local version...

When I run "git checkout master" in my top-level directory I get
error: pathspec 'master' did not match any file(s) known to git.

If I run it in the "pandas" directory (the one with the setup.py file) I get:
Already on 'master'
Your branch is up-to-date with 'origin/master'.
but then the second command, git pull pandas/master master, leads to
fatal: 'pandas/master' does not appear to be a git repository
fatal: Could not read from remote repository.

By the way, I don't think I've made any branch in my repository. I've done
everything in master. Maybe that was not a good idea?


jreback · 2015-03-01T15:47:28Z

@bertrandhaut can you do a vbench run on the csv's to make sure that perf is unchanged, see here: https://github.com/pydata/pandas/wiki/Performance-Testing

bertrandhaut · 2015-03-03T08:24:44Z

I tried to run the vbench, but without success. For information, I only have Python 3 on my computer and it seems that vbench has not yet been ported to Python 3.

jorisvandenbossche · 2015-03-03T08:50:51Z

About the vbench, I think it is correct that it has not yet been ported to Python 3.

jreback · 2015-03-05T23:42:28Z

@bertrandhaut can you rebase and squash to a single commit. looks ok otherwise to me.

jorisvandenbossche · 2015-03-06T09:15:33Z

@bertrandhaut You were indeed working on master. In the future, better to always first make a branch if you are going to work on a certain feature/fix. But for this PR, it is OK.

Given you are working on master, this should work to rebase/squash:

git fetch upstream
git rebase -i upstream/master

jreback · 2015-03-07T00:02:58Z

merged via 671c4b3

thanks!

bertrandhaut added 2 commits October 3, 2014 08:30

to_csv: decimal support

0fbb0bf

remove numpy directory

c0985ec

bertrandhaut mentioned this pull request Oct 3, 2014

Handle European decimal formats in to_csv #781

Closed

jreback added the IO CSV (read_csv, to_csv), IO Data (IO issues that don't fit into a more specific label), and Output-Formatting (__repr__ of pandas objects, to_string) labels Oct 3, 2014

jreback added this to the 0.15.1 milestone Oct 3, 2014

shoyer reviewed Oct 8, 2014

Reformating internals.py

4261ea5

jreback reviewed Oct 9, 2014

Test for to_csv decimal separator option

e410065

jorisvandenbossche reviewed Feb 23, 2015

Joris' comments

f129b0c

rmorgans mentioned this pull request Mar 1, 2015

API: read_csv,from_csv/to_csv keyword consistency #9568

Closed

5 tasks

jreback reviewed Mar 1, 2015

issue number as comment

f9a3e45

jreback closed this Mar 7, 2015
