Skip to content

Add support of 'decimal' option to Series.to_csv and Dataframe.to_csv #8448

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 6 commits into from

Conversation

bertrandhaut
Copy link
Contributor

closes #781

The 'decimal' option exists for read_csv method but not yet in 'to_csv' methods.
The lack of this option is particulary painful when we have to work with Excel with European regional settings.

This modification add this option to both Series.to_csv and Dataframe.to_csv.

@jreback jreback added IO CSV read_csv, to_csv IO Data IO issues that don't fit into a more specific label Output-Formatting __repr__ of pandas objects, to_string labels Oct 3, 2014
@jreback jreback added this to the 0.15.1 milestone Oct 3, 2014
if decimal != '.':
imask = (~mask).ravel()
values.flat[imask] = np.array(
[val.replace('.',',',1) for val in values.ravel()[imask]])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you write this in a way that avoids redundant code? e.g., define formatter = lambda x: x.replace('.', decimal, 1) depending on the desired formatting and then use the function

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New proposal submited



if float_format and decimal != '.':
formater = lambda v : (float_format % v).replace('.',decimal,1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you actually want to create a new formatter, but NOT in the lambda, e.g.

if float_format and decimal != '.':
            f = float_format.replace('.',decimal,1)
            formatter = lambda v: f % v

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand you point.

If the float_format specified is something like '%.3f' just replacing the '.' by a ',' will lead to an uncorrect formater.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you construt the integer/decimal portion separatly then you can do it (you might have to 'parse' the formatter a bit), somethign like %d%s%d (but then you'll have to do some int truncation and such).

nvm. your original is prob ok (though will be slow)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest to keep this first version as such and I will submit another pull request if I'm able to find a better solution (ideally we should also handle the case where the float_format is a class accepting the '%' operator).

The slowness is real but will be close to what people are currently doing in (iterating over the final result to replace all '.' by ',')

@jreback
Copy link
Contributor

jreback commented Oct 9, 2014

along those lines, would be nice to add a vbench for this as well (and then you'll know!)
see here

@jorisvandenbossche
Copy link
Member

@jreback Can this be revisited?

Is there something that needs to be changed by @bertrandhaut (apart from adding a vbench)

@shoyer
Copy link
Member

shoyer commented Feb 9, 2015

I think this needs a test?

@bertrandhaut
Copy link
Contributor Author

As suggested, I've added a test for this new option.

@@ -1126,6 +1126,8 @@ def to_csv(self, path_or_buf=None, sep=",", na_rep='', float_format=None,
date_format : string, default None
Format string for datetime objects
cols : kwarg only alias of columns [deprecated]
decimal: string, default '.'
Character recognized as decimal separator. E.g. use ‘,’ for European data
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use standard single quotes here? ' instead of

@bertrandhaut
Copy link
Contributor Author

Joris' comments taken into account

@jorisvandenbossche
Copy link
Member

@jreback Is this good to go? (not really familiar with this)

@bertrandhaut Before merging, we ask to squash the commits into one. Can you do that (https://github.com/pydata/pandas/wiki/Using-Git#fetch-and-then-rebase-interactively-to-squash-reword-fix-otherwise-change-some-commits, if you have questions, just ask!)



def test_to_csv_decimal(self):
df = DataFrame({'col1' : [1], 'col2' : ['a'], 'col3' : [10.1] })
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add the issue number as a comment here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tried to squash the commits as described but I faced the following error:
$git rebase pandas/master
fatal: Needed a single revision
invalid upstream pandas/master

Any idea ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try this:

git checkout master
git pull pandas/master master
git checkout your-branch
git rebase master

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be something special that I do to create my local version...

When I do the command "git checkout master" in my to level directory I get
error: pathspec 'master' did not match any file(s) known to git.

If I do it in the "pandas" directory (the one with the setup.py file) I get:
Already on 'master'
Your branch is up-to-date with 'origin/master'.
but then the second command git pull pandas/master master lead to
fatal: 'pandas/master' does not appear to be a git repository
fatal : Could not read from remote repository

By the way, in my repository, I don't think I've made any branch. I've done
everything in master. Maybe was that not a good idea ?

On Tue, Mar 3, 2015 at 9:31 AM, Stephan Hoyer [email protected]
wrote:

In pandas/tests/test_format.py
#8448 (comment):

@@ -2343,7 +2343,21 @@ def test_csv_to_string(self):
df = DataFrame({'col' : [1,2]})
expected = ',col\n0,1\n1,2\n'

self.assertEqual(df.to_csv(), expected)

  • def test_to_csv_decimal(self):
  •    df = DataFrame({'col1' : [1], 'col2' : ['a'], 'col3' : [10.1] })
    

Try this:

git checkout master
git pull pandas/master master
git checkout your-branch
git rebase master


Reply to this email directly or view it on GitHub
https://github.com/pydata/pandas/pull/8448/files#r25670074.

Bertrand Haut

@jreback
Copy link
Contributor

jreback commented Mar 1, 2015

@bertrandhaut can you do a vbench run on the csv's to make sure that perf is unchanged, see here:https://github.com/pydata/pandas/wiki/Performance-Testing

@bertrandhaut
Copy link
Contributor Author

For the vbench I tried but without success. For information I've only python3 on my computer and it seems that vbench is still not ported to python3.

@jorisvandenbossche
Copy link
Member

About the vbench, I think that is correct that is not yet ported to python 3.

@jreback
Copy link
Contributor

jreback commented Mar 5, 2015

@bertrandhaut can you rebase and squash to a single commit. looks ok otherwise to me.

@jorisvandenbossche
Copy link
Member

@bertrandhaut You were indeed working on master. In the future, better to always first make a branch if you are going to work on a certain feature/fix. But for this PR, it is OK.

Given you are working on master, this should work to rebase/squash:

git fetch upstream
git rebase -i upstream/master

@jreback
Copy link
Contributor

jreback commented Mar 7, 2015

merged via 671c4b3

thanks!

@jreback jreback closed this Mar 7, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO CSV read_csv, to_csv IO Data IO issues that don't fit into a more specific label Output-Formatting __repr__ of pandas objects, to_string
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Handle European decimal formats in to_csv
4 participants