Skip to content

read_csv/to_csv sep/delimiter inconsistency #7662

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
dsm054 opened this issue Jul 3, 2014 · 19 comments
Closed

read_csv/to_csv sep/delimiter inconsistency #7662

dsm054 opened this issue Jul 3, 2014 · 19 comments
Labels
API Design IO CSV read_csv, to_csv

Comments

@dsm054
Copy link
Contributor

dsm054 commented Jul 3, 2014

related #7615

Bit of a UI problem here (although it's behaving as the docstring says it does, so it doesn't quite qualify as a bug):

>>> df = pd.DataFrame({"A": [1,2,3], "B": [4,5,6]})
>>> df.to_csv("tmp.csv", sep=";")
>>> !cat tmp.csv
;A;B
0;1;4
1;2;5
2;3;6
>>> df.to_csv("tmp.csv", delimiter=";")
>>> !cat tmp.csv
,A,B
0,1,4
1,2,5
2,3,6

read_csv accepts both sep and delimiter but to_csv silently ignores delimiter. Someone was recently tripped up by this on SO. I'm fine with either teaching to_csv to behave the same way read_csv does or, alternatively, raising if delimiter is found as a keyword.

That is, I'm less bothered by the inconsistency than the silent unexpected behaviour.

@jreback
Copy link
Contributor

jreback commented Jul 3, 2014

related to #7615

to_csv allows **kwargs
I think we need to validate them (eg provide a nice message that you have passed an invalid parameter )

and then can do things like accept delimiter for sep

(must be a reason to_csv accepts **kwargs but don't remember why/when was changed)

@jreback jreback added this to the 0.15.0 milestone Jul 3, 2014
@jreback jreback modified the milestones: 0.16.0, Next Major Release Mar 6, 2015
@indera
Copy link

indera commented Oct 17, 2016

👍 Spent about an hour trying to figure out why passing "delimiter" does not affect the "to_csv" call ...

@indera
Copy link

indera commented Oct 18, 2016

Will try to fix this if nobody started.

Relevant code:

to_csv =>

def to_csv(self, path_or_buf=None, sep=",", na_rep='', float_format=None,

read_csv => https://github.com/pandas-dev/pandas/blob/master/pandas/io/parsers.py#L565

@gfyoung
Copy link
Member

gfyoung commented Oct 24, 2016

@indera : Just what are you trying to fix exactly? The **kw option has been removed in to_csv, so are you trying to add delimiter to to_csv?

@jreback
Copy link
Contributor

jreback commented Oct 24, 2016

@dsm054 @gfyoung is right, this used to be a problem, but no-longer.

@jreback jreback closed this as completed Oct 24, 2016
@jreback jreback modified the milestones: No action, Next Major Release Oct 24, 2016
@indera
Copy link

indera commented Oct 26, 2016

Yes, this is fixed in pandas==0.19.0

I get an error as expected:

TypeError: to_csv() got an unexpected keyword argument 'delimiter'

@indera
Copy link

indera commented Oct 26, 2016

@gfyoung - we should probably now remove support for "delimiter" in read_csv for consistency ;)

@jorisvandenbossche
Copy link
Member

@indera the difference is that for read_csv the delimeter keyword actually works, so not sure if deprecating/removing is worth it. For to_csv it never worked, so that is another situation.

@gfyoung
Copy link
Member

gfyoung commented Oct 26, 2016

@jorisvandenbossche : It wouldn't be that hard to incorporate into the to_csv signature though, so that's one way to go about it to make things consistent.

@jorisvandenbossche
Copy link
Member

Consistency is nice, but the recommended one sep is already consistent for both, so IMO not needed to make the duplication consistent :-)

@gfyoung
Copy link
Member

gfyoung commented Oct 26, 2016

That's fair, though we could consider deprecating delimiter at some point, just not now...:smile:

@indera
Copy link

indera commented Oct 26, 2016

@gfyoung deprecating delimiter sounds good... removing code in general feels good 😄

@gfyoung
Copy link
Member

gfyoung commented Oct 26, 2016

@jorisvandenbossche @jreback : Thoughts?

@dahlbaek
Copy link
Contributor

@gfyoung wrote:

That's fair, though we could consider deprecating delimiter at some point, just not now...

Any update on this? Is 'not now' now?

@gfyoung
Copy link
Member

gfyoung commented Jul 17, 2018

@jorisvandenbossche @jreback : Thoughts?

@dahlbaek
Copy link
Contributor

At the moment, delimiter seems to be silently overriding sep if both are provided, so something should be done I guess.

@jandren
Copy link

jandren commented Sep 9, 2020

The above pull request was closed due to stalling and never merged. The inconsistency remains, which I just encountered. Used delimiter on read_csv and then got errors when I later tried to use to_csv with delimiter again.

@jreback
Copy link
Contributor

jreback commented Sep 9, 2020

@jandren happy to take a new PR to deprecate

@jandren
Copy link

jandren commented Sep 9, 2020

Thanks for the opportunity @jreback! My intention was to just give updated feedback and poke the issue as still alive. I have to little knowledge about the pros and cons of sep vs delimiter, so I will leave it to people familiar with the code base.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Design IO CSV read_csv, to_csv
Projects
None yet
Development

No branches or pull requests

7 participants