Raise an error on redundant definition of separator in read_csv #39823

Closed
malinkallen opened this issue Feb 15, 2021 · 4 comments · Fixed by #41146
Labels: Enhancement; Error Reporting (Incorrect or improved errors from pandas); IO CSV (read_csv, to_csv)


@malinkallen

Is your feature request related to a problem?

When calling read_csv and specifying both sep and delim_whitespace, or both delimiter and delim_whitespace I get a ValueError in pandas 1.3.0.dev0+713.g9f792cd903. For example:
df = pd.read_csv("my_data.csv", sep=' ', delim_whitespace=True)
and
df = pd.read_csv("my_data.csv", delimiter=' ', delim_whitespace=True)
give an error. However, when I specify both sep and delimiter, for example:
df = pd.read_csv("my_data.csv", sep=' ', delimiter='.')
sep is silently ignored. I think it would make sense to raise a ValueError in this case as well.

Moreover, the error raised today carries the message "Specified a delimiter with both sep and delim_whitespace=True; you can only specify one." regardless of whether I specify sep or delimiter together with delim_whitespace. I think it should read "Specified a delimiter with both delimiter and delim_whitespace=True; you can only specify one." when delimiter is used.

Describe the solution you'd like

Raise a ValueError when both sep and delimiter are used to specify the separator for read_csv.

Change the message "Specified a delimiter with both sep and delim_whitespace=True; you can only specify one." to "Specified a delimiter with both delimiter and delim_whitespace=True; you can only specify one." when both delimiter and delim_whitespace are specified.
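
The two requested checks could be sketched roughly as follows. This is a minimal illustration, not pandas' actual implementation: the helper name `resolve_separator`, the use of `None` to mean "not specified", the `","` fallback, and the wording of the sep-plus-delimiter message are all assumptions made for the example. Only the two delim_whitespace messages are taken from the text above.

```python
from typing import Optional


def resolve_separator(
    sep: Optional[str] = None,
    delimiter: Optional[str] = None,
    delim_whitespace: bool = False,
) -> str:
    """Hypothetical helper; None stands for 'argument not specified'."""
    # Proposed new check: sep and delimiter are aliases, so reject both.
    # The message wording here is an assumption, not taken from pandas.
    if sep is not None and delimiter is not None:
        raise ValueError(
            "Specified a delimiter with both sep and delimiter; "
            "you can only specify one."
        )

    if delim_whitespace:
        # Name the keyword the caller actually used in the message.
        for name, value in (("sep", sep), ("delimiter", delimiter)):
            if value is not None:
                raise ValueError(
                    f"Specified a delimiter with both {name} and "
                    "delim_whitespace=True; you can only specify one."
                )
        return r"\s+"

    if delimiter is not None:
        return delimiter
    if sep is not None:
        return sep
    return ","  # assumed default for this sketch
```

With this sketch, `resolve_separator(sep=' ', delimiter='.')` raises instead of silently dropping sep, and the delim_whitespace message mentions whichever keyword was passed.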

API breaking implications

This will "break" code that specifies both sep and delimiter. However, it is consistent with the behavior when you specify one of those parameters together with delim_whitespace. Moreover, a similar change was made at some point between pandas 0.25.3 (the latest version provided by aptitude) and the development version. In the former
pd.read_csv("my_data.csv", delim_whitespace=True, sep=',')
doesn't cause a ValueError, but in the latter it does.

Describe alternatives you've considered

An alternative could be to issue a warning instead of an error, but an error is more consistent with the current behavior for the combination of delim_whitespace and (sep or delimiter).
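
For comparison, the warning alternative might look something like the sketch below. The helper name `pick_separator`, the warning text, the choice of `UserWarning`, and the `","` fallback are assumptions for illustration only. The sketch keeps the behavior described above, where delimiter wins and sep is ignored, but makes it visible.

```python
import warnings
from typing import Optional


def pick_separator(
    sep: Optional[str] = None,
    delimiter: Optional[str] = None,
) -> str:
    """Hypothetical helper; None stands for 'argument not specified'."""
    if sep is not None and delimiter is not None:
        # Warn instead of raising: existing calls keep working.
        warnings.warn(
            "Both sep and delimiter were specified; sep is ignored.",
            UserWarning,
            stacklevel=2,
        )
    if delimiter is not None:
        return delimiter
    return sep if sep is not None else ","  # assumed default
```

A warning avoids breaking existing calls, at the cost of being inconsistent with the ValueError already raised for delim_whitespace combinations.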

malinkallen added the Enhancement and Needs Triage (Issue that has not been reviewed by a pandas team member) labels on Feb 15, 2021
@jreback
Contributor

jreback commented Feb 15, 2021

it's possible this was not explicitly checked before

a PR to fix this would be fine

lithomas1 added the Error Reporting (Incorrect or improved errors from pandas) and IO CSV (read_csv, to_csv) labels and removed the Needs Triage (Issue that has not been reviewed by a pandas team member) label on Feb 15, 2021
lithomas1 added this to the Contributions Welcome milestone on Feb 15, 2021
@0xpranjal
Contributor

@malinkallen Are you working on this issue, or should I take it up?

@malinkallen
Author

@Bhard27 You can take it up if you want!

@0xpranjal
Contributor

@jreback Please assign this issue to me.
