read_csv: Warn when too many names are specified #40449

Closed
malinkallen opened this issue Mar 15, 2021 · 3 comments
Labels
Duplicate Report (Duplicate issue or pull request), IO CSV (read_csv, to_csv)

Comments

@malinkallen

Is your feature request related to a problem?

When you call read_csv and names has more elements than the data has columns, extra columns containing only NaN are added to the resulting DataFrame. I think the extra names are generally a sign of either a bug in the program or input data that does not look the way the user expects. In both cases, a warning would make the user aware of the problem. A warning may also help avoid later errors, for example if the user filters or merges on a column containing only NaN, thinking that it contains actual data.
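A minimal reproduction of the behavior described above, assuming a recent pandas version and the default C engine. The sample data and the column names are made up for illustration, and the result may differ between engines or versions.

```python
import io

import pandas as pd

# Two columns of data, but three names are supplied.
csv_data = io.StringIO("1,2\n3,4\n")
df = pd.read_csv(csv_data, names=["a", "b", "c"])

print(df)

# The surplus name "c" becomes a column that holds only NaN,
# and no warning is emitted.
print(df["c"].isna().all())
```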

Describe the solution you'd like

I think it would be helpful if read_csv issued a warning when the number of elements in names is larger than the number of columns in the data.
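Until something like this exists in read_csv, the check can be approximated on the user side. The sketch below is only an illustration: `check_names` is a hypothetical helper, not part of pandas, and it assumes a seekable text buffer whose first row is representative of the rest of the data.

```python
import csv
import io
import warnings


def check_names(buffer, names, sep=","):
    """Hypothetical helper (not a pandas API): warn when `names` has more
    entries than the first row of `buffer` has fields."""
    start = buffer.tell()
    first_row = next(csv.reader(buffer, delimiter=sep), [])
    buffer.seek(start)  # rewind so the caller can still parse the buffer
    if len(names) > len(first_row):
        warnings.warn(
            f"{len(names)} names given but only {len(first_row)} columns found; "
            "the extra columns would contain only NaN",
            UserWarning,
            stacklevel=2,
        )
    return len(first_row)


data = io.StringIO("1,2\n3,4\n")
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    n_columns = check_names(data, ["a", "b", "c"])

print(n_columns, len(caught))
```

Because the buffer is rewound, it can be passed to read_csv afterwards with the same names.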

API breaking implications

If only a warning is issued, I don't see how this could break anything.

Describe alternatives you've considered

An alternative could be to issue an error instead of a warning.

malinkallen added the Enhancement and Needs Triage labels Mar 15, 2021
@jreback
Contributor

jreback commented Mar 15, 2021

Please search the issue tracker; this is described in several issues.

@malinkallen
Author

I can't find the other issues discussing this. The only one I find when searching is #38453, which discusses an inconsistency between the engines, but not whether the extra column should be included. Would you mind pointing out the other issues?

@jreback
Contributor

jreback commented Mar 16, 2021

See #38587 and the associated issue #21768.

Closing this as a duplicate.

jreback closed this as completed Mar 16, 2021
jreback added the IO CSV label Mar 16, 2021
jreback added this to the No action milestone Mar 16, 2021
jreback removed the Enhancement and Needs Triage labels Mar 16, 2021
lithomas1 added the Duplicate Report label Mar 16, 2021