read_csv: Warn when too many names are specified #40449

malinkallen · 2021-03-15T17:05:11Z

Is your feature request related to a problem?

When you call read_csv with more names than there are columns in the data, the resulting DataFrame gets extra column(s) containing only NaN. I think the extra names are generally a sign of a bug in the program, or of input data that does not look the way the user expects. In both cases, a warning would make the user aware of the problem. A warning may also help avoid later errors, for example if the user filters or merges on the all-NaN column, thinking that it contains actual data.

Describe the solution you'd like

I think it would be helpful if read_csv issued a warning when the number of elements in names is larger than the number of columns in the data.
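Until something like this exists in pandas, the check can be approximated on the caller's side. The following is a minimal sketch, not pandas code: `warn_on_extra_names` is a hypothetical helper that uses only the standard library, and it assumes the first line of the file is representative of the column count and that no options such as `usecols` or `index_col` change how names map to columns.

```python
import csv
import io
import warnings


def warn_on_extra_names(first_line, names, sep=","):
    """Warn if `names` has more entries than `first_line` has fields.

    Hypothetical helper, not part of pandas.
    Returns the number of surplus names (0 if there are none).
    """
    # Parse with the csv module so that quoted separators are handled.
    fields = next(csv.reader(io.StringIO(first_line), delimiter=sep), [])
    extra = len(names) - len(fields)
    if extra > 0:
        warnings.warn(
            f"{len(names)} names given but the data has {len(fields)} "
            f"columns; surplus names: {names[-extra:]}",
            UserWarning,
            stacklevel=2,
        )
    return max(extra, 0)


# Two columns of data, three names: one name is surplus.
data = "1,2\n3,4\n"
names = ["a", "b", "c"]
surplus = warn_on_extra_names(data.splitlines()[0], names)
```

A caller could run this check on the first line of the file and then pass the same `names` to `read_csv`. Raising an exception instead of calling `warnings.warn` would give the stricter alternative mentioned below.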

API breaking implications

If only a warning is issued, I don't see how this could break anything.

Describe alternatives you've considered

An alternative could be to issue an error instead of a warning.

jreback · 2021-03-15T17:11:39Z

Please search the issue tracker; this is described in several issues.

malinkallen · 2021-03-16T13:34:33Z

I can't find the other issues discussing this. The only one I find when searching is #38453, which discusses an inconsistency between the engines, but not whether to include the extra column. Would you mind pointing out the other issues?

jreback · 2021-03-16T13:45:28Z

See #38587 and the associated issue #21768.

Closing this as a duplicate.

malinkallen added the Enhancement and Needs Triage (issue that has not been reviewed by a pandas team member) labels Mar 15, 2021

jreback closed this as completed Mar 16, 2021

jreback added the IO CSV (read_csv, to_csv) label Mar 16, 2021

jreback added this to the No action milestone Mar 16, 2021

jreback removed the Enhancement and Needs Triage (issue that has not been reviewed by a pandas team member) labels Mar 16, 2021

lithomas1 added the Duplicate Report (duplicate issue or pull request) label Mar 16, 2021
