ENH: Different behavior of pandas when saving and restoring from a CSV file #44639

pevogam · 2021-11-27T12:34:27Z

Is your feature request related to a problem?

It seems that df.to_csv(filename) saves the index by default (index=True), while pd.read_csv(filename) by default (index_col=None) reads that saved index back as an ordinary data column and adds a fresh index of its own. A user who relies on the defaults and repeatedly writes to and reads from a CSV file will end up accumulating index columns:

      Unnamed: 0  Unnamed: 0.1  Unnamed: 0.1.1  Unnamed: 0.1.1.1         p       q    r
0              0             0               0                 0  54.78  0.0005  1.0
1              1             1               1                 1  54.78  0.0005  1.0
2              2             2               2                 2  54.78  0.0005  1.0
3              3             3               3                 3  54.78  0.0005  1.0
4              4             4               4                 4  54.78  0.0005  1.0
...          ...           ...             ...               ...       ...     ...  ...
2360        2360          2360            2360              2360  54.78  0.0005  0.0

and can thus get tripped up by this difference in the choice of defaults.
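To make the report easy to reproduce, here is a minimal sketch of the accumulation using an in-memory buffer instead of a file. The helper name `round_trip` and the three-column sample frame are made up for illustration; only `DataFrame.to_csv` and `pd.read_csv` with their default arguments come from the report above.

```python
import io

import pandas as pd


def round_trip(df: pd.DataFrame) -> pd.DataFrame:
    """Write df to CSV and read it back, using only default arguments."""
    buffer = io.StringIO()
    df.to_csv(buffer)           # default index=True: index written as an unnamed first column
    buffer.seek(0)
    return pd.read_csv(buffer)  # default index_col=None: that column is read back as data


# Hypothetical sample data, loosely modelled on the columns shown above.
df = pd.DataFrame({"p": [54.78, 54.78], "q": [0.0005, 0.0005], "r": [1.0, 0.0]})

once = round_trip(df)
twice = round_trip(once)

print(len(df.columns), len(once.columns), len(twice.columns))
print(list(twice.columns))
```

Each pass adds one more "Unnamed" column, so the column count grows by one per round trip.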

Describe the solution you'd like

I would recommend settling on the same default behavior for both storing to and retrieving from CSV, possibly by simply not storing the index by default in to_csv, so that a round trip gives a symmetric result.
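With the current defaults, a symmetric round trip can already be obtained by opting in explicitly on either side. The sketch below shows both options. It assumes a frame with a default integer index, and the sample data and variable names are illustrative only.

```python
import io

import pandas as pd

# Hypothetical sample data with a default integer index.
df = pd.DataFrame({"p": [54.78, 54.78], "q": [0.0005, 0.0005], "r": [1.0, 0.0]})

# Option 1: do not write the index at all.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
without_index = pd.read_csv(buffer)

# Option 2: write the index (the default) and tell read_csv which column holds it.
buffer = io.StringIO()
df.to_csv(buffer)
buffer.seek(0)
with_index = pd.read_csv(buffer, index_col=0)

print(list(without_index.columns))
print(list(with_index.columns))
```

Option 1 suits frames whose index carries no information. Option 2 keeps a meaningful index at the cost of having to remember index_col on every read.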

API breaking implications

I assume this would break the current API unless we emit a warning about the upcoming change for at least a couple of versions. After all, an API is not meant to stay frozen and never be improved just because it is in use.

Describe alternatives you've considered

None, this is just a suggestion.

Additional context

Possibly added above.

phofl · 2021-11-27T12:48:36Z

Duplicate of #24468

jreback · 2021-11-27T12:50:44Z

see #4595 and there are some others

long discussed and rejected

pevogam · 2021-11-27T12:51:49Z

Yes, I will monitor the issue that this one turned out to be a duplicate of. Thanks for the hints!

pevogam added the "Enhancement" and "Needs Triage" labels Nov 27, 2021

phofl marked this as a duplicate of #24468 Nov 27, 2021

phofl closed this as completed Nov 27, 2021

jreback added this to the No action milestone Nov 27, 2021

jreback added the "IO CSV" and "API - Consistency" labels and removed the "Enhancement" and "Needs Triage" labels Nov 27, 2021
