ENH: Different behavior of pandas when saving and restoring from a CSV file #44639

Closed
pevogam opened this issue Nov 27, 2021 · 3 comments
Labels
API - Consistency (Internal Consistency of API/Behavior), IO CSV (read_csv, to_csv)

Comments

@pevogam

pevogam commented Nov 27, 2021

Is your feature request related to a problem?

It seems that df.to_csv(filename) saves the index by default (index=True), while pd.read_csv(filename) by default (index_col=None) does not treat any column as the index and adds a fresh one of its own. A user who relies on these defaults while repeatedly writing to and reading from a CSV file will end up accumulating index columns:

      Unnamed: 0  Unnamed: 0.1  Unnamed: 0.1.1  Unnamed: 0.1.1.1         p       q    r
0              0             0               0                 0  54.78  0.0005  1.0
1              1             1               1                 1  54.78  0.0005  1.0
2              2             2               2                 2  54.78  0.0005  1.0
3              3             3               3                 3  54.78  0.0005  1.0
4              4             4               4                 4  54.78  0.0005  1.0
...          ...           ...             ...               ...       ...     ...  ...
2360        2360          2360            2360              2360  54.78  0.0005  0.0

and can thus get tripped up by this difference in defaults.
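A minimal sketch that reproduces the accumulation with the defaults described above. The small frame and the `round_trip` helper are illustrative assumptions, not part of the original report, and an in-memory buffer stands in for the CSV file:

```python
import io

import pandas as pd

# Illustrative data only; the column names mirror the output shown above.
df = pd.DataFrame({"p": [54.78, 54.78], "q": [0.0005, 0.0005], "r": [1.0, 0.0]})


def round_trip(frame: pd.DataFrame) -> pd.DataFrame:
    """Write with to_csv defaults, then read back with read_csv defaults."""
    buf = io.StringIO()
    frame.to_csv(buf)        # index=True by default: the index becomes a column
    buf.seek(0)
    return pd.read_csv(buf)  # index_col=None by default: a new index is added


once = round_trip(df)
twice = round_trip(once)

# Each round trip adds one extra "Unnamed: ..." column in front of p, q, r.
print(list(df.columns))
print(list(once.columns))
print(list(twice.columns))
```

Every pass through the defaults adds one more column, which is how the frame above ended up with four of them.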

Describe the solution you'd like

I would recommend settling on the same default behavior for both storing to and retrieving from CSV, possibly by simply not storing the index by default in to_csv, so that the round trip is symmetric.
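Until the defaults change (if they ever do), a symmetric round trip can be had by being explicit on one side. A rough sketch of the two options, using the existing `index` parameter of to_csv and `index_col` parameter of read_csv; the sample frame and variable names are made up for illustration:

```python
import io

import pandas as pd

# Illustrative data only.
df = pd.DataFrame(
    {"p": [54.78, 54.78, 54.78], "q": [0.0005, 0.0005, 0.0005], "r": [1.0, 1.0, 0.0]}
)

# Option A: do not write the index at all.
buf_a = io.StringIO()
df.to_csv(buf_a, index=False)
buf_a.seek(0)
restored_a = pd.read_csv(buf_a)

# Option B: write the index (the default) and tell read_csv that the
# first column holds it.
buf_b = io.StringIO()
df.to_csv(buf_b)
buf_b.seek(0)
restored_b = pd.read_csv(buf_b, index_col=0)

# In both cases only the original data columns come back.
print(list(restored_a.columns))
print(list(restored_b.columns))
```

Option A is the behavior proposed here as the default; option B keeps the index in the file, which matters when the index carries real information such as dates or keys.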

API breaking implications

I assume this would break the current API unless we warn about the upcoming change for at least a couple of versions. After all, an API is not meant to stay frozen and never improve just because it is in use.

Describe alternatives you've considered

None, this is just a suggestion.

Additional context

Possibly added above.

pevogam added the Enhancement and Needs Triage labels Nov 27, 2021
@phofl
Member

phofl commented Nov 27, 2021

Duplicate of #24468

phofl marked this as a duplicate of #24468 Nov 27, 2021
phofl closed this as completed Nov 27, 2021
@jreback
Contributor

jreback commented Nov 27, 2021

See #4595, and there are some others.

This has been discussed at length and rejected.

jreback added this to the No action milestone Nov 27, 2021
jreback added the IO CSV and API - Consistency labels and removed the Enhancement and Needs Triage labels Nov 27, 2021
@pevogam
Author

pevogam commented Nov 27, 2021

Yes, I will monitor the issue that this one turned out to be a duplicate of. Thanks for the hints!

3 participants