ENH: do not write noninformative indices (like RangeIndex) by default in to_csv #56129

Open
1 of 3 tasks
bingbong-sempai opened this issue Nov 23, 2023 · 4 comments
Labels: Enhancement, Needs Triage (issue that has not been reviewed by a pandas team member)

Comments

@bingbong-sempai

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

I almost always pass index=False when using to_csv, and I suspect most people do the same.

Although changing the default to False is not possible for backward-compatibility reasons, would it be possible to skip noninformative indices (like RangeIndex) by default?

I don't imagine most people want a range as their first column in the output.

Feature Description

The default value for index in to_csv could be set to a new option such as "ignore_range", which would trigger this behaviour.
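A minimal sketch of the behaviour I have in mind, written as a wrapper rather than a change to pandas itself. The helper name to_csv_ignore_range is hypothetical and not part of pandas; it only assumes that df.to_csv accepts an index keyword and returns a string when no path is given.

```python
import pandas as pd


def to_csv_ignore_range(df, path_or_buf=None, **kwargs):
    """Hypothetical helper (not a pandas API): skip the index when it is a RangeIndex.

    An explicit index=... passed by the caller always wins.
    """
    # Only fill in a default when the caller did not choose one.
    kwargs.setdefault("index", not isinstance(df.index, pd.RangeIndex))
    return df.to_csv(path_or_buf, **kwargs)


# Placeholder RangeIndex: the index column is left out.
default_csv = to_csv_ignore_range(pd.DataFrame({"a": [1, 2], "b": [3, 4]}))

# Informative index (string labels): the index column is kept.
labeled_csv = to_csv_ignore_range(pd.DataFrame({"a": [1, 2]}, index=["x", "y"]))
```

A stricter version could also require that the RangeIndex starts at 0, has step 1, and has no name, so that a deliberately constructed range is still written.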

Alternative Solutions

Leave as is.

Additional Context

Similar issues #34576 and #46583

@bingbong-sempai bingbong-sempai added Enhancement Needs Triage Issue that has not been reviewed by a pandas team member labels Nov 23, 2023
@jbrockmendel
Member

I understand where you’re coming from, but I think having default behavior that varies depending on the index subclass will cause confusion.

@bingbong-sempai
Author

That's true, but RangeIndex in particular is usually just a placeholder until something more useful is set.

@IDoCodingStuffs

Default indices (which come back with the column name "Unnamed: 0" on reload) are not only redundant and non-informative but also tend to cause bugs. It is just very questionable behavior in principle: why is some random column with a weird name appearing in my saved file when I am not asking for it? to_csv followed by read_csv should be an identity operation by default. Why is it not?

For example, if there is a to_csv save that is consumed by some downstream function expecting a specific set of columns (such as ingesting into some SQL table), and someone forgets to add the index=False, the whole thing breaks. IMO that is a far more concerning behavior than ambiguities around auto-inferring the index as the order of rows on load when there is no index specified.
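A small reproduction of the round trip described above, using an in-memory buffer instead of a file. The frame and its column names are made up for illustration.

```python
import io

import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Default to_csv writes the RangeIndex as an unnamed first column,
# which read_csv then loads as an extra data column.
round_tripped = pd.read_csv(io.StringIO(df.to_csv()))
extra_columns = list(round_tripped.columns)

# With index=False the round trip gives back the original frame.
clean = pd.read_csv(io.StringIO(df.to_csv(index=False)))
is_identity = clean.equals(df)
```

Here extra_columns contains "Unnamed: 0" ahead of "a" and "b", which is the column a downstream consumer with a fixed schema would trip over.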

@bingbong-sempai
Author

Yup, I agree that the behavior is questionable.
The default behavior (without additional parameters) should produce the expected output.
Most people do not expect a new column "Unnamed: 0" in their csv files.
But they also expect indices to show up in output files if the indices contain information (e.g. anything other than a RangeIndex).
Which is why I'm proposing a special exception for RangeIndex to be excluded from csv files by default.

3 participants