ENH: do not write noninformative indices (like RangeIndex) by default in to_csv #56129

bingbong-sempai · 2023-11-23T04:28:36Z

Feature Type

Adding new functionality to pandas
Changing existing functionality in pandas
Removing existing functionality in pandas

Problem Description

I almost always set `index=False` when calling `to_csv`, and I suspect the same is true for most users.

Although changing the default to `False` is not possible for backward-compatibility reasons, would it be possible to skip noninformative indices (such as a `RangeIndex`) by default?

I doubt most people want a plain 0 to n-1 range as the first column of their output.

Feature Description

The default value of `index` in `to_csv` could be changed to a new option such as `ignore_range`, which would trigger this behaviour.
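A minimal sketch of the proposed behaviour, written as a standalone wrapper rather than a change to pandas itself. The function name `to_csv_ignore_range` and the `"ignore_range"` sentinel are hypothetical (taken from the proposal, not an existing pandas option), and it assumes "noninformative" means "any `RangeIndex`":

```python
import pandas as pd


def to_csv_ignore_range(df, path_or_buf=None, index="ignore_range", **kwargs):
    """Hypothetical wrapper sketching an `index="ignore_range"` default for to_csv.

    Not a pandas API: it only decides the `index` argument, then defers to
    DataFrame.to_csv for everything else.
    """
    if isinstance(index, str) and index == "ignore_range":
        # Assumption: a RangeIndex carries no information, so it is skipped;
        # any other index type is written as usual.
        index = not isinstance(df.index, pd.RangeIndex)
    return df.to_csv(path_or_buf, index=index, **kwargs)


df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
print(to_csv_ignore_range(df))  # RangeIndex is dropped: header is "a,b"

labeled = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["x", "y"])
print(to_csv_ignore_range(labeled))  # string labels are kept as the first column
```

Passing `index=True` or `index=False` explicitly still overrides the sentinel. A narrower variant could also require `df.index.name is None`, so that a deliberately named `RangeIndex` (for example an `id` column) is still written.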

Alternative Solutions

Leave as is.

Additional Context

Similar issues: #34576 and #46583


jbrockmendel · 2023-11-24T00:56:07Z

I understand where you’re coming from, but I think having default behavior that varies by index subclass will cause confusion.

bingbong-sempai · 2023-11-24T01:36:16Z

That's true, but RangeIndex in particular is usually just a placeholder until something more useful is set.

IDoCodingStuffs · 2024-06-12T17:57:29Z

Default indices (which come back as a column named "Unnamed: 0") are not only redundant and noninformative but also tend to cause bugs. It is very questionable behavior in principle: why does an extra column with a strange name appear in my data when I never asked for it? By default, `to_csv` followed by `read_csv` should be an identity operation; why is it not?

For example, if a file saved with `to_csv` is consumed by a downstream step that expects a specific set of columns (such as ingestion into a SQL table) and someone forgets to pass `index=False`, the whole pipeline breaks. IMO that is far more concerning than any ambiguity around inferring the index from row order on load when no index is stored.
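The round-trip problem described above can be reproduced with a small frame (the data here is illustrative). With the current default, the `RangeIndex` is written under an empty header, and `read_csv` reads it back as an extra column named `Unnamed: 0`; with `index=False` the round trip returns an equal frame:

```python
import io

import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# Current default: the RangeIndex is written as an unnamed first column,
# which read_csv turns into a spurious "Unnamed: 0" column.
round_tripped = pd.read_csv(io.StringIO(df.to_csv()))
print(list(round_tripped.columns))  # ['Unnamed: 0', 'a', 'b']

# Opting out of the index makes to_csv -> read_csv an identity operation here.
clean = pd.read_csv(io.StringIO(df.to_csv(index=False)))
print(clean.equals(df))  # True
```

When the index is written, passing `index_col=0` to `read_csv` is the usual way to read it back as the index instead of as a data column.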

bingbong-sempai · 2024-06-13T01:01:30Z

Yup, I agree that the behavior is questionable. The default behavior (without additional parameters) should produce the expected output. Most people do not expect a new "Unnamed: 0" column when they read their CSV files back, but they do expect indices to show up in output files when those indices contain information (e.g. anything other than a RangeIndex). That is why I'm proposing a special exception so that a RangeIndex is excluded from CSV files by default.

bingbong-sempai added the Enhancement and Needs Triage labels on Nov 23, 2023
