
read.csv index_col argument accepts string, list-of-string #22276

Closed
smcinerney opened this issue Aug 10, 2018 · 3 comments · Fixed by #25502
Labels
Docs IO CSV read_csv, to_csv

Comments

@smcinerney

smcinerney commented Aug 10, 2018

  1. The read_csv index_col argument has accepted either a string or a list of strings for years, but the docs (as of 0.24 dev) have never been updated to reflect this. Current and suggested text are at the bottom.
  2. All columns used in index_col are removed from the regular columns (they become the index). The docs never say this explicitly, and it confuses users.
  3. The docs do, however, discuss "malformed file with delimiters at the end of each line... you might consider index_col=False". This is overly prominent for a rare defective case; it should be moved somewhere less prominent, or at minimum relegated to a parenthesized footnote.
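A minimal sketch of points 1 and 2, assuming a recent pandas and a small in-memory CSV (the data and the column names a, b, c are invented for illustration): index_col works with a position, a name, or a list of names, and the index columns no longer appear in .columns.

```python
import io

import pandas as pd

# Hypothetical three-column CSV used only for illustration.
CSV = "a,b,c\nx,1,10\ny,2,20\n"

# index_col as a column position (int): the documented form.
by_pos = pd.read_csv(io.StringIO(CSV), index_col=0)

# index_col as a column name (string): accepted, though not documented in 0.24 dev.
by_name = pd.read_csv(io.StringIO(CSV), index_col="a")

# index_col as a list of names: produces a MultiIndex.
by_names = pd.read_csv(io.StringIO(CSV), index_col=["a", "b"])

# The column(s) used for the index are gone from the regular columns.
print(list(by_name.columns))         # ['b', 'c']
print(list(by_names.index.names))    # ['a', 'b']
print(list(by_names.columns))        # ['c']
```

Passing the position 0 and the name "a" select the same column here, so by_pos and by_name hold the same data.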

Current read_csv doc:

index_col : int or sequence or False, default None
Column to use as the row labels of the DataFrame. If a sequence is given, a MultiIndex is used. If you have a malformed file with delimiters at the end of each line, you might consider index_col=False to force pandas to not use the first column as the index (row names).
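The index_col=False case mentioned in the current text can be reproduced with a tiny malformed file. This is a sketch with invented data, assuming a recent pandas: each data row ends with an extra delimiter, so it has one more field than the header.

```python
import io

import pandas as pd

# Hypothetical malformed file: every data row has a trailing delimiter.
CSV = "a,b,c\n4,apple,bat,\n8,orange,cow,\n"

# Default (index_col=None): the extra field makes pandas treat the first
# column as the index, so the remaining values shift left under a, b, c.
default = pd.read_csv(io.StringIO(CSV))
print(default)

# index_col=False: the first column stays as data and the trailing
# empty field is discarded.
fixed = pd.read_csv(io.StringIO(CSV), index_col=False)
print(fixed)
```

With the default, column c ends up empty (NaN) and 4 and 8 become row labels; with index_col=False the values line up with the header as intended.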

Suggested read_csv doc:

index_col : int/string or sequence of int/string or False, default None
Column(s) to use as the row labels of the DataFrame, given either as a string name or as an integer column index.
If a sequence of int/string is given, a MultiIndex is used.
Columns used for the index (row names) are removed from the regular columns of the resulting DataFrame; they remain accessible via .index.
(Note: index_col=False can be used to force pandas not to use the first column as the index, e.g. when you have a malformed file with delimiters at the end of each line.)
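A sketch of what "accessible via .index" means in practice, with invented data and assuming a recent pandas: the index columns disappear from .columns but their values are still reachable per level, and DataFrame.reset_index() turns them back into ordinary columns if needed.

```python
import io

import pandas as pd

# Hypothetical CSV used only for illustration.
CSV = "a,b,c\nx,1,10\ny,2,20\n"

df = pd.read_csv(io.StringIO(CSV), index_col=["a", "b"])

# The index columns are no longer regular columns ...
print("a" in df.columns)  # False

# ... but each index level can still be read back by name.
print(df.index.get_level_values("a").tolist())  # ['x', 'y']
print(df.index.get_level_values("b").tolist())  # [1, 2]

# reset_index() moves the index levels back into regular columns.
restored = df.reset_index()
print(list(restored.columns))  # ['a', 'b', 'c']
```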

@WillAyd
Member

WillAyd commented Aug 11, 2018

PRs to improve documentation are always welcome - if you have something you feel is better feel free to submit and the team will provide feedback!

@WillAyd WillAyd added the Docs label Aug 11, 2018
@NikhilKumarM
Contributor

I will take this up.

@smcinerney
Author

smcinerney commented Aug 14, 2018

Can anyone pinpoint the version in which this was introduced (it is at least as old as 0.20.4), and whether it was intentional or accidental?

(I couldn't figure it out from reading through the history and blame on https://github.com/pandas-dev/pandas/blob/master/pandas/io/parsers.py)

5 participants