read_csv should have better failure mode with spurious 'index' column #226

Closed
takluyver opened this issue Oct 12, 2011 · 2 comments
@takluyver
Contributor

If I use read_csv to load a data file with the default index_col, and the first column does not have unique values, the load succeeds at first. But when I try to select rows later, I get "Exception: Index cannot contain duplicate values!". This raises a couple of points:

  1. Should the default be to use an index column? My data often doesn't have one. Apparently R takes the first column as an index if that column has no header, and otherwise uses an integer index.
  2. Non-unique values should be detected when we're creating the index. At that point we should either leave that column as a regular column (falling back to an integer index) or raise an exception.
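
To make the second point concrete, here is a minimal sketch of the "fall back to an integer index" option, written against the standard library rather than pandas. The helper name `load_with_safe_index` and the sample data are hypothetical, and this is not how read_csv is implemented; it only illustrates checking the first column for duplicates at load time.

```python
import csv
import io


def load_with_safe_index(text):
    """Hypothetical helper: use the first column as the index only if unique."""
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    first_col = [row[0] for row in data]

    if len(set(first_col)) == len(first_col):
        # All values are unique: promote the first column to the index.
        index = first_col
        columns = header[1:]
        values = [row[1:] for row in data]
    else:
        # Duplicates found: keep the column as data, fall back to 0..n-1.
        index = list(range(len(data)))
        columns = header
        values = data
    return index, columns, values


# The first column repeats "a", so it stays a regular column.
sample = "key,value\na,1\nb,2\na,3\n"
index, columns, values = load_with_safe_index(sample)
```

The alternative named in the same point, raising an exception, would replace the `else` branch with a `raise`. Either way the problem surfaces at load time instead of at the first row selection.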
wesm added a commit that referenced this issue Oct 12, 2011
…unctions,

further address concerns raised in GH #226
@wesm
Member

wesm commented Oct 13, 2011

As stated in the above commits, I think I addressed both of these issues (both of which I've been wanting to do something about):

  1. index_col now defaults to None. If the number of headers is one less than the number of data columns, it will assume that the first column is the index.
  2. Added an explicit check for duplicates. I think this is a good idea. It doesn't do the check at index creation time, because that would inflate the cost of GroupBy operations and the like.
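
The two rules above can be sketched roughly as follows, again with the standard library only. The function name `infer_index` is hypothetical, and placing the duplicate check inside the loader is an assumption made for illustration; the actual pandas code may differ.

```python
import csv
import io


def infer_index(text):
    """Hypothetical sketch of the header-count rule; not pandas' actual code."""
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]

    if data and len(header) == len(data[0]) - 1:
        # One fewer header than data columns: treat the first column as index.
        index = [row[0] for row in data]
        values = [row[1:] for row in data]
        # Assumed placement: explicit duplicate check in the loader itself,
        # so a generic index constructor does not pay for it on every call.
        if len(set(index)) != len(index):
            raise ValueError("index column contains duplicate values")
    else:
        # Header count matches the data: no index column, use 0..n-1.
        index = list(range(len(data)))
        values = data
    return index, header, values


# Two headers, three fields per row: the first field becomes the index.
with_index = infer_index("x,y\nr1,1,2\nr2,3,4\n")

# Two headers, two fields per row: integer index, all columns kept as data.
without_index = infer_index("k,x\na,1\nb,2\n")
```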

@wesm wesm closed this as completed Oct 13, 2011
@takluyver
Contributor Author

Great, thanks.
