read_csv should have better failure mode with spurious 'index' column #226

takluyver · 2011-10-12T14:53:39Z

If I use `read_csv` to load a datafile with the default `index_col`, but the first column does not have unique values, it succeeds at first, but when I try to select rows later, I get `Exception: Index cannot contain duplicate values!` This raises a couple of points:

- Should the default be to use an index column? My data often doesn't have one. Apparently R takes the first column as an index if it doesn't have a header, and otherwise uses an integer index.
- Non-unique values should be detected when the index is created; the parser should then either leave that column as a regular column (falling back to an integer index) or raise an exception at that point.
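
The two behaviours proposed in the second point can be sketched in plain Python. This is only an illustration of the idea, not pandas code: `choose_index` and its `on_duplicate` parameter are hypothetical names, and rows are assumed to be already-parsed lists of strings.

```python
from collections import Counter


def choose_index(rows, index_col=0, on_duplicate="fallback"):
    """Return (index_labels, data_rows) for a list of parsed rows.

    Hypothetical helper for illustration; not part of pandas.
    """
    labels = [row[index_col] for row in rows]
    duplicates = sorted(label for label, n in Counter(labels).items() if n > 1)
    if not duplicates:
        # Labels are unique: pull the column out and use it as the index.
        data = [row[:index_col] + row[index_col + 1:] for row in rows]
        return labels, data
    if on_duplicate == "raise":
        # Fail at index-creation time rather than on a later row lookup.
        raise ValueError(
            f"index column {index_col} has duplicate values: {duplicates}"
        )
    # Fallback: keep the column as ordinary data and number the rows instead.
    return list(range(len(rows))), [list(row) for row in rows]


rows = [["a", "1"], ["b", "2"], ["a", "3"]]
index, data = choose_index(rows)
print(index)  # [0, 1, 2]
```

Either branch surfaces the problem while the data is being loaded, which is the point of the request; which one is the better default is a design choice the sketch leaves open.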

wesm · 2011-10-13T03:27:27Z

As stated in the above commits, I think I addressed both of these issues (both of which I've been wanting to do something about):

- `index_col` now defaults to `None`. If there is one fewer header than there are data columns, the parser assumes that the first column is the index.
- Added an explicit check for duplicates in the parsing functions. I think this is a good idea: the check isn't done every time an index is created, because that would inflate the cost of GroupBy operations and the like.
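
As a rough sketch of the rule described above (not the actual parser code from the commits), the header-count heuristic and the parse-time duplicate check might look like this with the standard `csv` module. `read_table` is a hypothetical name, and the fallback to an integer index is an assumption about what "no index column" means here.

```python
import csv
import io


def read_table(text):
    """Parse CSV text into (columns, index, data). Illustrative sketch only."""
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    if body and len(header) == len(body[0]) - 1:
        # One fewer header than data columns: treat the first column as the index.
        index = [row[0] for row in body]
        data = [row[1:] for row in body]
        # Explicit duplicate check, done once at parse time.
        if len(set(index)) != len(index):
            raise ValueError("index column contains duplicate values")
    else:
        # Headers match the data columns: no index column, number the rows.
        index = list(range(len(body)))
        data = body
    return header, index, data


columns, index, data = read_table("x,y\nr1,1,2\nr2,3,4\n")
print(columns, index)  # ['x', 'y'] ['r1', 'r2']
```

Doing the uniqueness check in the parser rather than in the index constructor keeps the cost confined to file loading, which matches the reasoning given in the comment.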

takluyver · 2011-10-13T08:40:25Z

Great, thanks.

wesm added a commit that referenced this issue Oct 12, 2011

ENH: parser API changes, added parse_dates options, address GH #225, #…

0cc5616

…226

wesm added a commit that referenced this issue Oct 12, 2011

ENH: add explicit duplicate check when creating an index in parsing f…

5ca6ff5

…unctions, further address concerns raised in GH #226

wesm closed this as completed Oct 13, 2011
