If I use `read_csv` to load a data file with the default `index_col`, but the first column does not have unique values, it succeeds at first, but when I later try to select rows I get `Exception: Index cannot contain duplicate values!`. This raises a couple of points:
1. Should the default be to use an index column? My data often doesn't have one. Apparently R takes the first column as the index if it doesn't have a header, and otherwise uses an integer index.
2. Non-unique values should be detected when the index is created; either leave that column as a regular column (falling back to an integer index) or raise an exception at that point.
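The fallback behavior suggested in point 2 could be sketched roughly as follows. This is a pure-Python illustration, not pandas' actual implementation; the function name and return shape are hypothetical:

```python
import csv
import io

def load_with_index_fallback(text):
    """Parse CSV text; use the first column as the index only if its
    values are unique, otherwise fall back to an integer index."""
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    first_col = [r[0] for r in data]
    if len(set(first_col)) == len(first_col):
        # Unique values: safe to use the first column as the index.
        index = first_col
        records = [r[1:] for r in data]
    else:
        # Duplicates: keep the column as regular data, integer index.
        index = list(range(len(data)))
        records = data
    return index, records

# Duplicate 'a' values in the first column: falls back to 0..n-1.
index, records = load_with_index_fallback("id,x\na,1\na,2\n")
```

The alternative, raising as soon as the duplicate is seen, would simply replace the `else` branch with an exception.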
As stated in the commits above, I think I've addressed both of these issues (both of which I've been wanting to do something about):
1. `index_col` now defaults to `None`. If the number of header fields is one less than the number of data columns, the first column is assumed to be the index.
2. Added an explicit check for duplicates. I think this is a good idea. The check isn't done at index-creation time because that would inflate the cost of GroupBy and similar operations.
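The header-based inference in point 1 amounts to a simple column-count comparison. A minimal sketch of that heuristic (the function name is hypothetical, not a pandas API):

```python
def infer_index_col(header_fields, first_data_row):
    """Heuristic: if the header has one fewer field than the data rows,
    assume the unnamed first data column is the index."""
    if len(header_fields) == len(first_data_row) - 1:
        return 0   # use the first column as the index
    return None    # no index column; use a default integer index

# Header 'x,y' with three data columns: first column is the index.
print(infer_index_col(["x", "y"], ["row1", "1", "2"]))   # 0
# Header width matches the data: no index column inferred.
print(infer_index_col(["id", "x"], ["a", "1"]))          # None
```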
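Deferring the duplicate check as described in point 2 means index construction stays cheap and the error surfaces only when a label lookup actually needs uniqueness. A toy sketch of that trade-off (the class and method names are illustrative, not pandas internals):

```python
class LazyIndex:
    """Index that defers the duplicate check until a label lookup,
    keeping construction (and operations like GroupBy) cheap."""

    def __init__(self, values):
        # No uniqueness check here: construction always succeeds.
        self.values = list(values)

    def get_loc(self, label):
        # Check for duplicates only when a lookup actually needs them.
        if len(set(self.values)) != len(self.values):
            raise Exception("Index cannot contain duplicate values!")
        return self.values.index(label)

idx = LazyIndex(["a", "b", "a"])   # succeeds despite the duplicate 'a'
# idx.get_loc("b")                 # would raise only at lookup time
```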