When parsing a CSV file without an index, if the list of column names is too short or too long, one gets an "Index contains duplicate entries" exception. #835
Comments
Under the maxim that explicit is better than implicit, I added an option to suppress index inference, "infer_index", which defaults to True; setting it to False makes the right error occur.
Hmm, maybe it should just raise a better error like "Tried to use columns 1-X as index but had duplicates".
I'm not sure, I'm thinking that if you set index_col=None, you definitely
@wesm, fine by me, easy change.
@aristotle137 the trouble is that there are cases where you have N - 1 values in the first row and N values in the subsequent rows (with indices) and you want it to "just work". I would rather support this fairly common use case than make those people's lives difficult in order to raise a better error message.
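To make the trade-off in this comment concrete, here is a minimal standard-library sketch of the heuristic being discussed: when a row has more fields than there are column names, the leading extra fields are treated as the index. It is not pandas' actual parser code. The helper name `split_implicit_index` and the error wording (borrowed from the suggestion earlier in the thread) are assumptions made for illustration.

```python
import csv
import io


def split_implicit_index(text, names):
    """Hypothetical helper: treat leading extra columns as the index."""
    rows = list(csv.reader(io.StringIO(text)))
    # Number of data fields beyond the supplied column names.
    extra = len(rows[0]) - len(names)
    if extra < 0:
        raise ValueError(
            f"{len(names)} names given but rows have only {len(rows[0])} fields"
        )
    # With no extra fields there is nothing to infer; otherwise the leading
    # `extra` fields of each row become that row's index label.
    index = [tuple(row[:extra]) for row in rows] if extra else None
    if index is not None and len(set(index)) != len(index):
        raise ValueError(
            f"Tried to use the first {extra} column(s) as index but had duplicates"
        )
    data = [dict(zip(names, row[extra:])) for row in rows]
    return index, data


# The file from the report: three fields per row, repeated values in column 1.
SAMPLE = "abc,1,a\nabc,2,b\ndef,3,c\n"
```

With `names=['a', 'b', 'c']` nothing is inferred; with `names=['a', 'b']` the first column is taken as the index and the repeated `abc` triggers the duplicate error, which is the situation the report describes.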
* commit 'v0.7.1-1-ga2e86c2': (90 commits)
  BUG: Fix Series, DataFrame plot() for non numerical/datetime (Multi)Index (closes pandas-dev#741).
  RLS: Version 0.7.1
  DOC: release notes, what's new, change dev version to 0.7.1
  BUG: close pandas-dev#839, another case where nan may be assigned to int series
  ENH: raise NotImplementedError if user tries to iterate over .ix, GH pandas-dev#840
  BUG: fixed null-check per pandas-dev#839
  BUG: close pandas-dev#839, exception on assigning NA to bool or int64 series
  TST: more test coverage for release target
  TST: added core coverage
  TST: fix lingering line of code from pandas-dev#838
  DOC: added yet a bit more to release notes
  TST: unit test for pandas-dev#838
  DOC: added more release notes
  BUG: raise more helpful error msg for pandas-dev#835
  TST: added skip excel test for no xlrd installed
  BUG: close pandas-dev#835, add option to suppress index inference
  BUG: close pandas-dev#837, excelfile throws an exception for two-line file
  ENH: fill_value arg in DataFrame.reindex/reindex_axis, add fillna to sparse objects, GH pandas-dev#784
  ENH: add fill_value argument to Series.reindex, DataFrame next, pandas-dev#784
  ENH: concat Series with axis=1 for completeness, GH pandas-dev#787
  ...
My code worked when I used the 'sep' parameter, as follows:
Hi Wes,
Just a minor bug submission:
When parsing a CSV file without an index, if the list of column names is too short or too long, one gets an "Index contains duplicate entries" exception.
E.g.:
The file test.csv contains:
abc,1,a
abc,2,b
def,3,c
Running:
table = pandas.read_csv('reports/9/test.csv', header=None, names=['a', 'b'], index_col=None)
Results in:
Many thanks,
Marius
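For anyone wanting to retry the report without the original file, the sketch below feeds the same three rows to `read_csv` from an in-memory buffer. The `io.StringIO` stand-in for `test.csv` is an assumption, as is the use of a recent pandas release. Recent releases are documented to use the first column as the row labels when the data has one more column than there are names, and they allow duplicate index labels, so the behaviour may differ from the 0.7-era exception described above.

```python
import io

import pandas

# In-memory stand-in for the reporter's test.csv (assumption: same contents).
csv_text = "abc,1,a\nabc,2,b\ndef,3,c\n"

# Same arguments as the report: two names for three columns of data.
table = pandas.read_csv(
    io.StringIO(csv_text), header=None, names=["a", "b"], index_col=None
)

# Inspect which column, if any, was inferred as the index.
print(table.index.tolist())
print(table.columns.tolist())
```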