Bug in read_csv with duplicated column names #7160
Comments
read_csv needs to interpret many formats, which is why it renames columns so that there are no duplicates. This needs some work, so I'm marking it as a bug for 0.15. Feel free to submit a PR. |
@jreback : This bug still exists in |
Yeah, I think the problem is that the names->columns mapping is passed back as a dict and not as 2 lists, so it gets lost. It's in the post-processing code in Python somewhere. I would expect the behavior the OP suggests. Note this is different than if |
FYI: for the second example, that output is correct because |
@jreback : Question, what does the |
|
Oh, okay. How about my second question? EDIT: never mind - |
@jreback : Question, what does |
It's a way to turn duplicates into things like ... We really don't need this anymore, but it's there, so leave it I guess. The main issue is supporting duplicates properly. |
That is true...as of right now, there is ZERO support for duplicates AFAICT. 😞 |
Deduplicates the 'names' parameter by default if there are duplicate names. Also raises when 'mangle_dupe_cols' is False to prevent data overwrite. Closes gh-7160. Closes gh-9424.
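For anyone skimming the thread, here is a rough pure-Python sketch of the behavior that commit message describes. It is not the actual pandas code: `dedupe_names` and `resolve_names` are hypothetical helpers made up for illustration, and the `x`, `x.1`, `x.2` suffix scheme is assumed to mirror the way read_csv renames duplicated header columns.

```python
def dedupe_names(names):
    """Rename repeated names as 'x', 'x.1', 'x.2', ... (illustrative only).

    Simplification: this does not guard against a generated name such as
    'x.1' colliding with a name that is already present in the input.
    """
    counts = {}
    result = []
    for name in names:
        seen = counts.get(name, 0)
        # The first occurrence keeps its name; later ones get a numeric suffix.
        result.append(f"{name}.{seen}" if seen else name)
        counts[name] = seen + 1
    return result


def resolve_names(names, mangle_dupe_cols=True):
    """Hypothetical wrapper: dedupe by default, raise if mangling is disabled."""
    names = list(names)
    if len(set(names)) == len(names):
        return names  # nothing to do, all names are already unique
    if not mangle_dupe_cols:
        # Refuse rather than let one column silently overwrite another.
        raise ValueError("duplicate names would overwrite data")
    return dedupe_names(names)


print(resolve_names(["a", "b", "a", "a"]))
```

Raising in the `mangle_dupe_cols=False` case is the point of the second sentence of the commit message: with a dict-like names->columns mapping, the later column would otherwise replace the earlier one.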
Tested on 0.13.0, 0.13.1 and 0.14.0rc1:
The last one returns:
I would expect all 3 methods to return the same DataFrame. I noticed this when I wanted to read a CSV file whose header is stored in a separate file (with a duplicated column in it). BTW, is there a better way to do it than to read the header file first and pass the output into the 'names' parameter of read_csv?
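One possible workaround for the separate-header case, as a minimal sketch rather than a recommended recipe: parse the header file with read_csv itself, so that duplicated columns are renamed there, and pass the resulting unique names to the data read. The file contents and the variable names (`header_file`, `data_file`) are made up for the example, and `io.StringIO` stands in for real files.

```python
import io

import pandas as pd

# Stand-ins for the two files on disk; the contents are invented for this sketch.
header_file = io.StringIO("a,b,a\n")
data_file = io.StringIO("1,2,3\n4,5,6\n")

# Read only the header row. read_csv renames the repeated 'a' while parsing
# the header, so the names collected here are already unique.
names = pd.read_csv(header_file, nrows=0).columns.tolist()

# The data file has no header row of its own, so supply the names explicitly.
df = pd.read_csv(data_file, header=None, names=names)

print(df)
```

Because the names are unique by the time they reach the second call, this sidesteps the duplicate handling of `names` that the issue is about, at the cost of the third column being called `a.1` instead of `a`.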