read_csv clobbers values of columns with duplicate names #9424
Comments
I've come across something similar. When using:
import pandas as pd
from StringIO import StringIO
data = """A,A,B,B,B
1,2,3,4,5
6,7,8,9,10
11,12,13,14,15
"""
df1 = pd.read_table(StringIO(data), sep=',', mangle_dupe_cols=True)
df2 = pd.read_table(StringIO(data), sep=',', mangle_dupe_cols=False)
Now df1 has the original data but non-duplicate column names, and df2 has duplicate column names but their respective data has been overwritten.
Reproducible bug in an IPython notebook: http://nbviewer.ipython.org/github/yoavram/ipython-notebooks/blob/master/pandas%20duplicate%20column%20bug.ipynb
Pandas version 0.16.0, Python 2.7.
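For anyone trying to picture the mechanism, here is a minimal standard-library sketch of how storing columns keyed by name can lose data when names repeat, and how renaming the repeats avoids it. This is not the pandas parser code: `mangle_dupe_names` and `read_columns` are hypothetical helpers, the `A`, `A.1` naming scheme is an assumption made for illustration, and in this sketch the duplicate columns are dropped rather than shown twice with overwritten values.

```python
import csv
from io import StringIO

DATA = """A,A,B,B,B
1,2,3,4,5
6,7,8,9,10
11,12,13,14,15
"""


def mangle_dupe_names(names):
    """Hypothetical helper: rename repeats as 'A', 'A.1', 'A.2', ..."""
    seen = {}
    mangled = []
    for name in names:
        count = seen.get(name, 0)
        mangled.append(name if count == 0 else "{}.{}".format(name, count))
        seen[name] = count + 1
    return mangled


def read_columns(text, mangle):
    """Hypothetical reader: returns a dict mapping column name to values."""
    rows = list(csv.reader(StringIO(text)))
    header, body = rows[0], rows[1:]
    names = mangle_dupe_names(header) if mangle else header
    columns = {}
    for position, name in enumerate(names):
        # With repeated names, a later column replaces the earlier one here.
        columns[name] = [int(row[position]) for row in body]
    return columns


mangled = read_columns(DATA, mangle=True)
clobbered = read_columns(DATA, mangle=False)

print(sorted(mangled))    # five distinct column names
print(sorted(clobbered))  # only two names survive
```

With mangling, all five columns keep their own values. Without it, only the last `A` and the last `B` remain. The sketch does not guard against a file that already contains a column literally named `A.1`.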
This seems very strange to me -- I don't think there's any good reason for this behavior. I'm going to label it as a bug.
xref #7160
Deduplicates the 'names' parameter by default if there are duplicate names. Also raises when 'mangle_dupe_cols' is False to prevent data overwrite. Closes gh-7160. Closes gh-9424.
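The behaviour described in that commit message could look roughly like the sketch below. It is an illustration only: `resolve_names` is a hypothetical function, and the error message and the `.1`, `.2` suffix scheme are assumptions, not the text or logic of the actual patch.

```python
from collections import Counter


def resolve_names(names, mangle_dupe_cols=True):
    """Hypothetical helper: dedupe names, or refuse if dedup is disabled."""
    duplicates = sorted(n for n, c in Counter(names).items() if c > 1)
    if not duplicates:
        return list(names)
    if not mangle_dupe_cols:
        # Refuse rather than silently overwrite one column with another.
        raise ValueError(
            "Duplicate names {} would overwrite data when "
            "mangle_dupe_cols=False".format(duplicates)
        )
    seen = Counter()
    resolved = []
    for name in names:
        resolved.append(name if seen[name] == 0 else "{}.{}".format(name, seen[name]))
        seen[name] += 1
    return resolved


print(resolve_names(["a", "b", "a"]))

try:
    resolve_names(["a", "b", "a"], mangle_dupe_cols=False)
except ValueError as err:
    print("rejected:", err)
```

Failing loudly in the second case trades convenience for safety: the caller has to pick unique names instead of getting a frame whose values were quietly replaced.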
xref #10577 (has test for duplicates with empty data)
I don't expect this is the correct behavior, although it's always possible I'm doing something wrong. Importing data using the names keyword will clobber the values of columns where the name is duplicated. For example:
returns
However, this produces the correct result:
Interestingly, it works if the field names are in the header:
Is this a bug or am I doing something wrong?