As part of #30588, we now raise when trying to create a 2D index. This introduces a behavior change when you call DataFrame.set_index with duplicate data.
Code Sample, a copy-pastable example if possible
In [1]: import pandas as pd
In [2]: df = pd.DataFrame([[1, 2, 3]], columns=['a', 'a', 'b'])
In [3]: result = df.set_index('a')
On pandas 0.25.3, that gives back a DataFrame with a broken Index. Some DataFrame operations will work, but even things like printing the repr will fail.
With 1.0.0rc0, that raises

~/sandbox/pandas/pandas/core/indexes/numeric.py in __new__(cls, data, dtype, copy, name)
     76         if subarr.ndim > 1:
     77             # GH#13601, GH#20285, GH#27125
---> 78             raise ValueError("Index data must be 1-dimensional")
     79
     80         name = maybe_extract_name(name, data, cls)

ValueError: Index data must be 1-dimensional
Problem description
The old output is clearly broken, so I wouldn't consider this a (major) regression. And I don't think people should be doing this in the first place. But I wanted to ask, should DataFrame.set_index(scalar) return a MultiIndex when scalar is a duplicate label?
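For reference, here is a minimal sketch of what a MultiIndex result could look like if it is built by hand. It is not what `set_index` does, only an illustration of the idea being asked about. The level names `a_0` and `a_1` are made up for this example, and it assumes `df['a']` returns a DataFrame when the label is duplicated.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3]], columns=["a", "a", "b"])

# Selecting a duplicated label gives back a DataFrame, not a Series.
selected = df["a"]

# One index level per duplicated column. The level names are
# hypothetical; pandas does not generate them.
arrays = [selected.iloc[:, i] for i in range(selected.shape[1])]
index = pd.MultiIndex.from_arrays(arrays, names=["a_0", "a_1"])

# Drop every column labelled "a" and attach the hand-built index.
result = df.drop(columns="a")
result.index = index

print(result)
```

Going through `MultiIndex.from_arrays` keeps each level 1-dimensional, so the 2D check added in #30588 is never hit.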
should DataFrame.set_index(scalar) return a MultiIndex when scalar is a duplicate label?
I think so.
Also, the behavior of set_index(duplicate_column_name) seems to depend on the types of the columns that share a name. For timedelta + int, we get an Index of tuples, whereas for 2 int columns, we get a ValueError:
import pandas as pd

df = pd.DataFrame([[pd.Timedelta(1), 1, 2]], columns=['a', 'a', 'b'])
# this works and sets the index to the single tuple (0 days 00:00:00.000000001, 1)
print(df.set_index('a'))

df = pd.DataFrame([[1, 1, 2]], columns=['a', 'a', 'b'])
# this raises `ValueError: Index data must be 1-dimensional`
print(df.set_index('a'))
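Until the behavior is settled, one way to get the same outcome regardless of column dtypes is to check for a duplicated label up front. This is only a sketch of a workaround: `set_index_checked` is a hypothetical helper, not part of pandas, and the error message is my own wording.

```python
import pandas as pd


def set_index_checked(df, label):
    """Hypothetical helper: refuse to set the index from a duplicated label."""
    # Count how many columns carry this label.
    count = int((df.columns == label).sum())
    if count > 1:
        raise ValueError(
            f"column label {label!r} appears {count} times; "
            "set_index needs a unique label"
        )
    return df.set_index(label)


df = pd.DataFrame([[pd.Timedelta(1), 1, 2]], columns=["a", "a", "b"])

try:
    set_index_checked(df, "a")
except ValueError as err:
    print(err)

# A unique label still goes through to the normal set_index call.
print(set_index_checked(df, "b"))
```

With this guard, the timedelta + int case and the 2 int columns case both fail the same way, before pandas tries to build the index.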