Problems when accessing MultiIndex level with duplicated level names #18872
Sorry, had missed #10461. Maybe it is a duplicate, depending on how we want to proceed.
we should disallow duplicate level names
OK. Considering however that the default level names are (duplicate)
I think 1) disallow non-None duplicate level names.
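A minimal sketch of what option 1) could look like, for illustration only. The helper `validate_level_names` is hypothetical and is not a pandas function; it assumes level names are hashable and treats `None` as "unnamed", so repeated `None` entries stay allowed.

```python
from collections import Counter


def validate_level_names(names):
    """Reject duplicated level names, ignoring None (hypothetical helper)."""
    # Count only real names; None means "no name" and may repeat freely.
    counts = Counter(name for name in names if name is not None)
    duplicated = sorted(
        (name for name, count in counts.items() if count > 1), key=repr
    )
    if duplicated:
        raise ValueError(f"Duplicated level names: {duplicated}")
    return list(names)


# Default (all-None) names pass; a repeated real name does not.
validate_level_names([None, None])
try:
    validate_level_names(["a", "a"])
except ValueError as err:
    print(err)
```

With a check like this at construction time, every non-None name identifies at most one level, which removes the ambiguity for name-based lookups.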
Code Sample, a copy-pastable example if possible
Problem description
There are different problems, but the root cause is (I think) the same:
- the `KeyError` in `In [5]:` should not appear
- `Out[3]` and `Out[4]` are inconsistent with each other
- in `In [6]`, I have no way whatsoever to access the second column, since it is denoted by a duplicated name and its index is also the name of another column (sure, this is a perverse example, but I suspect it can bite in some cases in which users use duplicate labels and pandas internal code adds integer labels)
Expected Output
If we were to design this from scratch, the solution would be simple: prioritize the "index" interpretation of an integer over the "label" interpretation, so that the former is always unambiguous. Is this too strong an API change?
If the answer is "no", I will be happy to implement it, possibly with a temporary warning in those cases where the behaviour will change (that is: requested label is integer and is present in the names).
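To make the proposal concrete, here is a rough sketch of "index first" resolution with a temporary warning. `resolve_level` is a hypothetical stand-alone function written for this illustration; it is not the pandas implementation, and the choice of `FutureWarning` and the error messages are assumptions.

```python
import warnings


def resolve_level(names, level):
    """Map `level` to a position, reading integers as positions first."""
    n = len(names)
    if isinstance(level, int) and not isinstance(level, bool):
        if not -n <= level < n:
            raise IndexError(f"level {level} out of range for {n} levels")
        if level in names:
            # Behaviour would change here: the integer is also a level name.
            warnings.warn(
                f"{level!r} is also a level name; treating it as a position",
                FutureWarning,
                stacklevel=2,
            )
        return level % n
    # Non-integer: fall back to lookup by name, refusing ambiguous names.
    matches = [i for i, name in enumerate(names) if name == level]
    if not matches:
        raise KeyError(f"Level {level!r} not found")
    if len(matches) > 1:
        raise ValueError(f"The name {level!r} occurs multiple times")
    return matches[0]


# The second level has a duplicated name, and its position (1) is also
# the name of the first level. Position-first lookup still reaches it.
names = [1, "a", "a"]
print(resolve_level(names, 1))
```

Under this rule every level stays reachable by position, whatever the names are; only lookups by a duplicated name remain ambiguous, and those raise instead of silently picking one.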
If the answer is "yes", I would like to at least suppress the `KeyError` in `In [5]:` and have `In [4]:` raise an error rather than return a result inconsistent with `In [3]:`.
Output of `pd.show_versions()`
INSTALLED VERSIONS
commit: 04db779
python: 3.5.3.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.0-4-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: it_IT.UTF-8
LOCALE: it_IT.UTF-8
pandas: 0.22.0.dev0+388.g04db779d4
pytest: 3.2.3
pip: 9.0.1
setuptools: 36.7.0
Cython: 0.25.2
numpy: 1.12.1
scipy: 0.19.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.5.6
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: 1.2.0dev
tables: 3.3.0
numexpr: 2.6.1
feather: 0.3.1
matplotlib: 2.0.0
openpyxl: 2.3.0
xlrd: 1.0.0
xlwt: 1.3.0
xlsxwriter: 0.9.6
lxml: 4.1.1
bs4: 4.5.3
html5lib: 0.999999999
sqlalchemy: 1.0.15
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.2.1