
BUG: View on DataFrame with a Multiindex using .loc doesn't give proper results for df.index.levels or df.index.levshape #40943


Closed
scmartin opened this issue Apr 14, 2021 · 5 comments
Labels
Closing Candidate May be closeable, needs more eyeballs MultiIndex Usage Question

Comments

@scmartin

If I create a view on a subset of a DataFrame with a MultiIndex using .loc and print the index of the subset, I get the expected subset of the MultiIndex. However, index.levels and index.levshape return the values for the original DataFrame, not for the subset.

import pandas as pd
import numpy as np

index = [[1,1,1,1,2,2,2,2,3,3,3,3],
         [1,2,3,4,1,2,3,4,1,2,3,4]]
data = pd.DataFrame(np.random.randn(12,2),index=index)
smalldata = data.loc[(slice(None),slice(1,2)),:]
print(data.index)
print(smalldata.index)
print(smalldata.index.levels)
print(smalldata.index.levshape)

Problem description

If the index of the created view contains only the subset of values that appear in the view, the index attributes levels and levshape should reflect that subset, not the whole index of the original DataFrame.

Expected Output

>>> print(smalldata.index.levels)
[[1, 2, 3], [1, 2]]
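A minimal sketch of one way to get the expected output, assuming the same frame as in the reproduction above. It uses `MultiIndex.remove_unused_levels()`, which is not mentioned in this thread; it returns a new MultiIndex whose levels contain only the labels still referenced by the rows, and the result is assigned back with `set_axis`.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the report
index = [[1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
         [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4]]
data = pd.DataFrame(np.random.randn(12, 2), index=index)
smalldata = data.loc[(slice(None), slice(1, 2)), :]

# The sliced index still carries the levels of the original index
print(smalldata.index.levels)

# Build a new MultiIndex that drops labels no longer used by any row
trimmed = smalldata.index.remove_unused_levels()
print(trimmed.levels)    # levels are now [1, 2, 3] and [1, 2]
print(trimmed.levshape)  # (3, 2)

# Attach the trimmed index to a new frame (the original slice is not modified)
smalldata = smalldata.set_axis(trimmed, axis=0)
```

Because `remove_unused_levels()` returns a new index rather than mutating the existing one, it has to be assigned back explicitly if the frame itself should report the trimmed levels.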

Output of pd.show_versions()

INSTALLED VERSIONS

commit : d9fff27
python : 3.8.5.final.0
python-bits : 64
OS : Linux
OS-release : 3.10.0-957.1.3.el7.x86_64
Version : #1 SMP Thu Nov 29 14:49:43 UTC 2018
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : None.None

pandas : 1.1.0
numpy : 1.19.1
pytz : 2020.1
dateutil : 2.8.1
pip : 20.2
setuptools : 49.2.0.post20200712
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.2
IPython : 7.17.0
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.3.0
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.5.2
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
numba : None

@scmartin scmartin added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Apr 14, 2021
@phofl
Member

phofl commented Apr 14, 2021

Hey, thanks for your report.

This has nothing to do with the view. MultiIndex levels are not updated when values are removed from the index; for example:

x = data.drop(index=[(1, 4), (2, 4), (3, 4)])

x.index.levels still returns

[[1, 2, 3], [1, 2, 3, 4]]
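The behavior described above can be checked with a short sketch, assuming the same `data` frame as in the original report. `levels` is the pool of possible labels for each level, so the dropped label 4 stays listed, while `get_level_values()` (not mentioned in the thread) reflects only the labels actually present in the remaining rows.

```python
import numpy as np
import pandas as pd

index = [[1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
         [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4]]
data = pd.DataFrame(np.random.randn(12, 2), index=index)

# Drop every row whose second-level label is 4
x = data.drop(index=[(1, 4), (2, 4), (3, 4)])

# The level still lists 4, even though no row uses it any more
print(x.index.levels[1].tolist())

# Labels actually present in the rows, in order of appearance
print(x.index.get_level_values(1).unique().tolist())  # [1, 2, 3]
```

This lets you inspect the labels in use without rebuilding the index.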

This has been discussed in the past.

@phofl
Member

phofl commented Apr 14, 2021

See #36227 for example

@phofl phofl added Closing Candidate May be closeable, needs more eyeballs MultiIndex Usage Question and removed Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Apr 14, 2021
@phofl
Member

phofl commented Apr 20, 2021

Closing, as this is expected behavior.

@phofl phofl closed this as completed Apr 20, 2021
@phofl phofl added this to the No action milestone Apr 20, 2021
@rambo-yuanbo

But still, maybe this should be mentioned in the MultiIndex.levshape documentation.

@phofl
Member

phofl commented Sep 1, 2022

This is mentioned in multiple places in the documentation.
