-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
BUG: loc behaves incorrectly on PeriodIndex within MultiIndex with 3 levels #24091
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Following the data initialized in the original post, I've expanded some further findings: good_key = mi_key = (pd.Period("2018-11-01", freq="B"), "1")
bad_key = (pd.Period("2018-11-01", freq="B"), "1", "1")
lvls = mi_series.index.levels[0]
good_key in lvls
# > False
bad_key in lvls
# > True In both these cases, due to it being a period index, it hits this dunder method: pandas/pandas/core/indexes/period.py Lines 404 to 415 in b7ee829
Then it hits get_loc. Here we see (similar to the original post) that the 'bad key', which has three levels, does exist in the lvls, when it shouldn't. Looking into it a little deeper the reasoning seems to be due to this block (specifically line 681): pandas/pandas/core/indexes/period.py Lines 680 to 690 in b7ee829
For a key that has length != 3, this fails. For example, with good_key in my example above, it raises a value error in the try block, and as a result doesn't run the following line and assign a value to key.
Whereas for bad_key, it succeeds 'unexpectedly'. print(asdt)
> Period('2018-11-01', 'B')
print(parsed
> '1'
print(reso)
> '1' As a result, the key is now set to only Period('2018-11-01', 'B'). In fact, it seems in this case this try block should never be reached, and the fact that it fails on all levels that are not of length three is the correct outcome, but not deliberately expected. I'd love to hear from someone more familiar with the code base, but it seems the correct first step would be to add a test for this here
And next would be to (potentially) add a conditional on line 680 of periods.py that ensures we aren't passing the wrong type of tuple to parse_time_string. If this seems sensible I can work on it and submit a PR. |
@SimonAlecks thanks for your analysis! I don't have much time right now, but tomorrow I will give a more detailed look and report back. |
I think your diagnosis is correct; concerning the cure, I think that pandas/pandas/_libs/tslibs/parsing.pyx Line 116 in 040f06f
should raise a |
I think your diagnosis is correct; concerning the cure, I think that pandas/pandas/_libs/tslibs/parsing.pyx Lines 113 to 116 in 040f06f
should raise a |
Makes perfect sense, thanks @toobaz I'll work on it and submit a PR. |
This looks to work on master now. Could use a test
|
take |
Code Sample
Problem description
For a Series with a MultiIndex that 1) has exactly 3 levels, and 2) the first level is a PeriodIndex (see examples above), indexing by a triple (
series.loc[(first, second, third)]
) behaves the same asseries.loc[first]
, i.e. the return value is another series with a 2-level MultiIndex, whereas I expect a scalar value from the series.If the first level is not a PeriodIndex, or the MultiIndex has any number of levels other than 3, the problem does not occur.
Output of
pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 2.7.13.final.0
python-bits: 64
OS: Linux
OS-release: 3.10.0-514.10.2.el7.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8
pandas: 0.23.4
pytest: 3.1.0
pip: 8.1.1
setuptools: 28.8.0
Cython: None
numpy: 1.15.4
scipy: None
pyarrow: None
xarray: None
IPython: 5.3.0
sphinx: None
patsy: None
dateutil: 2.7.5
pytz: 2018.7
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
The text was updated successfully, but these errors were encountered: