
BUG: Regression in PeriodIndex.loc string slicing behaviour between 1.0.x and >= 1.1.0 #40145


Open
3 tasks done
enisnazif opened this issue Mar 1, 2021 · 4 comments
Labels
Bug Indexing Related to indexing on series/frames, not to indexes themselves Period Period data type

Comments

@enisnazif
Contributor

enisnazif commented Mar 1, 2021

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.



Code Sample, a copy-pastable example

import pandas as pd
print(pd.__version__)
idx = pd.PeriodIndex(['2020-01-01 11:00', '2020-01-01 12:00'], freq='H')
df = pd.DataFrame([1,2], index=idx)
df.loc['2020-01-01 11:00']

Problem description

String lookup on a PeriodIndex appears to have changed behaviour from pandas 1.1 onwards. A string with minute resolution that matches an entry in an hourly index exactly now raises a KeyError.

Output in 0.25.3:

0    1
Name: 2020-01-01 11:00, dtype: int64

Output in 1.0.1

0    1
Name: 2020-01-01 11:00, dtype: int64

Output in versions >= 1.1.0

KeyError: '2020-01-01 11:00'

Interestingly,

df.loc['2020-01-01 11']

still works and gives the expected answer.
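The two lookups can be compared side by side with a small probe. This is only a sketch: `lookup_outcome` is a hypothetical helper, not part of pandas, and `freq` is passed as `pd.offsets.Hour()` rather than the 'H' alias on the assumption that an offset object is accepted regardless of which alias spelling the installed pandas prefers.

```python
import pandas as pd

# Assumption: an offset object is used for freq instead of the 'H' alias
# from the report, so the sketch does not depend on alias spelling.
idx = pd.PeriodIndex(
    ["2020-01-01 11:00", "2020-01-01 12:00"], freq=pd.offsets.Hour()
)
df = pd.DataFrame([1, 2], index=idx)


def lookup_outcome(frame, key):
    """Hypothetical helper: report whether a string key resolves via .loc."""
    try:
        frame.loc[key]
    except KeyError:
        return "KeyError"
    return "found"


# Hour-resolution string vs. minute-resolution string for the same period.
outcomes = {
    key: lookup_outcome(df, key)
    for key in ("2020-01-01 11", "2020-01-01 11:00")
}
print(pd.__version__, outcomes)
```

Going by the outputs reported above, the minute-resolution key should come back as "found" on 1.0.x and earlier and as "KeyError" on 1.1.0 and later, while the hour-resolution key should be "found" in both.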

Looks to be due to this check:

if not grp.value < freqn:

Expected Output

0    1
Name: 2020-01-01 11:00, dtype: int64

Output of pd.show_versions()

INSTALLED VERSIONS

commit : 7d32926
python : 3.7.9.final.0
python-bits : 64
OS : Linux
OS-release : 4.19.128-microsoft-standard
Version : #1 SMP Tue Jun 23 12:58:10 UTC 2020
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.2.2
numpy : 1.19.2
pytz : 2021.1
dateutil : 2.8.1
pip : 21.0.1
setuptools : 52.0.0.post20210125
Cython : 0.29.22
pytest : 6.2.2
hypothesis : None
sphinx : None
blosc : None
feather : 0.4.1
xlsxwriter : None
lxml.etree : 4.6.2
html5lib : 1.1
pymysql : None
psycopg2 : 2.8.6 (dt dec pq3 ext lo64)
jinja2 : 2.11.3
IPython : 7.20.0
pandas_datareader: None
bs4 : 4.9.3
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.3.4
numexpr : 2.7.2
odfpy : None
openpyxl : 3.0.6
pandas_gbq : None
pyarrow : 3.0.0
pyxlsb : None
s3fs : None
scipy : 1.6.1
sqlalchemy : 1.3.15
tables : 3.6.1
tabulate : None
xarray : None
xlrd : 2.0.1
xlwt : None

@enisnazif enisnazif added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Mar 1, 2021
@jreback
Contributor

jreback commented Mar 1, 2021

pls try on 1.2.2 and on master

@enisnazif
Contributor Author

Hi Jeff - have just tried on both 1.2.2 and latest (1.3.0.dev0+893.gec6abbf2b4), and the error still persists

@dicristina
Contributor

This still occurs in master. Using datetime or a subclass will work as expected but using a string which corresponds to an equal or higher resolution than the index will take us here:

if reso == self.dtype.resolution:
    # the reso < self.dtype.resolution case goes through _get_string_slice
    key = Period(parsed, freq=self.freq)
    loc = self.get_loc(key, method=method, tolerance=tolerance)
    # Recursing instead of falling through matters for the exception
    # message in test_get_loc3 (though not clear if that really matters)
    return loc
elif method is None:
    raise KeyError(key)
else:
    key = Period(parsed, freq=self.freq)

The case in which the resolutions are equal is handled by recursing, but when the string represents a higher resolution we end up raising KeyError in line 437. My view is that this should work for PeriodIndex, but I'm interested in knowing the opinion of others.
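Until this is settled, a non-string key avoids the string-resolution branch entirely. A minimal sketch follows, reusing the index from the original report. The datetime key follows the observation above that datetime works. The `pd.Period` key, and passing `freq` as `pd.offsets.Hour()` instead of the 'H' alias, are assumptions of this sketch rather than anything taken from the thread.

```python
from datetime import datetime

import pandas as pd

# Assumption: an offset object is used for freq instead of the 'H' alias.
idx = pd.PeriodIndex(
    ["2020-01-01 11:00", "2020-01-01 12:00"], freq=pd.offsets.Hour()
)
df = pd.DataFrame([1, 2], index=idx)

# Build the Period explicitly with the index's own frequency, so no
# resolution is inferred from a string at lookup time.
by_period = df.loc[pd.Period("2020-01-01 11:00", freq=idx.freq)]

# A datetime key is converted to a Period at the index frequency.
by_datetime = df.loc[datetime(2020, 1, 1, 11, 0)]

print(by_period)
print(by_datetime)
```

Both lookups should select the first row. The cost is that the caller has to parse the string and pick the frequency up front.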

cc @jbrockmendel

@mroeschke mroeschke added Period Period data type Indexing Related to indexing on series/frames, not to indexes themselves and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Aug 19, 2021
@jbrockmendel
Member

I think this was an intentional decision made for internal consistency in #31172


5 participants