PERF: regression in PeriodIndex getitem / get_loc #42247

Closed
jorisvandenbossche opened this issue Jun 26, 2021 · 4 comments
Labels: Performance (memory or execution speed performance), Period (Period data type), Regression (functionality that used to work in a prior pandas version)
Milestone: 1.3


@jorisvandenbossche
Member

jorisvandenbossche commented Jun 26, 2021

#41670 seems to have caused slowdowns in several of the Period indexing benchmarks: see https://pandas.pydata.org/speed/pandas/#regressions?sort=1&dir=desc and scroll down to "2021-05-28 17:58".

One example: https://pandas.pydata.org/speed/pandas/#period.Indexing.time_get_loc?python=3.8&Cython=0.29.21&commits=ddc28a42

Originally posted by @jorisvandenbossche in #41670 (comment)

jorisvandenbossche added the Performance, Period, and Regression labels Jun 26, 2021
jorisvandenbossche added this to the 1.3 milestone Jun 26, 2021
@jbrockmendel
Member

Looks like 90% of this is from the check isinstance(key, Period) and key.freq != self.freq. The rest I expect is in is_valid_na_for_dtype.

The key.freq != self.freq check is the hotspot. We can probably improve that quite a bit because DateOffset.__eq__ checks some things that aren't relevant in this case. Really we just need to check that freq._period_dtype_code and freq.n match.

@jbrockmendel
Member

import pandas as pd

idx = pd.period_range("2016-01-01", periods=10000)
key = idx[500]

%timeit idx.freq == key.freq
15.6 µs ± 112 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit idx.freq.n == key.freq.n and idx.freq._period_dtype_code == key.freq._period_dtype_code
893 ns ± 21.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Just a question of where to implement this check

@jreback
Contributor

jreback commented Jun 29, 2021

#42293 should fix this; need to verify.

@simonjayhawkins
Member

closing

(pandas-dev) simon@T3630:~/pandas/asv_bench (master)$ asv compare v1.3.0.dev0 master|grep period.Indexing.time_get_loc
      2.22±0.05μs      2.13±0.04μs     0.96  period.Indexing.time_get_loc
