PERF: Remove slower short-circuit checks in RangeIndex.__getitem__ #57856

Merged (1 commit) on Mar 16, 2024

Conversation

mroeschke
Member

Although the all() and not any() checks were meant as short-circuits, I don't think these cases are common enough to justify running both checks every time this if branch is taken. The not any() case is handled by np.flatnonzero anyway. And unlike the check in _shallow_copy, the all() check cannot short-circuit.
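To make the trade-off concrete, here is a minimal NumPy-only sketch. It is not the pandas implementation: `positions_with_checks` and `positions_direct` are hypothetical helper names, and the sketch only assumes that the boolean key is eventually converted to integer positions with np.flatnonzero, as described above. It shows that the special-case checks do not change the result. They only add extra passes over the mask.

```python
import numpy as np


def positions_with_checks(mask):
    """Hypothetical 'before' path: special-case the mask, then fall back."""
    key = np.asarray(mask, dtype=bool)
    if not key.any():
        # Nothing selected: return an empty positions array.
        return np.array([], dtype=np.intp)
    if key.all():
        # Everything selected: every position is kept.
        return np.arange(len(key), dtype=np.intp)
    # General case: positions of the True entries.
    return np.flatnonzero(key)


def positions_direct(mask):
    """Hypothetical 'after' path: always use the general case."""
    key = np.asarray(mask, dtype=bool)
    return np.flatnonzero(key)


# The two paths agree on all-False, all-True, and mixed masks.
for mask in ([False] * 5, [True] * 5, [True, False, True, False, True]):
    assert np.array_equal(positions_with_checks(mask), positions_direct(mask))
```

For a mixed mask, which is likely the common case, the first helper evaluates any() and all() before reaching the same np.flatnonzero call the second helper goes to directly.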

A probably more "common case":

In [1]: from pandas import *; import numpy as np
+ /opt/miniconda3/envs/pandas-dev/bin/ninja
[1/1] Generating write_version_file with a custom command

In [2]: N = 10**6

In [3]: s = Series(date_range("2000-01-01", freq="s", periods=N))

In [4]: s[np.random.randint(1, N, 100)] = NaT

In [6]: %timeit s.dropna()
7.49 ms ± 333 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)  # main

In [5]: %timeit s.dropna()
7.22 ms ± 239 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)  # PR

The worst case, where the if key.all() branch would have been hit (this will be improved by #57812):

In [1]: import pandas as pd, numpy as np, pyarrow as pa
+ /opt/miniconda3/envs/pandas-dev/bin/ninja
[1/1] Generating write_version_file with a custom command

In [2]: ri = pd.RangeIndex(range(100_000))

In [3]: mask = [True] * 100_000

In [4]: %timeit ri[mask]
2.82 ms ± 31.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)  # PR

In [4]: %timeit ri[mask]
2.41 ms ± 23.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)  # main

@mroeschke mroeschke added the Performance (memory or execution speed performance) and Index (related to the Index class or subclasses) labels Mar 15, 2024
@mroeschke mroeschke added this to the 3.0 milestone Mar 15, 2024
@phofl phofl merged commit 3d6496d into pandas-dev:main Mar 16, 2024
51 checks passed
@phofl
Member

phofl commented Mar 16, 2024

thx @mroeschke

@mroeschke mroeschke deleted the perf/ri/getitem_bool branch March 16, 2024 22:25
pmhatre1 pushed a commit to pmhatre1/pandas-pmhatre1 that referenced this pull request May 7, 2024
Labels
Index (related to the Index class or subclasses), Performance (memory or execution speed performance)
Development

Successfully merging this pull request may close these issues.

Potential regression induced by PR #57588
2 participants