REF/PERF: dont use hashtable in IndexEngine.__contains__ #45192

Merged: 5 commits, Feb 1, 2022

jbrockmendel (Member) commented Jan 4, 2022

  • closes #xxxx
  • tests added / passed
  • Ensure all linting tests pass, see here for how to run them
  • whatsnew entry

This is going to have some tradeoffs:

import pandas as pd
import numpy as np

idx = pd.Index(np.arange(1000))
idx2 = idx.repeat(2)

In [5]: %timeit 400.0 in idx
1.27 µs ± 46.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)  # <- master
421 ns ± 3.81 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)  # <- PR

In [6]: %timeit idx.get_loc(400)
1.11 µs ± 49.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)  # <- master
625 ns ± 49.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)   # <- PR

In [6]: %timeit 400 in idx2
1.05 µs ± 24.4 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)  # <- master
1.85 µs ± 43.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)  # <- PR

In [7]: %timeit idx2.get_loc(400)
3.43 µs ± 309 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)  # <- master
2.07 µs ± 52.1 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)  # <- PR
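
To rerun the comparison above outside IPython, here is a minimal sketch using the standard-library `timeit` module. The helper names `time_per_call` and `run_benchmarks` are hypothetical and not part of the PR; the four statements are the ones timed above. Absolute numbers will vary with the machine and the installed pandas version, so only the relative change between two builds is meaningful.

```python
import timeit

import numpy as np
import pandas as pd


def time_per_call(stmt, namespace, number=10_000, repeat=5):
    """Return the best observed time per call, in seconds, for one statement."""
    timer = timeit.Timer(stmt, globals=namespace)
    # Take the minimum over the repeats to reduce noise from other processes.
    best = min(timer.repeat(repeat=repeat, number=number))
    return best / number


def run_benchmarks():
    """Time the four lookups from the PR description (hypothetical helper)."""
    idx = pd.Index(np.arange(1000))  # unique index
    idx2 = idx.repeat(2)             # non-unique index
    namespace = {"idx": idx, "idx2": idx2}
    statements = [
        "400.0 in idx",
        "idx.get_loc(400)",
        "400 in idx2",
        "idx2.get_loc(400)",
    ]
    return {stmt: time_per_call(stmt, namespace) for stmt in statements}


if __name__ == "__main__":
    for stmt, seconds in run_benchmarks().items():
        print(f"{stmt:<20} {seconds * 1e9:8.0f} ns per call")
```

Running the script once on the base branch and once on the PR branch gives the two columns shown above; the timings reported in this PR suggest the non-unique `in` check is the case to watch for a slowdown.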

@jreback added the Indexing and Performance labels Jan 8, 2022
@jreback added this to the 1.5 milestone Jan 8, 2022
jreback (Contributor) commented Jan 8, 2022

I think this is fine.

jreback (Contributor) commented Jan 8, 2022

cc @phofl if comments

jreback (Contributor) commented Jan 31, 2022

Can you merge master? cc @phofl if you have comments.

jbrockmendel (Member, Author) commented

rebased + green

@jreback merged commit 6781480 into pandas-dev:main Feb 1, 2022
@jbrockmendel deleted the perf-hashless branch February 1, 2022 00:39
phofl pushed a commit to phofl/pandas that referenced this pull request Feb 14, 2022
yehoshuadimarsky pushed a commit to yehoshuadimarsky/pandas that referenced this pull request Jul 13, 2022