PERF: RangeIndex.get_loc #30930

Merged
merged 6 commits into from
Jan 15, 2020

jbrockmendel
Member

@jbrockmendel jbrockmendel commented Jan 11, 2020

Small simplification gives a small speedup

In [2]: rng = pd.Index(range(10**5)) 

In [3]: %timeit rng.get_loc(5.0)                                                
6.38 µs ± 381 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)  <-- master
1.15 µs ± 40.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)  <-- PR

In [5]: %timeit rng.get_loc(5)
845 ns ± 5.71 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)  <-- master
860 ns ± 16.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)  <-- PR

In [9]: def foo(): 
   ...:     try: 
   ...:         return rng.get_loc(None) 
   ...:     except KeyError: 
   ...:         pass 
   ...:                                                                         

In [10]: %timeit foo()                                                          
6.72 µs ± 656 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)  <-- master
1.11 µs ± 94 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)  <-- PR

@jbrockmendel jbrockmendel added the Indexing label (Related to indexing on series/frames, not to indexes themselves) Jan 11, 2020
except ValueError:
raise KeyError(key)
if method is None and tolerance is None:
if is_integer(key) or (is_float(key) and key.is_integer()):
Member

Does the extra condition here imply any behavioral changes?

Member Author

No, intlike floats go through here now instead of going through the _maybe_cast_indexer path in Index.get_loc.
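
A minimal sketch of the path described above, not the pandas implementation: `range_get_loc` is a hypothetical stand-in that operates on a plain Python `range`, uses simplified `isinstance` checks in place of the `is_integer` / `is_float` helpers shown in the diff, and assumes that any key that is neither an integer nor an integer-like float raises `KeyError` directly.

```python
def range_get_loc(rng: range, key):
    """Hypothetical stand-in for the fast path discussed in this PR."""
    # Simplified type checks (assumption); bool is excluded from the int case.
    is_int = isinstance(key, int) and not isinstance(key, bool)
    is_intlike_float = isinstance(key, float) and key.is_integer()

    if is_int or is_intlike_float:
        try:
            # For int arguments, CPython computes range.index arithmetically
            # rather than scanning the range.
            return rng.index(int(key))
        except ValueError:
            # The value is not a member of the range.
            raise KeyError(key) from None

    # Assumed behavior: None, 5.5, strings, etc. cannot be in a range of ints.
    raise KeyError(key)


rng = range(10**5)
print(range_get_loc(rng, 5))    # 5
print(range_get_loc(rng, 5.0))  # 5
```

Under these assumptions, `5.0` and `None` are both resolved without leaving the function, which is consistent with the two cases where the timings in the description improve.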

@jorisvandenbossche
Member

How does this change affect __contains__ ?

@jbrockmendel
Member Author

> How does this change affect __contains__ ?

Typo, will update title

@jbrockmendel jbrockmendel changed the title from "PERF: RangeIndex.__contains__" to "PERF: RangeIndex.get_loc" Jan 13, 2020
@jorisvandenbossche
Member

The timings you showed were then also not relevant?

@jbrockmendel
Member Author

> The timings you showed were then also not relevant?

Huh, not sure how that happened. Will update with get_loc timings, which make a much bigger difference:

In [2]: rng = pd.Index(range(10**5)) 

In [3]: %timeit rng.get_loc(5.0)                                                
6.38 µs ± 381 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)  <-- master
1.15 µs ± 40.9 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)  <-- PR

In [5]: %timeit rng.get_loc(5)
845 ns ± 5.71 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)  <-- master
860 ns ± 16.7 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)  <-- PR

In [9]: def foo(): 
   ...:     try: 
   ...:         return rng.get_loc(None) 
   ...:     except KeyError: 
   ...:         pass 
   ...:                                                                         

In [10]: %timeit foo()                                                          
6.72 µs ± 656 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)  <-- master
1.11 µs ± 94 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)  <-- PR

@jorisvandenbossche
Member

That looks better :)

@jorisvandenbossche jorisvandenbossche added this to the 1.1 milestone Jan 13, 2020
@jreback jreback merged commit 030a35c into pandas-dev:master Jan 15, 2020
@jreback
Contributor

jreback commented Jan 15, 2020

thanks @jbrockmendel

@jbrockmendel jbrockmendel deleted the collect5 branch January 15, 2020 16:40
Labels
Indexing Related to indexing on series/frames, not to indexes themselves
4 participants