Skip to content

PERF: Faster comparisons of indexes when compared to self #37109

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

topper-123
Copy link
Contributor

@topper-123 topper-123 commented Oct 13, 2020

Examples:

>>> rng = pd.RangeIndex(100_000)
>>> %timeit rng == rng
429 µs ± 96.9 µs per loop  # master
14.1 µs ± 93.8 ns per loop  # this PR
>>> idx = rng.astype(int)
>>> %timeit idx == idx
347 µs ± 1.42 µs per loop  #master
14.1 µs ± 91.4 ns per loop  # this PR
>>> idx = rng.astype(object)
>>> %timeit idx == idx
5.97 ms ± 12.7 µs per loop  # master
3.72 ms ± 28.4 µs per loop  # this PR

@topper-123 topper-123 changed the title PERF: Faster comparisons of indexes when the same PERF: Faster comparisons of indexes when compared to self Oct 13, 2020
@jbrockmendel
Copy link
Member

that works for int dtype, but what about for dtypes that have NAs?

@topper-123
Copy link
Contributor Author

Hmm, that’s a good point. I’ll see if anything can be done about that.

@jreback jreback added Index Related to the Index class or subclasses Performance Memory or execution speed performance labels Oct 14, 2020
@twoertwein
Copy link
Member

you could return ~isna() for equal and isna() for not equal (if that is still faster)?

@jbrockmendel
Copy link
Member

you could return ~isna() for equal and isna() for not equal (if that is still faster)?

even then you'd need to watch out for pd.NA

I think just check for _can_hold_na.

Also for RangeIndex could compare the range objects instead of (i think) casting to ndarray

@topper-123 topper-123 force-pushed the same_index_comparisons branch from c81337c to ef5ad65 Compare October 14, 2020 18:16
@topper-123 topper-123 force-pushed the same_index_comparisons branch from ef5ad65 to 534afbc Compare October 14, 2020 18:23
@topper-123
Copy link
Contributor Author

I've made an update. I can't find the tests for index comparisons, they clearly need some tests for comparisons with indexes that contain nans. My first proposal obviously shouldn't have passed.

Anyonw know where they're located?

@jbrockmendel
Copy link
Member

Anyonw know where they're located?

tests.arithmetic

if self.is_(other) and not isinstance(self, MultiIndex):
# short-circuit when other is same as self
# if we're comparing equality, return an np.ones array, else an np.zeros arr
bool_arr_type, na_bool = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the pattern we use elsewhere is more like

res = np.empty(self.shape, dtype=bool)
res[:] = (op.__name__ in ["eq", "le", "ge"]
if self._can_hold_na:
   ...

@jbrockmendel
Copy link
Member

would this let us remove NumericIndex._cmp_method?

@jbrockmendel
Copy link
Member

@topper-123 can you merge master

@jreback
Copy link
Contributor

jreback commented Jan 1, 2021

@topper-123 still worth it here? can you merge master

@topper-123
Copy link
Contributor Author

This has already been fixed, e.g. the last example:

>>> idx = rng.astype(object)
>>> %timeit idx == idx
5.97 ms ± 12.7 µs per loop  # when this PR was opened
53.9 µs ± 9.8 µs per loop  # master

So I'll just close this one.

@topper-123 topper-123 closed this Jan 1, 2021
@topper-123 topper-123 deleted the same_index_comparisons branch January 1, 2021 22:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Index Related to the Index class or subclasses Performance Memory or execution speed performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants