Skip to content

CLN/PERF: move RangeIndex._cached_data to RangeIndex._cache #35432

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

topper-123
Copy link
Contributor

@topper-123 topper-123 commented Jul 28, 2020

The ._cached_data attribute is not necessary. It was originally added to allow a check to see if the ._data ndarray had been created, but that's also possible to do by a "_data" in _cache check in the new implemention, which IMO would be more idiomatic.

The new implementation has the benefit that the _data will be available to new copies of a RangeIndex, saving the need to create a new ndarray for each new copy of the RangeIndex.

>>> idx = pd.RangeIndex(1_000_000)
>>> idx[[1, 4]]  # this accesses ._data and saves it in cached_data (master) or _cache["_data"](this PR)
>> %timeit idx._shallow_copy()[[1, 4]]
2.55 ms ± 69.3 µs per loop  # master
17.7 µs ± 405 ns per loop  # this PR

xref #26565.

@jreback jreback added Constructors Series/DataFrame/Index/pd.array Constructors Index Related to the Index class or subclasses Performance Memory or execution speed performance labels Aug 3, 2020
@jreback jreback added this to the 1.2 milestone Aug 3, 2020
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. @jbrockmendel if you can give a glance

@jreback jreback requested a review from jbrockmendel August 3, 2020 23:38
@jreback
Copy link
Contributor

jreback commented Aug 4, 2020

@topper-123 can you rebase

def _data(self):
"""
An int array that for performance reasons is created only when needed.

The constructed array is saved in ``_cached_data``. This allows us to
check if the array has been created without accessing ``_data`` and
triggering the construction.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we ever do this check outside of tests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, and the checks in the tests are to ensure this array isn't created unless it's needed.

@jbrockmendel
Copy link
Member

LGTM pending rebase+green

@topper-123 topper-123 force-pushed the remove_RangeIndex._cached_data branch from 7bea094 to f24b28f Compare August 4, 2020 06:06
@topper-123 topper-123 merged commit d9d34ae into pandas-dev:master Aug 4, 2020
@topper-123 topper-123 deleted the remove_RangeIndex._cached_data branch August 4, 2020 07:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Constructors Series/DataFrame/Index/pd.array Constructors Index Related to the Index class or subclasses Performance Memory or execution speed performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants