-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
PERF: custom ops for RangeIndex.[all|any|__contains__] #26617
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PERF: custom ops for RangeIndex.[all|any|__contains__] #26617
Conversation
pandas/core/indexes/range.py
Outdated
def __contains__(self, key): | ||
hash(key) | ||
try: | ||
key = com.ensure_python_int(key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A Python int is needed, else Python iterates over the range, making the operation very slow.
8bca43c
to
cfd68e5
Compare
Codecov Report
@@ Coverage Diff @@
## master #26617 +/- ##
==========================================
- Coverage 91.85% 91.85% -0.01%
==========================================
Files 174 174
Lines 50722 50741 +19
==========================================
+ Hits 46593 46606 +13
- Misses 4129 4135 +6
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #26617 +/- ##
==========================================
- Coverage 91.87% 90.41% -1.47%
==========================================
Files 174 174
Lines 50661 50668 +7
==========================================
- Hits 46547 45809 -738
- Misses 4114 4859 +745
Continue to review full report at Codecov.
|
pandas/core/common.py
Outdated
@@ -490,3 +490,13 @@ def f(x): | |||
f = mapper | |||
|
|||
return f | |||
|
|||
|
|||
def ensure_python_int(value: Union[int, Any]) -> int: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
put in dtypes.cast but i don’t think u need this as we already have ensure_platform_int
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ensure_platform_int
returns a numpy int(32|64), so won't work here:
>>> rng = range(1, 1_000_000)
>>> %timeit 999_999 in rng
133 ns ± 2.46 ns per loop
>>> %timeit np.int64(999_999) in rng
306 ms ± 66.3 ms per loop
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok in any event move it to collocate with the ensure _ routines
4826efc
to
3ca546e
Compare
pandas/core/dtypes/cast.py
Outdated
@@ -1333,3 +1334,13 @@ def maybe_cast_to_integer_array(arr, dtype, copy=False): | |||
if is_integer_dtype(dtype) and (is_float_dtype(arr) or | |||
is_object_dtype(arr)): | |||
raise ValueError("Trying to coerce float values to integers") | |||
|
|||
|
|||
def ensure_python_int(value: Union[int, Any]) -> int: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I gave the wrong direction, can you put in core/dtypes/common (right below the other ensure_*)
and can you add a doc-string (I know its trivial)
the typing seems odd here, shouldn't this be Union[int, np.integer] ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not quite sure. In principle a float or a np.floating and possibly others could for some subvalues pass the test (e.g. if value is 1.0). But ok, passing a float would be a code smell, so just as well make the programmer see passing those as typing errors.
pandas/core/indexes/range.py
Outdated
def __contains__(self, key): | ||
hash(key) | ||
try: | ||
key = cast.ensure_python_int(key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
import directly
pandas/core/indexes/range.py
Outdated
def all(self): | ||
return 0 not in self._range | ||
|
||
def any(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will need to wait for this until #26581 is merged, as mypy doesn't understand/see attributes defined in __new__
or _simple_new
. If that one is merged, I can go through this again wrt. typing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
small comments
34706ce
to
fa4c649
Compare
Comments addressed. The typing issue must wait until #26581 is merged for mypy to see _range. |
3d52e46
to
949d452
Compare
can you add the issue number onto the one you already have in Performance section; merge master as well. ping on green. |
949d452
to
71ab9ed
Compare
71ab9ed
to
4aea46d
Compare
git diff upstream/master -u -- "*.py" | flake8 --diff
Follow-up to #26565. Make
RangeIndex._data
be created in fewer cases.Performance examples (but the larger gain is from memory savings by not creating the
_data
array):