-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
PERF: faster categorical ops for equal or larger than scalar #29820
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PERF: faster categorical ops for equal or larger than scalar #29820
Conversation
doc/source/whatsnew/v1.0.0.rst
Outdated
- Performance improvement when comparing a :meth:`Categorical` with a scalar and the scalar is not found in the categories (:issue:`29750`) | ||
- Performance improvement when comparing a :class:`Categorical` with a scalar and the scalar is not found in the categories (:issue:`29750`) | ||
- Performance improvement when checking if values in a :class:`Categorical` are equal, equal or larger or larger than a given scalar. | ||
The improvement is not present if checking if the :class:`Categorical` is less than or less than or equal than the scalar (:issue:`xxxxx`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't we have an issue number here? Otherwise, we should use the PR number.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, added.
if opname not in {"eq", "__eq__", "ge", "__ge__", "gt", "__gt__"}: | ||
# check for NaN needed if we are not equal or larger | ||
mask = self._codes == -1 | ||
ret[mask] = False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have a performance test for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, no ASV's for this ATM. I actually can't get ASV to run locally, maybe a Windows issue?
Anyway, I've added I've a ASV test set, but haven't been able to run it myself, unforfunately. Isn't there a web page, where we post ASVs?
pandas/core/arrays/categorical.py
Outdated
# check for NaN in self | ||
mask = self._codes == -1 | ||
ret[mask] = False | ||
if opname not in {"eq", "__eq__", "ge", "__ge__", "gt", "__gt__"}: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is very strange that you are checking for eq and eq it’s just eq
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, changed.
def time_union(self): | ||
union_categoricals([self.a, self.b]) | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just moved this down so the constructor checks are first, which seems logical.
thanks @topper-123 |
np.nan is always encoded to -1 in a Categorical, so encoded smaller than all other possible values.
So, if we're checking if a Categorical is equal or larger than a scalar, it is not necessary to find locations of -1, as those will always be False already.
Performance