-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
PERF: Faster CategoricalIndex from categorical #17513
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PERF: Faster CategoricalIndex from categorical #17513
Conversation
Codecov Report
@@ Coverage Diff @@
## master #17513 +/- ##
==========================================
- Coverage 91.22% 91.2% -0.02%
==========================================
Files 163 163
Lines 49586 49588 +2
==========================================
- Hits 45233 45227 -6
- Misses 4353 4361 +8
Continue to review full report at Codecov.
|
pandas/core/indexes/category.py
Outdated
@@ -130,6 +130,9 @@ def _create_categorical(self, data, categories=None, ordered=None): | |||
------- | |||
Categorical | |||
""" | |||
if isinstance(data, ABCSeries) and is_categorical_dtype(data): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The isinstance(data, ABCSeries)
is maybe a bit too restrictive. This rules out, e.g. CategoricalIndex. But at moment the CategoricalIndex constructor takes a different (faster) path, so it wouldn't benefit from this change anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, does it still pass if you add ABCCategoricalIndex
here as well? it is more consistent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about isinstance(data, (ABCSeries, type(self)))
. That'll catch Series(dtype='category')
, and CategoricalIndex
, but not Categorical
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about (is_categorical_dtype and not ABCCategorical)?
maybe add a helper method
_ensure_categorical
?
which passes thru Categorical
does .values on CI and Series
and raises else
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe add a helper method
I'd prefer to wait till we need that. I haven't come across any other cases where I need to treat CategoricalIndex and Series(dtype=category) differently than Categorical.
I am able to replicate this result (159 ms -> 9.88 µs). |
d77b095
to
7cad4f8
Compare
lgtm. merge on green. |
Unrelated test failure https://travis-ci.org/pandas-dev/pandas/jobs/275540546#L1608 (already an issue at #17510) |
Master:
HEAD
Is that a 7000x speedup, or did I mess up the math?