-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
PERF: improved performance of CategoricalIndex.is_monotonic* #21025
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
b7f6e04
to
a775186
Compare
this hit the same code path; so check this |
a775186
to
3378b3a
Compare
Not sure I follow, but these two versions do not follow the same code path, as the old version required creating a new
|
Codecov Report
@@ Coverage Diff @@
## master #21025 +/- ##
=======================================
Coverage 91.83% 91.83%
=======================================
Files 153 153
Lines 49495 49495
=======================================
Hits 45454 45454
Misses 4041 4041
Continue to review full report at Codecov.
|
can you add additional tests using strings (and not just integers) in that same test. otherwise lgtm. |
do we have sufficient asv's for this? |
1ee1d93
to
6bdbb5d
Compare
There were no asv's for this. However, if you run my code snippet above, there is a huge spike in RAM usage, when run in the old version. I've even gotten a few So my ASV is done using only Also, Series.is_monotonic* wasn't added untill 0.19. should that be put in a try/except clause, to avoid failing on older versions of pandas? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor comment on the asv. its ok if it fails under 0.19, that's pretty far back now
asv_bench/benchmarks/categoricals.py
Outdated
self.c = pd.CategoricalIndex(list('a'*N + 'b'*N + 'c'*N)) | ||
self.s = pd.Series(self.c) | ||
|
||
def time_categorical_index_is_monotonic(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these shouldn't be in the same asv, you can do this with params I think
6bdbb5d
to
6d4aea9
Compare
Hello @topper-123! Thanks for updating the PR. Cheers ! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on May 16, 2018 at 19:04 Hours UTC |
6d4aea9
to
2e34678
Compare
doc/source/whatsnew/v0.23.0.txt
Outdated
@@ -1079,6 +1079,7 @@ Performance Improvements | |||
- Improved performance of :func:`pandas.core.groupby.GroupBy.pct_change` (:issue:`19165`) | |||
- Improved performance of :func:`Series.isin` in the case of categorical dtypes (:issue:`20003`) | |||
- Improved performance of ``getattr(Series, attr)`` when the Series has certain index types. This manifiested in slow printing of large Series with a ``DatetimeIndex`` (:issue:`19764`) | |||
- Improved performance of :meth:`CategoricalIndex.is_monotonic_increasing`, :meth:`CategoricalIndex.is_monotonic_decreasing` and :meth:`CategoricalIndex.is_monotonic` (:issue:`21025`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will need to be in 0.23.1 (not yet in repo, soon)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved to 0.23.1.
2e34678
to
c815d62
Compare
thanks @topper-123 |
) (cherry picked from commit 1ee5ecf)
(cherry picked from commit 1ee5ecf)
git diff upstream/master -u -- "*.py" | flake8 --diff
There seem to be a few more like this, where
CategoricalIndex
should useself._engine
but doesn't.@TomAugspurger?