PERF: improved performance of CategoricalIndex.is_monotonic* #21025

topper-123 · 2018-05-14T00:16:10Z

passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

>>> n = 1000000
>>> ci = pd.CategoricalIndex(list('a' * n + 'b' * n + 'c' * n))
>>> %timeit ci.is_monotonic_increasing
22 ms # v0.22 and master
227 ns  # this commit

There seem to be a few more like this, where CategoricalIndex should use self._engine but doesn't.

@TomAugspurger?
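
A timing in the nanosecond range for three million elements suggests the answer is computed once and then reused. Here is a toy sketch of that idea in plain Python. It assumes the engine remembers the result of its monotonicity check, which nothing in this thread confirms, and the class and attribute names are hypothetical, not pandas internals:

```python
class CachedMonotonicEngine:
    """Hypothetical stand-in for an index engine that remembers its answer."""

    def __init__(self, codes):
        self._codes = codes
        self._checked = False
        self._increasing = False
        self.scans = 0  # counts how many full passes over the data were made

    @property
    def is_monotonic_increasing(self):
        if not self._checked:
            # One O(n) pass over the integer codes, done only on first access.
            self.scans += 1
            c = self._codes
            self._increasing = all(c[i] <= c[i + 1] for i in range(len(c) - 1))
            self._checked = True
        return self._increasing


engine = CachedMonotonicEngine([0] * 1000 + [1] * 1000 + [2] * 1000)
first = engine.is_monotonic_increasing   # scans the codes
second = engine.is_monotonic_increasing  # reuses the stored result
print(first, second, engine.scans)
```

Repeated accesses, as in a `%timeit` loop, would then mostly measure the cost of the attribute lookup rather than the scan.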

jreback · 2018-05-14T00:23:05Z

this hit the same code path; so check this

topper-123 · 2018-05-14T00:36:20Z

Not sure I follow, but these two versions do not follow the same code path, as the old version required creating a new Int64Index which is expensive.

CategoricalIndex.is_monotonic is already tested in indexes/test_category.py::TestCategoricalIndex::test_is_monotonic.
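
To illustrate the two code paths, here is a small sketch using only public pandas API. It assumes the old path amounted to wrapping the integer codes in a fresh index on every access; the actual diff is not reproduced in this thread, so treat it as an illustration rather than the real implementation:

```python
import pandas as pd

n = 1000
ci = pd.CategoricalIndex(list("a" * n + "b" * n + "c" * n))

# Assumed old-style path: build a new integer index from the codes each time.
via_codes = pd.Index(ci.codes).is_monotonic_increasing

# Direct path: ask the CategoricalIndex itself.
direct = ci.is_monotonic_increasing

# Both routes should agree, since monotonicity follows the category codes.
print(via_codes, direct)

# With categories inferred in sorted order, a reversed index is decreasing.
rev = pd.CategoricalIndex(list("cba"))
print(rev.is_monotonic_decreasing)
```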

codecov · 2018-05-14T01:20:37Z

Codecov Report

Merging #21025 into master will not change coverage.
The diff coverage is 100%.

@@           Coverage Diff           @@
##           master   #21025   +/-   ##
=======================================
  Coverage   91.83%   91.83%           
=======================================
  Files         153      153           
  Lines       49495    49495           
=======================================
  Hits        45454    45454           
  Misses       4041     4041

Flag	Coverage Δ
#multiple	`90.23% <100%> (ø)`	⬆️
#single	`41.88% <0%> (ø)`	⬆️

Impacted Files	Coverage Δ
pandas/core/indexes/category.py	`97.03% <100%> (ø)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 501f041...c815d62. Read the comment docs.

jreback · 2018-05-14T12:32:35Z

can you add additional tests using strings (and not just integers) in that same test. otherwise lgtm.

jreback · 2018-05-14T12:32:57Z

do we have sufficient asv's for this?

topper-123 · 2018-05-14T17:10:06Z

There were no asv's for this. However, if you run my code snippet above on the old version, there is a huge spike in RAM usage. I've even gotten a few MemoryErrors.

So my ASV uses only N = 1000 to limit memory usage. The result here is 60 microseconds (old version) vs 260 ns (new version).

Also, Series.is_monotonic* wasn't added until 0.19. Should that be put in a try/except clause, to avoid failing on older versions of pandas?

jreback

minor comment on the asv. it's ok if it fails under 0.19, that's pretty far back now

jreback · 2018-05-14T23:58:49Z

asv_bench/benchmarks/categoricals.py

+        self.c = pd.CategoricalIndex(list('a'*N + 'b'*N + 'c'*N))
+        self.s = pd.Series(self.c)
+
+    def time_categorical_index_is_monotonic(self):


these shouldn't be in the same asv, you can do this with params I think
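
One way the `params` suggestion might look is sketched below. It follows the usual asv convention of `params`/`param_names` class attributes, with the parameter passed to `setup` and to each `time_*` method. The class name, the `container` parameter, and the split into index vs. Series are assumptions for illustration, not the benchmark that was merged:

```python
import pandas as pd


class IsMonotonic:
    # Hypothetical parametrization: asv runs each benchmark once per value.
    params = ["index", "series"]
    param_names = ["container"]

    def setup(self, container):
        N = 1000  # kept small to limit memory use on older versions
        ci = pd.CategoricalIndex(list("a" * N + "b" * N + "c" * N))
        self.obj = ci if container == "index" else pd.Series(ci)

    def time_is_monotonic_increasing(self, container):
        self.obj.is_monotonic_increasing

    def time_is_monotonic_decreasing(self, container):
        self.obj.is_monotonic_decreasing


# Quick manual check outside asv: exercise each parameter value once.
results = {}
for container in IsMonotonic.params:
    bench = IsMonotonic()
    bench.setup(container)
    bench.time_is_monotonic_increasing(container)
    results[container] = bool(bench.obj.is_monotonic_increasing)
print(results)
```

With this layout asv reports the index and Series timings as separate rows of one benchmark, so a regression in either shows up on its own.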

pep8speaks · 2018-05-15T16:15:39Z

Hello @topper-123! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on May 16, 2018 at 19:04 Hours UTC

jreback · 2018-05-15T23:43:46Z

doc/source/whatsnew/v0.23.0.txt

@@ -1079,6 +1079,7 @@ Performance Improvements
 - Improved performance of :func:`pandas.core.groupby.GroupBy.pct_change` (:issue:`19165`)
 - Improved performance of :func:`Series.isin` in the case of categorical dtypes (:issue:`20003`)
 - Improved performance of ``getattr(Series, attr)`` when the Series has certain index types. This manifiested in slow printing of large Series with a ``DatetimeIndex`` (:issue:`19764`)
+- Improved performance of :meth:`CategoricalIndex.is_monotonic_increasing`, :meth:`CategoricalIndex.is_monotonic_decreasing` and :meth:`CategoricalIndex.is_monotonic` (:issue:`21025`)


will need to be in 0.23.1 (not yet in repo, soon)

Moved to 0.23.1.

jreback · 2018-05-17T00:21:58Z

thanks @topper-123

topper-123 force-pushed the is_monotonic_perf branch from b7f6e04 to a775186 Compare May 14, 2018 00:17

topper-123 force-pushed the is_monotonic_perf branch from a775186 to 3378b3a Compare May 14, 2018 00:27

jreback added the Performance (Memory or execution speed performance) and Categorical (Categorical Data Type) labels May 14, 2018

topper-123 force-pushed the is_monotonic_perf branch 2 times, most recently from 1ee1d93 to 6bdbb5d Compare May 14, 2018 17:06

jreback reviewed May 14, 2018

topper-123 force-pushed the is_monotonic_perf branch from 6bdbb5d to 6d4aea9 Compare May 15, 2018 16:15

topper-123 force-pushed the is_monotonic_perf branch from 6d4aea9 to 2e34678 Compare May 15, 2018 16:19

jreback requested changes May 15, 2018

jreback added this to the 0.23.1 milestone May 15, 2018

improved performance of CategoricalIndex.is_monotonic*

c815d62

topper-123 force-pushed the is_monotonic_perf branch from 2e34678 to c815d62 Compare May 16, 2018 19:04

jreback approved these changes May 17, 2018

jreback merged commit 1ee5ecf into pandas-dev:master May 17, 2018

jreback added the Needs Backport label May 17, 2018

topper-123 deleted the is_monotonic_perf branch May 21, 2018 21:00

jorisvandenbossche removed the Needs Backport label Jun 8, 2018

jorisvandenbossche pushed a commit to jorisvandenbossche/pandas that referenced this pull request Jun 8, 2018

improved performance of CategoricalIndex.is_monotonic* (pandas-dev#21025) (cherry picked from commit 1ee5ecf)

57f6f45

fjetter mentioned this pull request Jun 9, 2018

PERF: __contains__ method for Categorical #21022

Closed

4 tasks

jorisvandenbossche pushed a commit that referenced this pull request Jun 9, 2018

improved performance of CategoricalIndex.is_monotonic* (#21025)

e469400

(cherry picked from commit 1ee5ecf)
