PERF: improve perf. of Categorical.searchsorted #28795

topper-123 · 2019-10-04T21:28:34Z

Improves performance of Categorical.searchsorted by avoiding expensive data conversions.

>>> n = 100_000
>>> c = pd.Categorical(['a'] * n + ['b'] * n + ['c'] * n)
>>> %timeit c.searchsorted('b')
259 µs ± 2.95 µs per loop  # master
5.5 µs ± 165 ns per loop  # this PR
>>> %timeit c.searchsorted(['b', 'c'])
240 µs ± 4.24 µs per loop  # master
9.9 µs ± 166 ns per loop  # this PR

Also, CategoricalIndex.searchsorted now calls self.values.searchsorted directly instead of going through algorithms.searchsorted, which always ends up calling self.values.searchsorted anyway. This brings the scalar lookup down to 5.5 µs from 12 µs.
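For readers unfamiliar with why working on codes is cheap, here is a minimal pure-Python sketch of the general idea: translate the label to its integer code once, then binary-search the sorted integer codes. This is not the pandas implementation; `searchsorted_codes` is a hypothetical helper, and the standard-library `bisect` module stands in for the array search.

```python
from bisect import bisect_left


def searchsorted_codes(codes, categories, value):
    """Hypothetical helper: insertion point for `value` in sorted `codes`.

    `codes` holds one integer per element, `categories` the ordered labels.
    """
    # Translate the label to its integer code once; the category list is short.
    # Raises ValueError if `value` is not one of the categories.
    code = categories.index(value)
    # Binary search over the integer codes; the full array is never converted.
    return bisect_left(codes, code)


n = 5
categories = ["a", "b", "c"]
codes = [0] * n + [1] * n + [2] * n  # mirrors ['a'] * n + ['b'] * n + ['c'] * n

print(searchsorted_codes(codes, categories, "b"))
print([searchsorted_codes(codes, categories, v) for v in ["b", "c"]])
```

The per-call cost here is one small lookup in `categories` plus a logarithmic search, regardless of how many elements the array holds.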

gfyoung · 2019-10-05T03:47:16Z

doc/source/whatsnew/v1.0.0.rst

@@ -162,6 +162,7 @@ Performance improvements
 - Performance improvement in :meth:`DataFrame.corr` when ``method`` is ``"spearman"`` (:issue:`28139`)
 - Performance improvement in :meth:`DataFrame.replace` when provided a list of values to replace (:issue:`28099`)
 - Performance improvement in :meth:`DataFrame.select_dtypes` by using vectorization instead of iterating over a loop (:issue:`28317`)
+- Performance improvement in :meth:`Categorical.searchsorted` and  :meth:`CategoricalIndex.searchsorted` when searching for a single scalar value (:issue:`XXXXX`)


Just reference the PR as the issue

Yeah, fixed.

jreback

lgtm, small comment, ping on green.

jreback · 2019-10-05T22:29:07Z

pandas/core/arrays/categorical.py

-
-        codes = codes[0] if is_scalar(value) else codes
-
+        if is_scalar(value):


lgtm, i would add a comment here that this is perf sensitive

topper-123 · 2019-10-06T08:52:39Z

Comments addressed.

jreback · 2019-10-06T22:13:33Z

thanks @topper-123

PERF: improve perf. of Categorical.searchesorted

00f1736

topper-123 added the Performance (Memory or execution speed performance) and Categorical (Categorical Data Type) labels Oct 4, 2019

gfyoung reviewed Oct 5, 2019


added PR number

27bd6f7

topper-123 force-pushed the Categorical.searchsorted_II branch from 0f46d60 to 27bd6f7 Compare October 5, 2019 08:58

jreback added this to the 1.0 milestone Oct 5, 2019

jreback approved these changes Oct 5, 2019


Minor explanation on perf.

2f4a9ab

topper-123 changed the title ~~PERF: improve perf. of Categorical.searchesorted~~ PERF: improve perf. of Categorical.searchsorted Oct 6, 2019

jreback merged commit 66918d0 into pandas-dev:master Oct 6, 2019

topper-123 deleted the Categorical.searchsorted_II branch October 6, 2019 22:33

proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019

PERF: improve perf. of Categorical.searchsorted (pandas-dev#28795)

3d61f98

proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019

PERF: improve perf. of Categorical.searchsorted (pandas-dev#28795)

4378c0c

bongolegend pushed a commit to bongolegend/pandas that referenced this pull request Jan 1, 2020

PERF: improve perf. of Categorical.searchsorted (pandas-dev#28795)

6344bb8
