PERF: Regression in Series.is_monotonic_increasing for categorical #33365

Closed
3 tasks done
TomAugspurger opened this issue Apr 7, 2020 · 1 comment · Fixed by #33540
Labels
Algos (non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff); Categorical (Categorical data type); Performance (memory or execution speed performance); Regression (functionality that used to work in a prior pandas version)
Comments

@TomAugspurger
Contributor

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

Pandas 1.0.1

In [1]: import pandas as pd

In [2]: N = 1000

In [3]: c = pd.CategoricalIndex(list("a" * N + "b" * N + "c" * N))

In [4]: s = pd.Series(c)

In [5]: %timeit s.is_monotonic_increasing
35.3 µs ± 663 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Master

In [5]: %timeit s.is_monotonic_increasing
73.6 µs ± 782 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
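The timings above come from IPython's %timeit. For comparing pandas versions outside IPython, a minimal standalone sketch using the standard-library timeit module could look like the following. The helper name time_is_monotonic and its default repeat counts are assumptions made for this example, not part of pandas or of the original report.

```python
import timeit

import pandas as pd

# Same data as in the report: 3 * N categorical values in sorted order.
N = 1000
c = pd.CategoricalIndex(list("a" * N + "b" * N + "c" * N))
s = pd.Series(c)


def time_is_monotonic(series, number=1000, repeat=5):
    """Hypothetical helper: best observed time per call, in microseconds."""
    timer = timeit.Timer(lambda: series.is_monotonic_increasing)
    best_total = min(timer.repeat(repeat=repeat, number=number))
    return best_total / number * 1e6


if __name__ == "__main__":
    print(f"pandas {pd.__version__}: {time_is_monotonic(s):.1f} µs per call")
```

Running the same script in two environments (for example one with 1.0.1 and one built from master) gives numbers that can be compared directly, although absolute values depend on the machine.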

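Problem description

Series.is_monotonic_increasing on a categorical Series takes roughly twice as long on master as on 1.0.1 (73.6 µs vs. 35.3 µs in the sample above). The regression also shows up in the asv benchmark history:

https://pandas.pydata.org/speed/pandas/#categoricals.IsMonotonic.time_categorical_series_is_monotonic_increasing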

TomAugspurger added the Performance, Algos, and Categorical labels on Apr 7, 2020
TomAugspurger added this to the 1.1 milestone on Apr 7, 2020
TomAugspurger added the Regression label on Apr 7, 2020
@rtlee9
Contributor

rtlee9 commented Apr 14, 2020

Looks like the regression was introduced in 2fc8559.

$ asv continuous -f 1.1 2fc85593 -b 'categoricals.IsMonotonic.time_categorical_series_is_monotonic_increasing'

       before           after         ratio
     [7c5d3d52]       [2fc85593]
     <master~660>       <master~659>
+      61.8±0.8μs         99.3±4μs     1.61  categoricals.IsMonotonic.time_categorical_series_is_monotonic_increasing

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE DECREASED.

There are a few ways to resolve this; please see #33540 for one solution.
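For readers unfamiliar with asv, the benchmark name in the output above follows the pattern module.Class.method. The sketch below shows how a benchmark with that name could be laid out, reconstructed from the code sample in the issue. It is an assumption about the structure, and the actual file in the pandas asv_bench directory may differ.

```python
import pandas as pd


class IsMonotonic:
    # asv runs setup() before timing each method whose name starts with "time_".
    def setup(self):
        N = 1000
        self.c = pd.CategoricalIndex(list("a" * N + "b" * N + "c" * N))
        self.s = pd.Series(self.c)

    def time_categorical_series_is_monotonic_increasing(self):
        # Only the attribute access is timed; the result is discarded.
        self.s.is_monotonic_increasing
```

Placed in a module named categoricals, this would be selected by the -b pattern used in the asv continuous command above.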

rtlee9 added a commit to rtlee9/pandas that referenced this issue Apr 14, 2020
Fix performance regression in Series.is_monotonic_increasing for categorical
by avoiding Categorical construction for categorical series
jreback pushed a commit that referenced this issue Apr 15, 2020
Fix performance regression in Series.is_monotonic_increasing for categorical
by avoiding Categorical construction for categorical series
CloseChoice pushed a commit to CloseChoice/pandas that referenced this issue Apr 20, 2020
Fix performance regression in Series.is_monotonic_increasing for categorical
by avoiding Categorical construction for categorical series
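The commit message above attributes the fix to avoiding Categorical construction when the data is already categorical. The patch itself is not reproduced in this thread, so the following is only an illustration of the underlying idea, not the change made in #33540: a categorical Series already carries integer codes, and an order check can run on those without building a new Categorical. The helper name is hypothetical, and the sketch assumes no missing values and that ordering by category code is what is wanted.

```python
import numpy as np
import pandas as pd

N = 1000
s = pd.Series(pd.CategoricalIndex(list("a" * N + "b" * N + "c" * N)))


def is_monotonic_increasing_from_codes(series):
    """Hypothetical helper: check order using the existing integer codes.

    Assumes a categorical Series with no missing values (no -1 codes).
    """
    codes = series.cat.codes.to_numpy()
    # Each code must be >= its predecessor; an empty comparison counts as True.
    return bool(np.all(codes[1:] >= codes[:-1]))


print(is_monotonic_increasing_from_codes(s))
print(is_monotonic_increasing_from_codes(s.iloc[::-1]))
```

For the sample data the helper agrees with s.is_monotonic_increasing; it is meant to show why re-creating the Categorical is unnecessary work, not to replace the pandas property.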