BUG: RecursionError with CategoricalIndex.get_indexer #42089

jbrockmendel · 2021-06-17T20:38:12Z

closes CI: categorical failures #42088
tests added / passed
Ensure all linting tests pass, see here for how to run them
whatsnew entry

jreback · 2021-06-17T20:56:48Z

cool. merge & backport on green.

jreback · 2021-06-17T21:34:40Z

@meeseeksdev backport 1.3.x

lumberbot-app · 2021-06-17T21:34:49Z

Something went wrong ... Please have a look at my logs.

jorisvandenbossche · 2021-06-26T08:45:34Z

This fix seems to have caused a large slowdown on the get_indexer benchmarks: https://pandas.pydata.org/speed/pandas/#indexing.CategoricalIndexIndexing.time_get_indexer_list?python=3.8&Cython=0.29.21&p-index='monotonic_incr'&commits=cf5852bf-fce7f9eb

The regression overview (https://pandas.pydata.org/speed/pandas/#regressions?sort=1&dir=desc) lists it as a 1000x slowdown, but that's only because #42042 first improved the performance a lot (which might be a bit suspicious?). Compared to the timing before that, it's only a 4-5x slowdown. With the code below, I see a ~8x slowdown locally on master compared to 1.2.5.

import itertools
import string

import pandas as pd

# CategoricalIndex of unique strings: every 3-character permutation of string.printable
data_unique = pd.CategoricalIndex(
    ["".join(perm) for perm in itertools.permutations(string.printable, 3)]
)
cat_list = ["a", "c"]

%timeit data_unique.get_indexer(cat_list)
52.8 ms ± 5.56 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)  # <-- pandas 1.2.5
417 ms ± 22.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)  # <-- master

I think this is because we previously called Engine.get_indexer on the codes, whereas the base class version now calls it on the .categories. In this case that means both self and target are cast to object dtype, so the lookup goes through the object-dtype Engine.get_indexer.
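To illustrate the difference between the two paths, here is a minimal sketch of a codes-based lookup built only from public pandas API. The helper name `get_indexer_via_codes` is hypothetical and this is not the pandas implementation; it assumes a unique CategoricalIndex and only shows the idea of translating the target into category codes once and then matching integers against integers.

```python
import numpy as np
import pandas as pd


def get_indexer_via_codes(ci, target):
    """Hypothetical helper (not pandas internals): look up target via category codes."""
    # Translate the target labels into category codes; -1 marks a label
    # that is not one of the categories.
    target_codes = ci.categories.get_indexer(target)
    # Match integer codes against the index's own integer codes instead of
    # comparing object-dtype labels element by element.
    positions = pd.Index(ci.codes).get_indexer(target_codes)
    # A label that is not a category can never be present in the index.
    return np.where(target_codes == -1, -1, positions)


ci = pd.CategoricalIndex(["c", "a", "d", "b"])
print(get_indexer_via_codes(ci, ["a", "c", "z"]))
```

The only object-dtype work left in this sketch is the lookup of the (short) target in `.categories`; the match against the full index happens on integers.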

jorisvandenbossche · 2021-06-26T08:47:19Z

I opened #42249 to keep track of this.

BUG: RecursionError with CategoricalIndex.get_indexer

33217e8

jreback added the Categorical label Jun 17, 2021

jreback added this to the 1.3 milestone Jun 17, 2021

jreback merged commit 1a3daf4 into pandas-dev:master Jun 17, 2021

meeseeksmachine mentioned this pull request Jun 17, 2021

Backport PR #42089 on branch 1.3.x (BUG: RecursionError with CategoricalIndex.get_indexer) #42093

Merged

meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Jun 17, 2021

Backport PR pandas-dev#42089: BUG: RecursionError with CategoricalIndex.get_indexer

de81870

jbrockmendel deleted the bug-cat-get_idnexer branch June 17, 2021 21:58

jreback pushed a commit that referenced this pull request Jun 17, 2021

Backport PR #42089: BUG: RecursionError with CategoricalIndex.get_indexer (#42093) Co-authored-by: jbrockmendel

44f2649

jorisvandenbossche mentioned this pull request Jun 26, 2021

PERF: regression in CategoricalIndex.get_indexer #42249

Closed

JulianWgs pushed a commit to JulianWgs/pandas that referenced this pull request Jul 3, 2021

BUG: RecursionError with CategoricalIndex.get_indexer (pandas-dev#42089)

19f672c
