BUG: RecursionError with CategoricalIndex.get_indexer #42089
jbrockmendel
commented
Jun 17, 2021
- closes CI: categorical failures #42088
- tests added / passed
- Ensure all linting tests pass, see here for how to run them
- whatsnew entry
cool. merge & backport on green.
@meeseeksdev backport 1.3.x
Something went wrong ... Please have a look at my logs.
…exer (#42093) Co-authored-by: jbrockmendel <[email protected]>
This fix seems to have caused a large slowdown on the … The regression overview (https://pandas.pydata.org/speed/pandas/#regressions?sort=1&dir=desc) lists it as a 1000x slowdown, but that's only because #42042 first improved the performance a lot (which might be a bit suspicious?). Compared to the timing before that, it's only a 4-5x slowdown. With the code below, I see a ~9x slowdown locally on master compared to 1.2.5.

import itertools
import string

import pandas as pd  # needed for pd.CategoricalIndex below

data_unique = pd.CategoricalIndex(
    ["".join(perm) for perm in itertools.permutations(string.printable, 3)]
)
cat_list = ["a", "c"]
%timeit data_unique.get_indexer(cat_list)

52.8 ms ± 5.56 ms per loop (mean ± std. dev. of 7 runs, 10 loops each) # <-- pandas 1.2.5
417 ms ± 22.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) # <-- master

I think it has to do with the fact that before we called the Engine.get_indexer on the codes, while now in the base class version we do that with the …
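
To make the "lookup on the codes" idea concrete, here is a minimal sketch that uses only public pandas API. It is not the pandas-internal code path on either version. `get_indexer_via_codes` is a hypothetical helper name, and the sketch assumes the index values are unique. It first translates the target labels into category codes, then searches those small integers among the index's own codes instead of comparing the labels themselves.

```python
import numpy as np
import pandas as pd


def get_indexer_via_codes(index: pd.CategoricalIndex, target) -> np.ndarray:
    """Hypothetical helper: locate `target` labels by going through category codes.

    Assumes `index` holds unique values. Illustration only, not pandas internals.
    """
    # Step 1: map each target label to its category code (-1 if not a category).
    target_codes = index.categories.get_indexer(target)
    # Step 2: find those codes among the integer codes backing the index.
    positions = pd.Index(index.codes).get_indexer(target_codes)
    # Labels that are not categories must stay "not found" (-1), even if the
    # index itself contains missing values, which are also stored as code -1.
    return np.where(target_codes == -1, -1, positions)


idx = pd.CategoricalIndex(["b", "a", "c"])
print(get_indexer_via_codes(idx, ["a", "c", "z"]))
print(idx.get_indexer(["a", "c", "z"]))  # public API result, for comparison
```

Timing this helper against `data_unique.get_indexer(cat_list)` on both versions would be one way to check whether the codes-based route accounts for the difference, though that is a guess until profiled.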
I opened #42249 to keep track of this.