PERF: regression in CategoricalIndex.get_indexer #42249
Labels: Categorical, Performance, Regression
The fix in #42089 (or the PR that it was fixing) seems to have caused a large slowdown on the `get_indexer` benchmarks: https://pandas.pydata.org/speed/pandas/#indexing.CategoricalIndexIndexing.time_get_indexer_list?python=3.8&Cython=0.29.21&p-index='monotonic_incr'&commits=cf5852bf-fce7f9eb

The regression overview (https://pandas.pydata.org/speed/pandas/#regressions?sort=1&dir=desc) lists it as a 1000x slowdown, but that's only because #42042 first improved the performance a lot (which might be a bit suspicious?). Compared to the timing before that, it's only a 4-5x slowdown. With the below code, I see locally a ~9x slowdown on master compared to 1.2.5.
I think it has to do with the fact that before we called the Engine.get_indexer on the codes, while now in the base class version we do that with the `.categories`, which means in this case that both `self` and `target` are cast to object dtype and thus use the Engine.get_indexer for object dtype.

Originally posted by @jorisvandenbossche in #42089 (comment)
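
To make the suspected difference concrete, here is a minimal sketch that contrasts the two lookup strategies using public pandas API only. It is not the actual internal code path: the helper names `via_object_values` and `via_codes`, the labels, and the sizes are all made up for illustration, and the codes-based variant assumes the index contains no missing values.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical data: a unique CategoricalIndex of string labels and a short
# list of lookup targets, one of which is not a category.
labels = [f"key_{i:05d}" for i in range(10_000)]
ci = pd.CategoricalIndex(labels)
target = ["key_00003", "key_09999", "missing"]


def via_object_values(index, target):
    # Materialise the values as object dtype and look the targets up there,
    # roughly what "cast to object dtype" amounts to.
    return pd.Index(np.asarray(index), dtype=object).get_indexer(target)


def via_codes(index, target):
    # Translate the targets to category codes, then look those up among the
    # integer codes of the index. Assumes the index has no missing values,
    # so a target code of -1 (not a category) cannot match anything.
    target_codes = index.categories.get_indexer(target)
    return pd.Index(index.codes).get_indexer(target_codes)


# Both strategies should agree with the public method.
expected = ci.get_indexer(target)
assert (via_object_values(ci, target) == expected).all()
assert (via_codes(ci, target) == expected).all()

# Rough timing; absolute numbers depend on the machine and pandas version.
for func in (via_object_values, via_codes):
    seconds = timeit.timeit(lambda: func(ci, target), number=20)
    print(f"{func.__name__}: {seconds / 20 * 1e3:.3f} ms per call")
```

Both helpers build a fresh `Index` on every call, so the timings include construction of the lookup table and mainly compare hashing object values against hashing integer codes. They are not a substitute for the asv benchmark linked above.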