Commit f2b3795

PERF: join on unordered CategoricalIndex (#56345)
* join on unordered categorical index perf
* whatsnew
* add back try/except
* remove unused method
1 parent 7808ecf commit f2b3795

2 files changed: +11 -4 lines changed

doc/source/whatsnew/v2.2.0.rst

+1
@@ -480,6 +480,7 @@ Performance improvements
 - Performance improvement in :func:`merge_asof` when ``by`` is not ``None`` (:issue:`55580`, :issue:`55678`)
 - Performance improvement in :func:`read_stata` for files with many variables (:issue:`55515`)
 - Performance improvement in :meth:`DataFrame.groupby` when aggregating pyarrow timestamp and duration dtypes (:issue:`55031`)
+- Performance improvement in :meth:`DataFrame.join` when joining on unordered categorical indexes (:issue:`56345`)
 - Performance improvement in :meth:`DataFrame.loc` and :meth:`Series.loc` when indexing with a :class:`MultiIndex` (:issue:`56062`)
 - Performance improvement in :meth:`DataFrame.sort_index` and :meth:`Series.sort_index` when indexed by a :class:`MultiIndex` (:issue:`54835`)
 - Performance improvement in :meth:`DataFrame.to_dict` on converting DataFrame to dictionary (:issue:`50990`)
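
A minimal sketch of the case the new whatsnew entry describes: two frames indexed by unordered categorical indexes that hold the same categories in a different order. The frame and column names (`left`, `right`, `x`, `y`) are illustrative assumptions, not taken from the commit, and the snippet shows only the join result, not the speedup.

```python
import pandas as pd

# Same categories, different order; both dtypes are unordered.
left_dtype = pd.CategoricalDtype(["a", "b", "c"], ordered=False)
right_dtype = pd.CategoricalDtype(["c", "b", "a"], ordered=False)

left = pd.DataFrame(
    {"x": [1, 2, 3]},
    index=pd.CategoricalIndex(["a", "b", "c"], dtype=left_dtype),
)
right = pd.DataFrame(
    {"y": [30, 20, 10]},
    index=pd.CategoricalIndex(["c", "b", "a"], dtype=right_dtype),
)

# Unordered categorical dtypes compare equal regardless of category order.
assert left.index.dtype == right.index.dtype

# Default left join: rows are matched by label, not by category code.
joined = left.join(right)
print(joined)
```

Rows are matched by label, so `y` lines up as 10, 20, 30 against `a`, `b`, `c`.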

pandas/core/indexes/base.py

+10 -4
@@ -123,6 +123,7 @@
     SparseDtype,
 )
 from pandas.core.dtypes.generic import (
+    ABCCategoricalIndex,
     ABCDataFrame,
     ABCDatetimeIndex,
     ABCIntervalIndex,
@@ -4614,19 +4615,24 @@ def join(
             this = self.astype(dtype, copy=False)
             other = other.astype(dtype, copy=False)
             return this.join(other, how=how, return_indexers=True)
+        elif (
+            isinstance(self, ABCCategoricalIndex)
+            and isinstance(other, ABCCategoricalIndex)
+            and not self.ordered
+            and not self.categories.equals(other.categories)
+        ):
+            # dtypes are "equal" but categories are in different order
+            other = Index(other._values.reorder_categories(self.categories))
 
         _validate_join_method(how)
 
         if (
-            not isinstance(self.dtype, CategoricalDtype)
-            and self.is_monotonic_increasing
+            self.is_monotonic_increasing
             and other.is_monotonic_increasing
             and self._can_use_libjoin
             and other._can_use_libjoin
             and (self.is_unique or other.is_unique)
         ):
-            # Categorical is monotonic if data are ordered as categories, but join can
-            # not handle this in case of not lexicographically monotonic GH#38502
             try:
                 return self._join_monotonic(other, how=how)
             except TypeError:
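
The reorder step in the new `elif` branch can be illustrated with public API only. This is a hedged sketch, not the internal code path: it calls `CategoricalIndex.reorder_categories` in place of the private `other._values.reorder_categories(...)` used in the diff, and the names `self_idx`, `other_idx`, and `aligned` are illustrative assumptions.

```python
import pandas as pd

self_idx = pd.CategoricalIndex(["a", "b", "c"], categories=["a", "b", "c"])
other_idx = pd.CategoricalIndex(["c", "b", "a"], categories=["c", "b", "a"])

# The dtypes compare equal (both unordered, same set of categories),
# yet the category order differs, which is the condition the branch checks.
assert self_idx.dtype == other_idx.dtype
assert not self_idx.categories.equals(other_idx.categories)

# Re-express other's values against self's category order.
aligned = other_idx.reorder_categories(self_idx.categories)

# The labels are unchanged; only the integer codes behind them move.
assert list(aligned) == list(other_idx)
print(other_idx.codes.tolist(), "->", aligned.codes.tolist())

# Monotonicity of a CategoricalIndex follows the codes, so it can change
# once both indexes share one category order.
print(other_idx.is_monotonic_increasing, aligned.is_monotonic_increasing)
```

After the reorder, both indexes map each label to the same code, so comparing codes across the two indexes is equivalent to comparing labels.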
