Commit c7cc22a

PERF: DataFrame.join with how=left|right and sort=True (#56919)
* use argsort indexer
* whatsnew
* mypy
1 parent 46c4763 commit c7cc22a

2 files changed, +15 -2 lines changed
doc/source/whatsnew/v2.3.0.rst

+1
@@ -103,6 +103,7 @@ Performance improvements
 ~~~~~~~~~~~~~~~~~~~~~~~~
 - Performance improvement in :meth:`DataFrame.join` for sorted but non-unique indexes (:issue:`56941`)
 - Performance improvement in :meth:`DataFrame.join` when left and/or right are non-unique and ``how`` is ``"left"``, ``"right"``, or ``"inner"`` (:issue:`56817`)
+- Performance improvement in :meth:`DataFrame.join` with ``how="left"`` or ``how="right"`` and ``sort=True`` (:issue:`56919`)
 - Performance improvement in :meth:`DataFrameGroupBy.ffill`, :meth:`DataFrameGroupBy.bfill`, :meth:`SeriesGroupBy.ffill`, and :meth:`SeriesGroupBy.bfill` (:issue:`56902`)
 - Performance improvement in :meth:`Index.take` when ``indices`` is a full range indexer from zero to length of index (:issue:`56806`)
 -

pandas/core/indexes/base.py

+14 -2
@@ -4682,11 +4682,23 @@ def _join_via_get_indexer(
         # uniqueness/monotonicity
 
         # Note: at this point we have checked matching dtypes
+        lindexer: npt.NDArray[np.intp] | None
+        rindexer: npt.NDArray[np.intp] | None
 
         if how == "left":
-            join_index = self.sort_values() if sort else self
+            if sort:
+                join_index, lindexer = self.sort_values(return_indexer=True)
+                rindexer = other.get_indexer_for(join_index)
+                return join_index, lindexer, rindexer
+            else:
+                join_index = self
         elif how == "right":
-            join_index = other.sort_values() if sort else other
+            if sort:
+                join_index, rindexer = other.sort_values(return_indexer=True)
+                lindexer = self.get_indexer_for(join_index)
+                return join_index, lindexer, rindexer
+            else:
+                join_index = other
         elif how == "inner":
             join_index = self.intersection(other, sort=sort)
         elif how == "outer":
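
For readers who want to see what the new `sort=True` branch computes, here is a minimal sketch using only the public `Index` API. The names `left`, `right`, `ldf`, and `rdf` and their sample values are made up for illustration and do not come from the commit; the internal `_join_via_get_indexer` method is not called directly.

```python
import pandas as pd

# Hypothetical sample indexes (not from the commit).
left = pd.Index([3, 1, 2])
right = pd.Index([2, 3, 4])

# Sort the left index once and keep the argsort positions as the left indexer,
# mirroring the how="left", sort=True branch in the diff above.
join_index, lindexer = left.sort_values(return_indexer=True)

# Look up each sorted label in the right index; -1 marks labels with no match.
rindexer = right.get_indexer_for(join_index)

print(join_index.tolist())  # [1, 2, 3]
print(lindexer.tolist())    # [1, 2, 0]
print(rindexer.tolist())    # [-1, 0, 1]

# The same case through the public API that the whatsnew entry refers to.
ldf = pd.DataFrame({"a": [30, 10, 20]}, index=left)
rdf = pd.DataFrame({"b": [200, 300, 400]}, index=right)
result = ldf.join(rdf, how="left", sort=True)
print(result.index.tolist())  # [1, 2, 3]
```

Because the sorted index and its indexer come from a single `sort_values(return_indexer=True)` call, the branch can return early with both indexers in hand, as the diff shows.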
