Commit 107bef8

lukemanley authored and pmhatre1 committed
PERF: Index.join to maintain cached attributes in more cases (pandas-dev#57023)
* Index.join result name
* whatsnew
* update test
* Index._wrap_join_result to maintain cached attributes if possible
* Index._wrap_join_result to maintain cached attributes if possible
* whatsnew
* allow indexers to be None
* gh ref
* rename variables for clarity
1 parent b8d6799 commit 107bef8

3 files changed, +10 -8 lines changed

doc/source/whatsnew/v3.0.0.rst

+1
@@ -105,6 +105,7 @@ Performance improvements
 - Performance improvement in :meth:`DataFrame.join` when left and/or right are non-unique and ``how`` is ``"left"``, ``"right"``, or ``"inner"`` (:issue:`56817`)
 - Performance improvement in :meth:`DataFrame.join` with ``how="left"`` or ``how="right"`` and ``sort=True`` (:issue:`56919`)
 - Performance improvement in :meth:`DataFrameGroupBy.ffill`, :meth:`DataFrameGroupBy.bfill`, :meth:`SeriesGroupBy.ffill`, and :meth:`SeriesGroupBy.bfill` (:issue:`56902`)
+- Performance improvement in :meth:`Index.join` by propagating cached attributes in cases where the result matches one of the inputs (:issue:`57023`)
 - Performance improvement in :meth:`Index.take` when ``indices`` is a full range indexer from zero to length of index (:issue:`56806`)
 - Performance improvement in :meth:`MultiIndex.equals` for equal length indexes (:issue:`56990`)
 - Performance improvement in indexing operations for string dtypes (:issue:`56997`)
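
The whatsnew entry describes reusing cached attributes when a join result matches one of its inputs. The toy sketch below illustrates that general idea only; it is not the pandas implementation. `TinyIndex`, `inner_join`, and the choice of `is_unique` as the cached attribute are hypothetical names introduced for illustration, and the sketch assumes unique values.

```python
from functools import cached_property


class TinyIndex:
    """Hypothetical immutable index used for illustration; not part of pandas."""

    def __init__(self, values):
        self._values = tuple(values)

    @cached_property
    def is_unique(self):
        # Potentially expensive scan; cached_property stores the result
        # in the instance __dict__ after the first access.
        return len(set(self._values)) == len(self._values)

    def inner_join(self, other):
        # Order-preserving intersection (assumes unique values).
        other_values = set(other._values)
        joined = tuple(v for v in self._values if v in other_values)
        # If the result matches an input, return that input itself so
        # anything already cached on it remains available.
        if joined == self._values:
            return self
        if joined == other._values:
            return other
        return TinyIndex(joined)


left = TinyIndex(["a", "b", "c"])
right = TinyIndex(["a", "b", "c", "d"])
left.is_unique  # populate the cache on ``left``
result = left.inner_join(right)
```

Here `result` is the same object as `left`, so the earlier `is_unique` computation is not repeated. When the result differs from both inputs, a fresh object is built and its cache starts empty.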

pandas/core/frame.py

+5 -7
@@ -8012,19 +8012,17 @@ def _arith_method_with_reindex(self, right: DataFrame, op) -> DataFrame:
         left = self

         # GH#31623, only operate on shared columns
-        cols, lcols, rcols = left.columns.join(
-            right.columns, how="inner", level=None, return_indexers=True
+        cols, lcol_indexer, rcol_indexer = left.columns.join(
+            right.columns, how="inner", return_indexers=True
         )

-        new_left = left.iloc[:, lcols]
-        new_right = right.iloc[:, rcols]
+        new_left = left if lcol_indexer is None else left.iloc[:, lcol_indexer]
+        new_right = right if rcol_indexer is None else right.iloc[:, rcol_indexer]
         result = op(new_left, new_right)

         # Do the join on the columns instead of using left._align_for_op
         # to avoid constructing two potentially large/sparse DataFrames
-        join_columns, _, _ = left.columns.join(
-            right.columns, how="outer", level=None, return_indexers=True
-        )
+        join_columns = left.columns.join(right.columns, how="outer")

         if result.columns.has_duplicates:
             # Avoid reindexing with a duplicate axis.
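
The hunk above guards against `Index.join(..., return_indexers=True)` returning `None` for an indexer. A minimal standalone sketch of the same pattern follows; `take_columns` is a hypothetical helper introduced here (the commit inlines the conditional instead), and the sample frames are made up.

```python
import pandas as pd


def take_columns(frame, indexer):
    # Hypothetical helper: an indexer of None means the joined columns
    # already match this side, so the frame can be used as-is.
    return frame if indexer is None else frame.iloc[:, indexer]


left = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
right = pd.DataFrame({"b": [10, 20], "c": [30, 40]})

# Restrict both frames to their shared columns before operating on them.
cols, lcol_indexer, rcol_indexer = left.columns.join(
    right.columns, how="inner", return_indexers=True
)
new_left = take_columns(left, lcol_indexer)
new_right = take_columns(right, rcol_indexer)
result = new_left + new_right

# Joining columns with themselves: the helper works whether the
# indexer comes back as None or as an array.
_, self_indexer, _ = left.columns.join(
    left.columns, how="inner", return_indexers=True
)
unchanged = take_columns(left, self_indexer)
```

Whether a given call returns `None` or an array for an indexer can depend on the pandas version and the inputs, so the helper accepts both rather than assuming one.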

pandas/tests/indexes/multi/test_join.py

+4 -1
@@ -35,7 +35,10 @@ def test_join_level(idx, other, join_type):

     assert join_index.equals(join_index2)
     tm.assert_numpy_array_equal(lidx, lidx2)
-    tm.assert_numpy_array_equal(ridx, ridx2)
+    if ridx is None:
+        assert ridx == ridx2
+    else:
+        tm.assert_numpy_array_equal(ridx, ridx2)
     tm.assert_numpy_array_equal(join_index2.values, exp_values)

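
The updated test branches on whether an indexer is `None` before comparing arrays. One way to package that check for reuse is sketched below. `assert_indexer_equal` is a hypothetical helper, not something the commit adds, and it uses `numpy.testing.assert_array_equal` in place of pandas' internal `tm.assert_numpy_array_equal`, so it compares values and shapes only.

```python
import numpy as np


def assert_indexer_equal(actual, expected):
    # Hypothetical helper: indexers are either None or integer arrays.
    if actual is None or expected is None:
        # None only matches None.
        assert actual is None and expected is None, (actual, expected)
    else:
        np.testing.assert_array_equal(actual, expected)


assert_indexer_equal(None, None)
assert_indexer_equal(np.array([0, 1]), np.array([0, 1]))

# A None indexer compared against an array should be reported as a mismatch.
try:
    assert_indexer_equal(None, np.array([0, 1]))
except AssertionError:
    mismatch_detected = True
else:
    mismatch_detected = False
```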
