PERF: avoid unnecessary reordering in MultiIndex._reorder_indexer #48384

Merged
merged 10 commits into from
Sep 14, 2022
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.6.0.rst
@@ -105,6 +105,7 @@ Performance improvements
 - Performance improvement in :meth:`.GroupBy.mean` and :meth:`.GroupBy.var` for extension array dtypes (:issue:`37493`)
 - Performance improvement for :meth:`Series.value_counts` with nullable dtype (:issue:`48338`)
 - Performance improvement for :class:`Series` constructor passing integer numpy array with nullable dtype (:issue:`48338`)
+- Performance improvement in :meth:`DataFrame.loc` and :meth:`Series.loc` for tuple-based indexing of a :class:`MultiIndex` (:issue:`48384`)
 - Performance improvement for :meth:`MultiIndex.unique` (:issue:`48335`)
 -
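
As a rough illustration of the kind of call this whatsnew entry refers to, the sketch below indexes a lexsorted MultiIndex with a tuple holding a null slice and an already-ordered list. The frame, its column name `v`, and the values are made up for the example; on this input the selected rows come back in index order, which is the case the change is meant to short-circuit.

```python
import pandas as pd

# Hypothetical data: a lexsorted two-level index with six rows.
idx = pd.MultiIndex.from_product([[1, 2], [4, 5, 6]])
df = pd.DataFrame({"v": range(6)}, index=idx)

# Tuple-based indexing: every label on level 0, labels 4 and 5 on level 1.
# The list is already in level order, so no reordering of rows is required.
selected = df.loc[(slice(None), [4, 5]), :]
selected_values = selected["v"].tolist()
selected_labels = selected.index.tolist()
```

The rows selected are (1, 4), (1, 5), (2, 4) and (2, 5), in that order.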

43 changes: 28 additions & 15 deletions pandas/core/indexes/multi.py
@@ -3466,22 +3466,35 @@ def _reorder_indexer(
         -------
         indexer : a sorted position indexer of self ordered as seq
         """
-        # If the index is lexsorted and the list_like label in seq are sorted
-        # then we do not need to sort
-        if self._is_lexsorted():
-            need_sort = False
-            for i, k in enumerate(seq):
-                if is_list_like(k):
-                    if not need_sort:
-                        k_codes = self.levels[i].get_indexer(k)
-                        k_codes = k_codes[k_codes >= 0]  # Filter absent keys
-                        # True if the given codes are not ordered
-                        need_sort = (k_codes[:-1] > k_codes[1:]).any()
-                elif isinstance(k, slice) and k.step is not None and k.step < 0:
+
+        # check if sorting is necessary
+        need_sort = False
+        for i, k in enumerate(seq):
+            if com.is_null_slice(k) or com.is_bool_indexer(k) or is_scalar(k):
Member

Do we have unit tests for all these new branches that short circuit?

Member Author

I'm pretty sure there are tests that hit these in one way or another. Regardless, I just added test_get_locs_reordering, which explicitly tests the ordering effects of null slices, bool indexers, scalars, and lists of length 1.

Member

Great, thanks!

+                pass
+            elif is_list_like(k):
+                if len(k) <= 1:  # type: ignore[arg-type]
+                    pass
+                elif self._is_lexsorted():
+                    # If the index is lexsorted and the list_like label
+                    # in seq are sorted then we do not need to sort
+                    k_codes = self.levels[i].get_indexer(k)
+                    k_codes = k_codes[k_codes >= 0]  # Filter absent keys
+                    # True if the given codes are not ordered
+                    need_sort = (k_codes[:-1] > k_codes[1:]).any()
+                else:
                     need_sort = True
-            # Bail out if both index and seq are sorted
-            if not need_sort:
-                return indexer
+            elif isinstance(k, slice):
+                if self._is_lexsorted():
+                    need_sort = k.step is not None and k.step < 0
+                else:
+                    need_sort = True
+            else:
+                need_sort = True
+            if need_sort:
+                break
+        if not need_sort:
+            return indexer

         n = len(self)
         keys: tuple[np.ndarray, ...] = ()
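
The list-like branch above decides whether sorting is needed by comparing adjacent level codes. A minimal standalone sketch of that check follows; `codes_out_of_order` is a hypothetical helper written for illustration, not a pandas function, and it only uses the public `Index.get_indexer` method.

```python
import pandas as pd


def codes_out_of_order(level: pd.Index, key) -> bool:
    """Hypothetical helper: True if the labels in `key` are not in level order."""
    k_codes = level.get_indexer(key)
    # get_indexer marks labels missing from the level with -1; drop them.
    k_codes = k_codes[k_codes >= 0]
    # Any adjacent pair where the earlier code is larger means out of order.
    return bool((k_codes[:-1] > k_codes[1:]).any())


level = pd.Index([4, 5, 6])
in_order = codes_out_of_order(level, [4, 5])         # codes [0, 1]
reversed_key = codes_out_of_order(level, [5, 4])     # codes [1, 0]
with_missing = codes_out_of_order(level, [4, 99, 6])  # codes [0, 2] after filtering
```

Only the reversed key reports out-of-order codes; the key containing an absent label is still treated as ordered once the `-1` entry is filtered out.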
26 changes: 26 additions & 0 deletions pandas/tests/indexes/multi/test_indexing.py
@@ -898,3 +898,29 @@ def test_pyint_engine():
     missing = tuple([0, 1] * 5 * N)
     result = index.get_indexer([missing] + [keys[i] for i in idces])
     tm.assert_numpy_array_equal(result, expected)
+
+
+@pytest.mark.parametrize(
+    "keys,expected",
+    [
+        ((slice(None), [5, 4]), [1, 0]),
+        ((slice(None), [4, 5]), [0, 1]),
+        (([True, False, True], [4, 6]), [0, 2]),
+        (([True, False, True], [6, 4]), [0, 2]),
+        ((2, [4, 5]), [0, 1]),
+        ((2, [5, 4]), [1, 0]),
+        (([2], [4, 5]), [0, 1]),
+        (([2], [5, 4]), [1, 0]),
+    ],
+)
+def test_get_locs_reordering(keys, expected):
+    # GH48384
+    idx = MultiIndex.from_arrays(
+        [
+            [2, 2, 1],
+            [4, 5, 6],
+        ]
+    )
+    result = idx.get_locs(keys)
+    expected = np.array(expected)
+    tm.assert_numpy_array_equal(result, expected)
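
For readers who want to try the behaviour outside the test suite, here is a small sketch that reuses the index and two of the parametrized cases from test_get_locs_reordering. The variable names are invented for the example; the expected positions are the ones listed in the test's parameter table.

```python
import pandas as pd

# Same index as in test_get_locs_reordering.
idx = pd.MultiIndex.from_arrays([[2, 2, 1], [4, 5, 6]])

# A null slice on level 0 leaves the ordering to the list on level 1,
# so the position of label 5 comes before the position of label 4.
by_list_order = idx.get_locs((slice(None), [5, 4])).tolist()

# With a boolean indexer on level 0, the test expects positional order
# even though the list on level 1 is given in reverse.
by_mask = idx.get_locs(([True, False, True], [6, 4])).tolist()
```

Per the test table, the first call yields positions `[1, 0]` and the second `[0, 2]`.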