Skip to content

BUG: assert_index_equal ignoring names when check_order is false #47330

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Jun 23, 2022
Merged
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion doc/source/whatsnew/v1.5.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -935,7 +935,7 @@ Other

.. ***DO NOT USE THIS SECTION***

-
- Bug in :func:`.assert_index_equal` with ``names=True`` and ``check_order=False`` not checking names not (:issue:`47328`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DO NOT USE THIS SECTION

yep. this is ok here, in previous release notes the assert_* always go in the other section.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

checked 1.3 notes to be sure there

-

.. ---------------------------------------------------------------------------
Expand Down
10 changes: 4 additions & 6 deletions pandas/_testing/asserters.py
Original file line number Diff line number Diff line change
Expand Up @@ -45,10 +45,7 @@
Series,
TimedeltaIndex,
)
from pandas.core.algorithms import (
safe_sort,
take_nd,
)
from pandas.core.algorithms import take_nd
from pandas.core.arrays import (
DatetimeArray,
ExtensionArray,
Expand All @@ -58,6 +55,7 @@
)
from pandas.core.arrays.datetimelike import DatetimeLikeArrayMixin
from pandas.core.arrays.string_ import StringDtype
from pandas.core.indexes.api import safe_sort_index

from pandas.io.formats.printing import pprint_thing

Expand Down Expand Up @@ -367,8 +365,8 @@ def _get_ilevel_values(index, level):

# If order doesn't matter then sort the index entries
if not check_order:
left = Index(safe_sort(left))
right = Index(safe_sort(right))
left = safe_sort_index(left)
right = safe_sort_index(right)

# MultiIndex special comparison for little-friendly error messages
if left.nlevels > 1:
Expand Down
41 changes: 31 additions & 10 deletions pandas/core/indexes/api.py
Original file line number Diff line number Diff line change
Expand Up @@ -70,6 +70,7 @@
"get_unanimous_names",
"all_indexes_same",
"default_index",
"safe_sort_index",
]


Expand Down Expand Up @@ -157,23 +158,43 @@ def _get_combined_index(
index = ensure_index(index)

if sort:
try:
array_sorted = safe_sort(index)
array_sorted = cast(np.ndarray, array_sorted)
if isinstance(index, MultiIndex):
index = MultiIndex.from_tuples(array_sorted, names=index.names)
else:
index = Index(array_sorted, name=index.name)
except TypeError:
pass

index = safe_sort_index(index)
# GH 29879
if copy:
index = index.copy()

return index


def safe_sort_index(index: Index) -> Index:
"""
Returns the sorted index

We keep the dtypes and the name attributes.

Parameters
----------
index : an Index

Returns
-------
Index
"""
try:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

following on from #47329 (comment)

is our use of safe_sort here, actually safe? i.e. we don't reach any of the cases that raise in safe_sort) and the try is not needed?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None inside a mixed index still raises a TypeError

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This gets adressed in #47331, we can probably remove the try there

array_sorted = safe_sort(index)
array_sorted = cast(np.ndarray, array_sorted)
if isinstance(index, MultiIndex):
index = MultiIndex.from_tuples(
array_sorted, names=index.names, dtypes=index.dtypes
)
else:
index = Index(array_sorted, name=index.name, dtype=index.dtype)
except TypeError:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe only have the safe_sort call in the try block?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit tricky, since array_sorted does not exist in the except block and we don't want to caste the Index again to an Index?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with an else block we would only cast if the safe_sort succeeded? (i.e. Index values are unique)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks very much. I am always forgetting that...

Changed the code

pass

return index


def union_indexes(indexes, sort: bool | None = True) -> Index:
"""
Return the union of indexes.
Expand Down
8 changes: 8 additions & 0 deletions pandas/tests/util/test_assert_index_equal.py
Original file line number Diff line number Diff line change
Expand Up @@ -238,6 +238,14 @@ def test_index_equal_range_categories(check_categorical, exact):
)


def test_assert_index_equal_different_names_check_order_false():
# GH#47328
idx1 = Index([1, 3], name="a")
idx2 = Index([3, 1], name="b")
with pytest.raises(AssertionError, match='"names" are different'):
tm.assert_index_equal(idx1, idx2, check_order=False, check_names=True)


def test_assert_index_equal_mixed_dtype():
# GH#39168
idx = Index(["foo", "bar", 42])
Expand Down