Skip to content

CLN: Comparison methods for MultiIndex should have consistent behaviour for all nlevels (GH21149) #21195

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 17 commits into from
Jun 14, 2018
Merged
Show file tree
Hide file tree
Changes from 8 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion doc/source/whatsnew/v0.23.1.txt
Original file line number Diff line number Diff line change
Expand Up @@ -78,7 +78,8 @@ Indexing

- Bug in :meth:`Series.reset_index` where appropriate error was not raised with an invalid level name (:issue:`20925`)
- Bug in :func:`interval_range` when ``start``/``periods`` or ``end``/``periods`` are specified with float ``start`` or ``end`` (:issue:`21161`)
-
- Bug in comparison operations for :class:`MultiIndex` where error was raised on equality / inequality comparison involving a MultiIndex with self.nlevels == 1 (:issue:`21149`)
-
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

double backticks on MultiIndex


I/O
^^^
Expand Down
3 changes: 2 additions & 1 deletion pandas/core/indexes/base.py
Original file line number Diff line number Diff line change
Expand Up @@ -96,7 +96,8 @@ def cmp_method(self, other):
if needs_i8_conversion(self) and needs_i8_conversion(other):
return self._evaluate_compare(other, op)

if is_object_dtype(self) and self.nlevels == 1:
from .multi import MultiIndex
if is_object_dtype(self) and not isinstance(self, MultiIndex):
# don't pass MultiIndex
with np.errstate(all='ignore'):
result = ops._comp_method_OBJECT_ARRAY(op, self.values, other)
Expand Down
26 changes: 26 additions & 0 deletions pandas/tests/indexes/test_multi.py
Original file line number Diff line number Diff line change
Expand Up @@ -3291,3 +3291,29 @@ def test_duplicate_multiindex_labels(self):
with pytest.raises(ValueError):
ind.set_levels([['A', 'B', 'A', 'A', 'B'], [2, 1, 3, -2, 5]],
inplace=True)

@pytest.mark.parametrize("midx,idx,count", [
(pd.MultiIndex.from_product([[0, 1], [1, 0]]), pd.Series(range(4)), 4),
(pd.MultiIndex.from_product([[0, 1]]), pd.Series(range(2)), 2)])
def test_multiindex_compare(self, midx, idx, count):
# GH 21149
'''Ensure comparison operations for MultiIndex with nlevels == 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use triple-double quotes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

behave consistently with those for MultiIndex with nlevels > 1
'''
expected = pd.Series([True]).repeat(count)
expected.reset_index(drop=True, inplace=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't use inplace in tests

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback Thanks - here inplace is being used to only create the expected outcome for testing. inplace is not being used in the test assert_series_equal
OK to keep then?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done - I have simplified the tests.
now only kept the required cases - as such have removed the parametrisation and also use of inplace in creating the expected results

# Equality self-test: MultiIndex object vs self
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blank line between cases

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

result = pd.Series(midx == midx)
tm.assert_series_equal(result, expected)
# Equality self-test: non-MultiIndex Index object vs self
result = (idx == idx)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where did the idea for these test come from? what exactly are you trying to test here?

Copy link
Contributor Author

@KalyanGokhale KalyanGokhale May 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests are trying to ensure the behaviour on comparison between MultiIndex vs MultiIndex, MultiIndex vs Index is consistent, irrespective of nlevels. Also, ensuring that the comparison behaviour for Index vs Index has not changed due to this PR.

Currently (as of 0.23.0), comparing MultiIndex of nlevels==1 with another of same length raises a ValueError e.g.

[In] midx=pd.MultiIndex.from_product([[0, 1]])
[In] midx
[Out] MultiIndex(levels=[[0, 1]],
           labels=[[0, 1]])
[In] midx == midx
[Out] ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

whereas the behaviour should be consistent with that for MultiIndex with nlevels>1 as follows:

[In] midx == midx
[Out] array([ True,  True])

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in this test, probably should be OK to only keep the midx == midx and midx > midx checks

tm.assert_series_equal(result, expected)

expected = pd.Series([False]).repeat(count)
expected.reset_index(drop=True, inplace=True)
# Greater than comparison: MultiIndex object vs self
result = pd.Series(midx > midx)
tm.assert_series_equal(result, expected)
# Equality test: non-MultiIndex Index object vs MultiIndex object
result = pd.Series(midx == idx)
tm.assert_series_equal(result, expected)