Skip to content

CLN: Comparison methods for MultiIndex should have consistent behaviour for all nlevels (GH21149) #21195

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 17 commits into from
Jun 14, 2018

Conversation

KalyanGokhale
Copy link
Contributor

@KalyanGokhale KalyanGokhale commented May 24, 2018

Initial PR #21182 is closed - and now split in 2 different PRs

def cmp_method raised ValueError for equality / inequality comparisons of MultiIndex with nlevels == 1, which was inconsistent with behaviour for MultiIndex with nlevels > 1 (details of issue below) - this has now been fixed

Currently (as of 0.23.0), comparing MultiIndex of nlevels==1 with another of same length raises a ValueError e.g.

[In] midx=pd.MultiIndex.from_product([[0, 1]])
[In] midx
[Out] MultiIndex(levels=[[0, 1]],
           labels=[[0, 1]])
[In] midx == midx
[Out] ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

whereas the behaviour should be consistent with that for MultiIndex with nlevels>1 as follows:

[In] midx == midx
[Out] array([ True,  True])

@KalyanGokhale KalyanGokhale changed the title BUG: Comparison methods for MultiIndex with nlevels == 1 should have consistent behavior (GH21149) BUG: Comparison methods for MultiIndex with nlevels == 1 should have consistent behaviour with MultiIndex having nlevels > 1 (GH21149) May 24, 2018
@KalyanGokhale KalyanGokhale changed the title BUG: Comparison methods for MultiIndex with nlevels == 1 should have consistent behaviour with MultiIndex having nlevels > 1 (GH21149) BUG: Comparison methods for MultiIndex should have consistent behaviour for all nlevels (GH21149) May 24, 2018
Removed the following test, which was causing builds to fail (?). This was working when tested on my command line (Mac OS Terminal)
        # Greater-than test: non-MultiIndex Index object vs MultiIndex object
        with tm.assert_raises_regex(TypeError, 'not supported'):
            midx > idx
@pep8speaks
Copy link

pep8speaks commented May 24, 2018

Hello @KalyanGokhale! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on June 14, 2018 at 10:23 Hours UTC

@codecov
Copy link

codecov bot commented May 24, 2018

Codecov Report

❗ No coverage uploaded for pull request base (master@fd121ed). Click here to learn what that means.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master   #21195   +/-   ##
=========================================
  Coverage          ?    91.9%           
=========================================
  Files             ?      153           
  Lines             ?    49607           
  Branches          ?        0           
=========================================
  Hits              ?    45590           
  Misses            ?     4017           
  Partials          ?        0
Flag Coverage Δ
#multiple 90.3% <100%> (?)
#single 41.89% <100%> (?)
Impacted Files Coverage Δ
pandas/core/indexes/base.py 96.62% <100%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update fd121ed...73cac75. Read the comment docs.

@@ -78,7 +78,8 @@ Indexing

- Bug in :meth:`Series.reset_index` where appropriate error was not raised with an invalid level name (:issue:`20925`)
- Bug in :func:`interval_range` when ``start``/``periods`` or ``end``/``periods`` are specified with float ``start`` or ``end`` (:issue:`21161`)
-
- Bug in comparison operations for :class:`MultiIndex` where error was raised on equality / inequality comparison involving a MultiIndex with self.nlevels == 1 (:issue:`21149`)
-
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

double backticks on MultiIndex

(pd.MultiIndex.from_product([[0, 1]]), pd.Series(range(2)), 2)])
def test_multiindex_compare(self, midx, idx, count):
# GH 21149
'''Ensure comparison operations for MultiIndex with nlevels == 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use triple-double quotes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

'''
expected = pd.Series([True]).repeat(count)
expected.reset_index(drop=True, inplace=True)
# Equality self-test: MultiIndex object vs self
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blank line between cases

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

behave consistently with those for MultiIndex with nlevels > 1
'''
expected = pd.Series([True]).repeat(count)
expected.reset_index(drop=True, inplace=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't use inplace in tests

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback Thanks - here inplace is being used to only create the expected outcome for testing. inplace is not being used in the test assert_series_equal
OK to keep then?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done - I have simplified the tests.
now only kept the required cases - as such have removed the parametrisation and also use of inplace in creating the expected results

result = pd.Series(midx == midx)
tm.assert_series_equal(result, expected)
# Equality self-test: non-MultiIndex Index object vs self
result = (idx == idx)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where did the idea for these test come from? what exactly are you trying to test here?

Copy link
Contributor Author

@KalyanGokhale KalyanGokhale May 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests are trying to ensure the behaviour on comparison between MultiIndex vs MultiIndex, MultiIndex vs Index is consistent, irrespective of nlevels. Also, ensuring that the comparison behaviour for Index vs Index has not changed due to this PR.

Currently (as of 0.23.0), comparing MultiIndex of nlevels==1 with another of same length raises a ValueError e.g.

[In] midx=pd.MultiIndex.from_product([[0, 1]])
[In] midx
[Out] MultiIndex(levels=[[0, 1]],
           labels=[[0, 1]])
[In] midx == midx
[Out] ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

whereas the behaviour should be consistent with that for MultiIndex with nlevels>1 as follows:

[In] midx == midx
[Out] array([ True,  True])

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in this test, probably should be OK to only keep the midx == midx and midx > midx checks

@toobaz
Copy link
Member

toobaz commented May 25, 2018

def cmp_method raised ValueError for equality / inequality comparisons of MultiIndex with nlevels == 1

Good catch!

@KalyanGokhale
Copy link
Contributor Author

@jreback @toobaz are any other edits needed on this? Thanks

@KalyanGokhale KalyanGokhale changed the title BUG: Comparison methods for MultiIndex should have consistent behaviour for all nlevels (GH21149) CLN: Comparison methods for MultiIndex should have consistent behaviour for all nlevels (GH21149) Jun 5, 2018
@KalyanGokhale
Copy link
Contributor Author

Any other work needed on this PR? Thanks

@jreback jreback added this to the 0.23.2 milestone Jun 14, 2018
@jreback jreback merged commit a8738ba into pandas-dev:master Jun 14, 2018
@jreback
Copy link
Contributor

jreback commented Jun 14, 2018

thanks, if anything residual is remaining from #21149 let's open a new issue

@KalyanGokhale KalyanGokhale deleted the GH21149-1 branch June 14, 2018 11:39
david-liu-brattle-1 pushed a commit to david-liu-brattle-1/pandas that referenced this pull request Jun 18, 2018
jorisvandenbossche pushed a commit that referenced this pull request Jun 29, 2018
…ur for all nlevels (GH21149) (#21195)

(cherry picked from commit a8738ba)
jorisvandenbossche pushed a commit that referenced this pull request Jul 2, 2018
…ur for all nlevels (GH21149) (#21195)

(cherry picked from commit a8738ba)
Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

(Assuming we don't want to disallow 1-level MultiIndexes), stop identifying MultiIndex by nlevels == 1
5 participants