Skip to content

BUG: Fix bug with symmetric difference of two equal MultiIndexes GH12490 #13504

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed

BUG: Fix bug with symmetric difference of two equal MultiIndexes GH12490 #13504

wants to merge 1 commit into from

Conversation

lstout
Copy link

@lstout lstout commented Jun 23, 2016

Fixes a bug where the symmetric difference of two equal MultiIndexes would raise a TypeError. MultiIndex used to use the Index.symmetric_difference. With this PR it it's own implementation that is more like MultiIndex.difference.

@codecov-io
Copy link

codecov-io commented Jun 23, 2016

Current coverage is 84.34%

Merging #13504 into master will increase coverage by <.01%

@@             master     #13504   diff @@
==========================================
  Files           138        138          
  Lines         51107      51122    +15   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          43103      43117    +14   
- Misses         8004       8005     +1   
  Partials          0          0          

Powered by Codecov. Last updated by bf4786a...8cb445f

result_name = result_name_update

if self.equals(other):
return MultiIndex(levels=[[]] * self.nlevels,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make a constructor that creates an empty MultiIndex._create_as_empty() and just call here for both of these

@jreback jreback added Bug Indexing Related to indexing on series/frames, not to indexes themselves MultiIndex labels Jun 23, 2016
@@ -527,3 +527,5 @@ Bug Fixes


- Bug in ``Categorical.remove_unused_categories()`` changes ``.codes`` dtype to platform int (:issue:`13261`)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put your whatsnew higher up (in an empty space in Bug Fixes) this way you dont have conflicts

@lstout
Copy link
Author

lstout commented Jun 24, 2016

If Travis passes again I think I got all your comments fixed

  • Move whatsnew
  • Move tests to test_multi.py
  • Constructor for empty MultiIndex (MultiIndex._create_as_empty(nlevels))

return MultiIndex(levels=[[]] * nlevels,
labels=[[]] * nlevels,
names=names, verify_integrity=verify_integrity)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note that this is internal. add a Parameters section

return self._create_as_empty(names=result_name)

difference = sorted(set((self.difference(other)).
union(other.difference(self))))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a comment:

This sorting breaks with mixed-int values (same issue as in #13432). I suppose this line may be safely replaced with the code from PR #13514, slightly changed, namely:

        this = self._get_unique_index()
        other = other._get_unique_index()
        indexer = this.get_indexer(other)

        # {this} minus {other}
        common_indexer = indexer.take((indexer != -1).nonzero()[0])
        left_indexer = np.setdiff1d(np.arange(this.size), common_indexer,
                                    assume_unique=True)
        left_diff = this.values.take(left_indexer)

        # {other} minus {this}
        right_indexer = (indexer == -1).nonzero()[0]
        right_diff = other.values.take(right_indexer)

        the_diff = _concat._concat_compat([left_diff, right_diff])

Then if the_diff is empty, an empty MI should be returned, otherwise:

        return self._shallow_copy(the_diff, names=result_name).sortlevel()

(In comparison to Index.symmetric_difference sorting is postponed until the end.)

Similar change may be applied to MultiIndex.difference, I guess.

@jreback
Copy link
Contributor

jreback commented Sep 9, 2016

can you rebase / update?

@jreback
Copy link
Contributor

jreback commented Nov 16, 2016

closing, but pls reopen / comment if you can continue

@jreback jreback closed this Nov 16, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Indexing Related to indexing on series/frames, not to indexes themselves MultiIndex
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Symmetric difference on equal MultiIndexes raises TypeError
4 participants