-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
return empty MultiIndex for symmetrical difference on equal MultiIndexes #16486
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
pandas/tests/indexes/test_base.py
Outdated
@@ -909,6 +907,13 @@ def test_symmetric_difference(self): | |||
expected = MultiIndex.from_tuples([('bar', 2), ('baz', 3), ('bar', 3)]) | |||
assert tm.equalContents(result, expected) | |||
|
|||
# on equal multiIndexs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you move this to it's own dedicated test, and add a comment with a link to the github issue?
Some of your style changes caused linting errors: https://travis-ci.org/pandas-dev/pandas/jobs/235803966#L1385
|
pandas/core/indexes/base.py
Outdated
if 'freq' in attribs: | ||
attribs['freq'] = None | ||
return self._shallow_copy_with_infer(the_diff, **attribs) | ||
if self.nlevels > 1 and len(the_diff) == 0: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could use a comment explaining why we need the special case.
pandas/core/indexes/base.py
Outdated
if self.nlevels > 1 and len(the_diff) == 0: | ||
return type(self)([[] for _ in range(self.nlevels)], | ||
[[] for _ in range(self.nlevels)]) | ||
else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can remove this else
condition, and just dedent everything like it was before.
Codecov Report
@@ Coverage Diff @@
## master #16486 +/- ##
===========================================
- Coverage 90.4% 40.17% -50.24%
===========================================
Files 161 161
Lines 51033 51035 +2
===========================================
- Hits 46136 20501 -25635
- Misses 4897 30534 +25637
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #16486 +/- ##
==========================================
+ Coverage 90.75% 90.75% +<.01%
==========================================
Files 161 161
Lines 51071 51073 +2
==========================================
+ Hits 46348 46351 +3
+ Misses 4723 4722 -1
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
couple of changes. pls add a whatsnew entry in Bug Fixes (reshaping) for 0.20.2
pandas/core/indexes/base.py
Outdated
# On equal MultiIndexes the difference is empty. Therefore an empty | ||
# MultiIndex is returned GH13490 | ||
if self.nlevels > 1 and len(the_diff) == 0: | ||
return type(self)([[] for _ in range(self.nlevels)], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use self._shallow_copy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That wouldn't work, self._shallow_copy fails if the_diff is empty. That's why I am returning an empty MI instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In [2]: pd.MultiIndex([[], []], [[],[]])
Out[2]:
MultiIndex(levels=[[], []],
labels=[[], []])
In [3]: pd.MultiIndex([[], []], [[],[]])._shallow_copy()
Out[3]:
MultiIndex(levels=[[], []],
labels=[[], []])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I was confusing it with self._shallow_copy_with_infer
. But still, if I call self._shallow_copy
with the_diff being empty from_tuples
gives me TypeError: Cannot infer number of levels from empty list
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback what do you think about a special case in MulltiIndex._shallow_copy
when values
is length 0 (that's what is passed from symmetric_difference
)?
@Appender(_index_shared_docs['_shallow_copy'])
def _shallow_copy(self, values=None, **kwargs):
if values is not None:
if 'name' in kwargs:
kwargs['names'] = kwargs.pop('name', None)
# discards freq
kwargs.pop('freq', None)
# this if block is new
if len(values) == 0:
return MultiIndex(levels=[[] for _ in range(self.nlevels)],
labels=[[] for _ in range(self.nlabels)])
return MultiIndex.from_tuples(values, **kwargs)
return self.view()
this would "work" but I don't know if "array of length 0" means same structure, but empty. Maybe it's ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this change should be done in _shallow_copy_with_infer
, you need to construct a MI equiv of self[0:0]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this change should be done in _shallow_copy_with_infer, you need to construct a MI equiv of self[0:0]
This leaves around some levels unfortunately:
In [16]: idx = pd.MultiIndex.from_product([['a', 'b'], ['A', 'B']])
In [17]: idx[0:0]
Out[17]:
MultiIndex(levels=[['a', 'b'], ['A', 'B']],
labels=[[], []])
In [18]: idx.difference(idx) | idx.difference(idx)
Out[18]:
MultiIndex(levels=[[], []],
labels=[[], []])
I believe we want Out[18]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, then I guess you can special case values=None
in _shallow_copy
to take a different path of construction (IOW don't use the MultiIndex.from_tuples). The crucial point is to propogate the meta-data.
pandas/tests/indexes/test_base.py
Outdated
idx2 = MultiIndex.from_tuples(self.tuples) | ||
result = idx1.symmetric_difference(idx2) | ||
expected = MultiIndex(levels=[[], []], labels=[[], []]) | ||
assert tm.equalContents(result, expected) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use tm.assert_index_equal
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would the expected value be None then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why?
tm.assert_index_equal(result, expected)
will work this compares the indexes, including all meta-data.
in fact your example is NOT propogating meta-data (that's what _shallow_copy does and why you need to use it). add some names
for the levels and see.
@Tafkas sorry if the above conversation about if len(values) == 0:
return MultiIndex(levels=[[] for _ in range(self.nlevels)],
labels=[[] for _ in range(self.nlabels)], **kwargs) Can you also add a couple tests with metadata (names)? I think the rule is to propagate names if they are the same on either index. So def test_multiindex_symmetric_difference(self):
idx = MultiIndex.from_product([['a', 'b'], ['A', 'B']],
names=['a', 'b'])
result = idx ^ idx
assert result.names == idx.names
idx2 = idx.copy().rename(['A', 'B'])
result = idx ^ idx
assert result.names == [None, None] Do you have a chance to make those changes? |
@TomAugspurger sure I can do that. But what about the case if values is None? |
Then it should do whatever it's doing now. The new case should only apply when the sym diff is empty. |
@Tafkas if you can rebase / update, we can merge this before 0.20.2 is release (end of the week) |
doc/source/whatsnew/v0.20.2.txt
Outdated
@@ -38,6 +38,7 @@ Bug Fixes | |||
~~~~~~~~~ | |||
|
|||
- Bug in using ``pathlib.Path`` or ``py.path.local`` objects with io functions (:issue:`16291`) | |||
- Bug in ``Index`` calling symmetric_difference() on two equal multiindices results in a TypeError (:issue `13490`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Index.symmetric_difference()
on two equal MultiIndex's, results in a TypeError
pandas/core/indexes/multi.py
Outdated
@@ -419,6 +419,11 @@ def _shallow_copy_with_infer(self, values=None, **kwargs): | |||
@Appender(_index_shared_docs['_shallow_copy']) | |||
def _shallow_copy(self, values=None, **kwargs): | |||
if values is not None: | |||
# On equal MultiIndexes the difference is empty. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this block can simply go in _shallow_copy_with_infer
, idea being we don't pass None
to _shallow_copy
unless we actually mean that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed it
lgtm. ping on green. |
thanks! |
…xes (pandas-dev#16486) (cherry picked from commit d31ffdb)
Version 0.20.2 * tag 'v0.20.2': (68 commits) RLS: v0.20.2 DOC: Update release.rst DOC: Whatsnew fixups (pandas-dev#16596) ERRR: Raise error in usecols when column doesn't exist but length matches (pandas-dev#16460) BUG: convert numpy strings in index names in HDF pandas-dev#13492 (pandas-dev#16444) PERF: vectorize _interp_limit (pandas-dev#16592) DOC: whatsnew 0.20.2 edits (pandas-dev#16587) API: Make is_strictly_monotonic_* private (pandas-dev#16576) BUG: reimplement MultiIndex.remove_unused_levels (pandas-dev#16565) Strictly monotonic (pandas-dev#16555) ENH: add .ngroup() method to groupby objects (pandas-dev#14026) (pandas-dev#14026) fix linting BUG: Incorrect handling of rolling.cov with offset window (pandas-dev#16244) BUG: select_as_multiple doesn't respect start/stop kwargs GH16209 (pandas-dev#16317) return empty MultiIndex for symmetrical difference on equal MultiIndexes (pandas-dev#16486) BUG: Bug in .resample() and .groupby() when aggregating on integers (pandas-dev#16549) BUG: Fixed tput output on windows (pandas-dev#16496) Strictly monotonic (pandas-dev#16555) BUG: fixed wrong order of ordered labels in pd.cut() BUG: Fixed to_html ignoring index_names parameter ...
git diff upstream/master --name-only -- '*.py' | flake8 --diff