-
-
Notifications
You must be signed in to change notification settings - Fork 18.6k
BUG: names on union and intersection for Index were inconsistent (GH9943 GH9862) #19849
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 1 commit
aad5884
3ef2a69
ba9ccdd
38aeacb
fb3a9db
f72377a
08eb291
396088f
41cd000
a48f734
b977e52
a030ede
9ba6960
d1620f6
893efb8
fd7f23f
2d1c366
fa7311a
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -57,7 +57,7 @@ | |
import pandas.core.algorithms as algos | ||
import pandas.core.sorting as sorting | ||
from pandas.io.formats.printing import pprint_thing | ||
from pandas.core.ops import make_invalid_op | ||
from pandas.core.ops import make_invalid_op, get_op_result_name | ||
from pandas.core.config import get_option | ||
from pandas.core.strings import StringMethods | ||
|
||
|
@@ -1191,7 +1191,7 @@ def _convert_can_do_setop(self, other): | |
other = Index(other, name=self.name) | ||
result_name = self.name | ||
else: | ||
result_name = self.name if self.name == other.name else None | ||
result_name = get_op_result_name(self, other) | ||
return other, result_name | ||
|
||
def _convert_for_op(self, value): | ||
|
@@ -2263,19 +2263,15 @@ def __or__(self, other): | |
def __xor__(self, other): | ||
return self.symmetric_difference(other) | ||
|
||
def _get_consensus_name(self, other): | ||
def _get_setop_name_object(self, other): | ||
""" | ||
Given 2 indexes, give a consensus name meaning | ||
we take the not None one, or None if the names differ. | ||
Return a new object if we are resetting the name | ||
Given 2 indexes, give a setop name and object, meaning | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. instead why can't we just use _wrap_setop_result? instead of having 2 nearly identical funtions There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @jreback I think this is needed (and I'll rename it and document it accordingly) in the case you are computing I'll also check the other There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. hmm, I think we do this elsewhere as well (e.g. when you reindex to yourself). can you add docmentation about this to the setops themselves. @TomAugspurger @jorisvandenbossche what do you think about this? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't have a strong preference, but it'd be good to settle on consistent behavior between union and intersection (I think). What's the proposal here?
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @TomAugspurger not what I was talking about, the current logic is correct (and shouldn't change), what you have is right. This was more about whether if we have identical objects we should shallow copy or not. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. My bad. I don't have a strong preference, but I would tend towards always shallow copying? I hope people aren't relying on There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @TomAugspurger With respect to your naming, that is not what will happen. In the cases of named and unnamed, the result will be
then @jreback I can change to shallow copying, but I'll have to modify some other tests that are depending on the result not being a shallow copy. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yeah I think its better to _shallow_copy these, yes I understand some tests may change because of this (IOW they will no longer be |
||
we use get_op_result_name to return the name, and then | ||
return a new object if we are resetting the name | ||
""" | ||
if self.name != other.name: | ||
if self.name is None or other.name is None: | ||
name = self.name or other.name | ||
else: | ||
name = None | ||
if self.name != name: | ||
return self._shallow_copy(name=name) | ||
name = get_op_result_name(self, other) | ||
if self.name != name: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think might be able to dispense with this entirely, and just always do
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. do this in There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So if I eliminate |
||
return self._shallow_copy(name=name) | ||
return self | ||
|
||
def union(self, other): | ||
|
@@ -2303,10 +2299,10 @@ def union(self, other): | |
other = _ensure_index(other) | ||
|
||
if len(other) == 0 or self.equals(other): | ||
return self._get_consensus_name(other) | ||
return self._get_setop_name_object(other) | ||
|
||
if len(self) == 0: | ||
return other._get_consensus_name(self) | ||
return other._get_setop_name_object(self) | ||
|
||
# TODO: is_dtype_union_equal is a hack around | ||
# 1. buggy set ops with duplicates (GH #13432) | ||
|
@@ -2369,11 +2365,10 @@ def union(self, other): | |
stacklevel=3) | ||
|
||
# for subclasses | ||
return self._wrap_union_result(other, result) | ||
return self._wrap_setop_result(other, result) | ||
|
||
def _wrap_union_result(self, other, result): | ||
name = self.name if self.name == other.name else None | ||
return self.__class__(result, name=name) | ||
def _wrap_setop_result(self, other, result): | ||
return self.__class__(result, name=get_op_result_name(self, other)) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. use There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @jreback I've done some testing with There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. thats fine There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. do we consistently use can you use There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There is no other usage of We had discussion 4 months ago about why |
||
|
||
def intersection(self, other): | ||
""" | ||
|
@@ -2403,7 +2398,7 @@ def intersection(self, other): | |
other = _ensure_index(other) | ||
|
||
if self.equals(other): | ||
return self._get_consensus_name(other) | ||
return self._get_setop_name_object(other) | ||
|
||
if not is_dtype_equal(self.dtype, other.dtype): | ||
this = self.astype('O') | ||
|
@@ -2423,7 +2418,7 @@ def intersection(self, other): | |
if self.is_monotonic and other.is_monotonic: | ||
try: | ||
result = self._inner_indexer(lvals, rvals)[0] | ||
return self._wrap_union_result(other, result) | ||
return self._wrap_setop_result(other, result) | ||
except TypeError: | ||
pass | ||
|
||
|
@@ -3552,7 +3547,7 @@ def _join_monotonic(self, other, how='left', return_indexers=False): | |
return join_index | ||
|
||
def _wrap_joined_index(self, joined, other): | ||
name = self.name if self.name == other.name else None | ||
name = get_op_result_name(self, other) | ||
return Index(joined, name=name) | ||
|
||
def _get_string_slice(self, key, use_lhs=True, use_rhs=True): | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -19,6 +19,7 @@ | |
_ensure_object) | ||
from pandas.core.dtypes.dtypes import PeriodDtype | ||
from pandas.core.dtypes.generic import ABCSeries | ||
from pandas.core.ops import get_op_result_name | ||
|
||
import pandas.tseries.frequencies as frequencies | ||
from pandas.tseries.frequencies import get_freq_code as _gfc | ||
|
@@ -996,8 +997,8 @@ def _assert_can_do_setop(self, other): | |
msg = _DIFFERENT_FREQ_INDEX.format(self.freqstr, other.freqstr) | ||
raise IncompatibleFrequency(msg) | ||
|
||
def _wrap_union_result(self, other, result): | ||
name = self.name if self.name == other.name else None | ||
def _wrap_setop_result(self, other, result): | ||
name = get_op_result_name(self, other) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think could just remove this entirely here if There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The reason we can't move it to base from category is because in the case of category, we need to preserve the categories, and in the case of period, we need to preserve frequency. So we could investigate in a separate PR about moving the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yeah that's fine |
||
result = self._apply_meta(result) | ||
result.name = name | ||
return result | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -26,6 +26,7 @@ | |
from pandas.core.base import _shared_docs | ||
from pandas.core.indexes.base import _index_shared_docs | ||
import pandas.core.common as com | ||
from pandas.core.ops import get_op_result_name | ||
import pandas.core.dtypes.concat as _concat | ||
from pandas.util._decorators import Appender, Substitution, deprecate_kwarg | ||
from pandas.core.indexes.datetimelike import TimelikeOps, DatetimeIndexOpsMixin | ||
|
@@ -577,7 +578,7 @@ def join(self, other, how='left', level=None, return_indexers=False, | |
sort=sort) | ||
|
||
def _wrap_joined_index(self, joined, other): | ||
name = self.name if self.name == other.name else None | ||
name = get_op_result_name(self, other) | ||
if (isinstance(other, TimedeltaIndex) and self.freq == other.freq and | ||
self._can_fast_union(other)): | ||
joined = self._shallow_copy(joined, name=name) | ||
|
@@ -637,8 +638,8 @@ def _fast_union(self, other): | |
else: | ||
return left | ||
|
||
def _wrap_union_result(self, other, result): | ||
name = self.name if self.name == other.name else None | ||
def _wrap_setop_result(self, other, result): | ||
name = get_op_result_name(self, other) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. you can remove this one |
||
return self._simple_new(result, name=name, freq=None) | ||
|
||
def intersection(self, other): | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -730,7 +730,15 @@ def test_union(self): | |
union = Index([]).union(first) | ||
assert union is first | ||
|
||
# preserve names | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. make new tests |
||
# preserve names only when they are the same | ||
# GH 9943 9862 | ||
|
||
first = Index(list('ab'), name='A') | ||
second = Index(list('abc'), name='A') | ||
union = first.union(second) | ||
expected = Index(list('abc'), name='A') | ||
tm.assert_index_equal(union, expected) | ||
|
||
first = Index(list('ab'), name='A') | ||
second = Index(list('ab'), name='B') | ||
union = first.union(second) | ||
|
@@ -752,37 +760,66 @@ def test_union(self): | |
first = Index(list('ab')) | ||
second = Index(list('ab'), name='B') | ||
union = first.union(second) | ||
expected = Index(list('ab'), name='B') | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. make new tests |
||
expected = Index(list('ab'), name=None) | ||
tm.assert_index_equal(union, expected) | ||
|
||
# GH 9943 9862 | ||
first = Index(list('abc')) | ||
second = Index(list('ab'), name='B') | ||
union = first.union(second) | ||
expected = Index(list('abc'), name=None) | ||
tm.assert_index_equal(union, expected) | ||
|
||
first = Index([]) | ||
second = Index(list('ab'), name='B') | ||
union = first.union(second) | ||
expected = Index(list('ab'), name='B') | ||
expected = Index(list('ab'), name=None) | ||
tm.assert_index_equal(union, expected) | ||
|
||
first = Index(list('ab')) | ||
second = Index([], name='B') | ||
union = first.union(second) | ||
expected = Index(list('ab'), name='B') | ||
expected = Index(list('ab'), name=None) | ||
tm.assert_index_equal(union, expected) | ||
|
||
first = Index(list('ab'), name='A') | ||
second = Index(list('ab')) | ||
union = first.union(second) | ||
expected = Index(list('ab'), name='A') | ||
expected = Index(list('ab'), name=None) | ||
tm.assert_index_equal(union, expected) | ||
|
||
# GH 9943 9862 | ||
first = Index(list('ab'), name='A') | ||
second = Index(list('abc')) | ||
union = first.union(second) | ||
expected = Index(list('abc'), name=None) | ||
tm.assert_index_equal(union, expected) | ||
|
||
first = Index(list('ab'), name='A') | ||
second = Index([]) | ||
union = first.union(second) | ||
expected = Index(list('ab'), name='A') | ||
expected = Index(list('ab'), name=None) | ||
tm.assert_index_equal(union, expected) | ||
|
||
first = Index([], name='A') | ||
second = Index(list('ab')) | ||
union = first.union(second) | ||
expected = Index(list('ab'), name='A') | ||
expected = Index(list('ab'), name=None) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. same There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @jreback Not sure what you mean by "make new tests". The issue with the existing tests is that they were incorrect. So I had to fix them. For example,
is incorrect, as if one index has a name, and the other has no name, the result should be There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I meant that these tests are getting very long, so split up a bit. We also need more general testing, all of these tests should be using a generically created index (all of this machinery already exists ia .create_index()) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I've split the |
||
tm.assert_index_equal(union, expected) | ||
|
||
# Chained unions handles names correctly | ||
i1 = Index([1, 2], name='i1') | ||
i2 = Index([3, 4], name='i2') | ||
i3 = Index([5, 6], name='i3') | ||
union = i1.union(i2.union(i3)) | ||
expected = i1.union(i2).union(i3) | ||
tm.assert_index_equal(union, expected) | ||
|
||
j1 = Index([1, 2], name='j1') | ||
j2 = Index([], name='j2') | ||
j3 = Index([], name='j3') | ||
union = j1.union(j2.union(j3)) | ||
expected = j1.union(j2).union(j3) | ||
tm.assert_index_equal(union, expected) | ||
|
||
with tm.assert_produces_warning(RuntimeWarning): | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the only user facing change? e.g. intersection as well? 'resulting names' is not vey clear, maybe 'the name of the Index of the union'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is now changed as you suggested.
intersection
was computing names correctly despite the original example reported in #9943.