-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: names on union and intersection for Index were inconsistent (GH9943 GH9862) #19849
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
pandas/core/indexes/base.py
Outdated
name = self.name | ||
return name | ||
|
||
def _get_intersection_name(self, other): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Docstring
doc/source/whatsnew/v0.23.0.txt
Outdated
@@ -796,6 +796,7 @@ Indexing | |||
- Bug in :func:`IntervalIndex.symmetric_difference` where the symmetric difference with a non-``IntervalIndex`` did not raise (:issue:`18475`) | |||
- Bug in :class:`IntervalIndex` where set operations that returned an empty ``IntervalIndex`` had the wrong dtype (:issue:`19101`) | |||
- Bug in :meth:`DataFrame.drop_duplicates` where no ``KeyError`` is raised when passing in columns that don't exist on the ``DataFrame`` (issue:`19726`) | |||
- Bug in :func:`Index.union` where resulting names were not computed correrctly (:issue:`9943`, :issue:`9862`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
correrctly
-> correctly
@gfyoung Re your question above about closing #9862, that was my sense based on the comments and my analysis of the open checkmarks at the top. BTW, I'm working on those bugs identified in testing. Seems like I didn't have 'numexpr' installed on my local machine so those errors weren't picked up when I ran pytest locally. |
Codecov Report
@@ Coverage Diff @@
## master #19849 +/- ##
=========================================
Coverage ? 92.24%
=========================================
Files ? 161
Lines ? 51193
Branches ? 0
=========================================
Hits ? 47225
Misses ? 3968
Partials ? 0
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so rather than add all of these tests here, they should go in pandas/tests/indexes/test_base.py
which generically tests all index types. further should move the existing setops tests there as well (could be a pre- or post- PR)
pandas/core/indexes/base.py
Outdated
name = None | ||
if self.name != name: | ||
return self._shallow_copy(name=name) | ||
return self.name if self.name == other.name else None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this logic already exists in pandas.core.ops.get_op_result_name
I would just use this as a function and not as a method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A function added just a few days ago! Will make that change.
return a new object if we are resetting the name | ||
""" | ||
name = self._get_setop_name(other) | ||
if self.name != name: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think might be able to dispense with this entirely, and just always do
from pandas.core.ops import get_op_result_name
....
return self._shallow_copy(name=get_op_result_name(self, other))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do this in _wrap_setup_result
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if I eliminate self.get_setop_name_object
and use _wrap_setop_result
, other tests fail, because those tests assume the corner cases return the same object.
pandas/tests/indexes/test_base.py
Outdated
@@ -729,7 +729,15 @@ def test_union(self): | |||
union = Index([]).union(first) | |||
assert union is first | |||
|
|||
# preserve names |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make new tests
pandas/tests/indexes/test_base.py
Outdated
@@ -751,37 +759,66 @@ def test_union(self): | |||
first = Index(list('ab')) | |||
second = Index(list('ab'), name='B') | |||
union = first.union(second) | |||
expected = Index(list('ab'), name='B') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make new tests
pandas/tests/indexes/test_base.py
Outdated
tm.assert_index_equal(union, expected) | ||
|
||
first = Index([], name='A') | ||
second = Index(list('ab')) | ||
union = first.union(second) | ||
expected = Index(list('ab'), name='A') | ||
expected = Index(list('ab'), name=None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback Not sure what you mean by "make new tests". The issue with the existing tests is that they were incorrect. So I had to fix them. For example,
first = Index([], name='A')
second = Index(list('ab'))
union = first.union(second)
expected = Index(list('ab'), name='A')
is incorrect, as if one index has a name, and the other has no name, the result should be None
, based on your comment #9943 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant that these tests are getting very long, so split up a bit.
We also need more general testing, all of these tests should be using a generically created index (all of this machinery already exists ia .create_index())
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've split the test_union
test, and now have the corner cases tested for all indexes
doc/source/whatsnew/v0.23.0.txt
Outdated
@@ -837,6 +837,7 @@ Indexing | |||
- Bug in :func:`IntervalIndex.symmetric_difference` where the symmetric difference with a non-``IntervalIndex`` did not raise (:issue:`18475`) | |||
- Bug in :class:`IntervalIndex` where set operations that returned an empty ``IntervalIndex`` had the wrong dtype (:issue:`19101`) | |||
- Bug in :meth:`DataFrame.drop_duplicates` where no ``KeyError`` is raised when passing in columns that don't exist on the ``DataFrame`` (issue:`19726`) | |||
- Bug in :func:`Index.union` where resulting names were not computed correctly for certain cases (:issue:`9943`, :issue:`9862`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the only user facing change? e.g. intersection as well? 'resulting names' is not vey clear, maybe 'the name of the Index of the union'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is now changed as you suggested. intersection
was computing names correctly despite the original example reported in #9943.
pandas/core/indexes/base.py
Outdated
Given 2 indexes, give a consensus name meaning | ||
we take the not None one, or None if the names differ. | ||
Return a new object if we are resetting the name | ||
Given 2 indexes, give a setop name and object, meaning |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead why can't we just use _wrap_setop_result? instead of having 2 nearly identical funtions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback I think this is needed (and I'll rename it and document it accordingly) in the case you are computing i1.union(i2)
, where i1.equals(i2)==True
. In this case, you don't want to use _shallow_copy()
because you want to return i1
, unless the names of i1
and i2
are different. There are some tests for joins where if you are doing a join and the index is the same, then you can just return the index, rather than returning a brand new index, which is what _shallow_copy()
would do.
I'll also check the other Index
subclasses to see if they are doing this right. For example, for RangeIndex
, it is not handling the names correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, I think we do this elsewhere as well (e.g. when you reindex to yourself). can you add docmentation about this to the setops themselves. @TomAugspurger @jorisvandenbossche what do you think about this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have a strong preference, but it'd be good to settle on consistent behavior between union and intersection (I think). What's the proposal here?
- named & same name -> name
- named & other name -> None?
- named & unnamed -> name?
- unnamed & named -> name?
- unamed & unnamed -> None
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TomAugspurger not what I was talking about, the current logic is correct (and shouldn't change), what you have is right. This was more about whether if we have identical objects we should shallow copy or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My bad.
I don't have a strong preference, but I would tend towards always shallow copying? I hope people aren't relying on (idx | idx) is idx
being true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TomAugspurger With respect to your naming, that is not what will happen. In the cases of named and unnamed, the result will be None
. That's because if you do
j1 = pd.Index([1,2], name='j1')
j2 = pd.Index([], name='j2')
j3 = pd.Index([], name='j3')
then j1.union(j2).union(j3) returns Int64Index([1, 2], dtype='int64', name='j3')
, while just changing the order to j3.union(j1).union(j2)
returns Int64Index([1, 2], dtype='int64', name='j2')
. So the behavior will be to only preserve the names if they are equal, and the result of these two union examples will have None
as the name.
@jreback I can change to shallow copying, but I'll have to modify some other tests that are depending on the result not being a shallow copy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah I think its better to _shallow_copy these, yes I understand some tests may change because of this (IOW they will no longer be is
, just .equals
)
pandas/core/indexes/timedeltas.py
Outdated
def _wrap_union_result(self, other, result): | ||
name = self.name if self.name == other.name else None | ||
def _wrap_setop_result(self, other, result): | ||
name = get_op_result_name(self, other) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can remove this one
pandas/core/indexes/base.py
Outdated
name = self.name if self.name == other.name else None | ||
return self.__class__(result, name=name) | ||
def _wrap_setop_result(self, other, result): | ||
return self.__class__(result, name=get_op_result_name(self, other)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use self._shallow_copy
this propagates meta-data, allows you to remove some sublcass implementations of this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback I've done some testing with self.shallow_copy()
and it broke things with respect to CategoricalIndex
, and then the discussion in #10186 becomes relevant, as we'd need to decide what to do when taking the union of two categorical indexes that have different categories. So I'm going to leave this implementation as it was.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thats fine
pandas/tests/indexes/test_base.py
Outdated
tm.assert_index_equal(union, expected) | ||
|
||
first = Index([], name='A') | ||
second = Index(list('ab')) | ||
union = first.union(second) | ||
expected = Index(list('ab'), name='A') | ||
expected = Index(list('ab'), name=None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant that these tests are getting very long, so split up a bit.
We also need more general testing, all of these tests should be using a generically created index (all of this machinery already exists ia .create_index())
…riety of index types.
doc/source/whatsnew/v0.23.0.txt
Outdated
@@ -837,6 +837,8 @@ Indexing | |||
- Bug in :func:`IntervalIndex.symmetric_difference` where the symmetric difference with a non-``IntervalIndex`` did not raise (:issue:`18475`) | |||
- Bug in :class:`IntervalIndex` where set operations that returned an empty ``IntervalIndex`` had the wrong dtype (:issue:`19101`) | |||
- Bug in :meth:`DataFrame.drop_duplicates` where no ``KeyError`` is raised when passing in columns that don't exist on the ``DataFrame`` (issue:`19726`) | |||
- Bug in :func:`Index.union` and :func:`Index.intersection` where name of the `Index` of the result was not computed correctly for certain cases (:issue:`9943`, :issue:`9862`) | |||
- Bug in :func:`Index.difference` when taking difference of set and itself as type was not preserved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add this PR number is the issue, e.g. (:issue:`19849`)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should "set" be "index"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made both of those changes.
pandas/core/indexes/base.py
Outdated
@@ -2441,6 +2454,13 @@ def intersection(self, other): | |||
taken.name = None | |||
return taken | |||
|
|||
def _create_empty_index(self, name): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't create new API, this is just self._shallow_copy([], name=name)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback So the issue here is that self._shallow_copy([], name=name)
doesn't work correctly for RangeIndex
. I can modify RangeIndex._self_shallow_copy
to allow []
as an argument. For an index of strings (objects) or an index of dtype int64
, self._shallow_copy([], name=name)
doesn't preserve the dtype
. And I haven't even looked at all the types yet. So now the changes get more involved to handle this corner case of creating an empty index, which is why I created the create_empty_index
. The idea here is to have an empty index that preserves all the other attributes (dtype, categories, range step, etc.) of the class of self
. self._shallow_copy([], name=name)
does not do that without a lot of modifications. If you want me to do those modifications, I can look into that, but they might be pretty substantial.
pandas/core/indexes/category.py
Outdated
@@ -716,6 +723,13 @@ def insert(self, loc, item): | |||
codes = np.concatenate((codes[:loc], code, codes[loc:])) | |||
return self._create_from_codes(codes) | |||
|
|||
def _create_empty_index(self, name): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls try for minimal changes.
pandas/core/indexes/base.py
Outdated
return self | ||
|
||
def _union_corner_case(self, other): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there a reason you are creatijng this as a separate method? why not inline it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for inlining it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback @TomAugspurger The subclasses for DateTimeIndex
, RangeIndex
and TimeDeltaIndex
need to call this method from their union
implementations. Should I inline it there too and have repeated code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok this PR looks a lot better. I still don't like this hlper function. can we refactor to not use it
pandas/core/indexes/multi.py
Outdated
@@ -2733,7 +2733,7 @@ def intersection(self, other): | |||
other_tuples = other._ndarray_values | |||
uniq_tuples = sorted(set(self_tuples) & set(other_tuples)) | |||
if len(uniq_tuples) == 0: | |||
return MultiIndex(levels=[[]] * self.nlevels, | |||
return MultiIndex(levels=self.levels, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are you changing this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback IMHO, if the resulting intersection is empty, the levels should be preserved. Here's a simple example:
mi = pd.MultiIndex.from_product([['a','b','c'],[1,2]], names=['ll','nn'])
print(mi.drop(mi))
print(mi.intersection([]))
This produces (with v0.22.0):
MultiIndex(levels=[['a', 'b', 'c'], [1, 2]],
labels=[[], []],
names=['ll', 'nn'])
MultiIndex(levels=[[], []],
labels=[[], []],
names=['ll', 'nn'])
Doing this change as I documented it makes the set algebra consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm finding this all a bit hard to follow as well, since you have to hack around other issues in pandas.
FWIW, fixing #10186 is on my medium-term TODO list, but probably not for 0.23.0. If that was solved, how much would the changes here be simplified? What other inconsistencies would have to be worked around?
doc/source/whatsnew/v0.23.0.txt
Outdated
@@ -837,6 +837,8 @@ Indexing | |||
- Bug in :func:`IntervalIndex.symmetric_difference` where the symmetric difference with a non-``IntervalIndex`` did not raise (:issue:`18475`) | |||
- Bug in :class:`IntervalIndex` where set operations that returned an empty ``IntervalIndex`` had the wrong dtype (:issue:`19101`) | |||
- Bug in :meth:`DataFrame.drop_duplicates` where no ``KeyError`` is raised when passing in columns that don't exist on the ``DataFrame`` (issue:`19726`) | |||
- Bug in :func:`Index.union` and :func:`Index.intersection` where name of the `Index` of the result was not computed correctly for certain cases (:issue:`9943`, :issue:`9862`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
double-tick Index.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
pandas/core/indexes/base.py
Outdated
return self | ||
|
||
def _union_corner_case(self, other): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for inlining it.
doc/source/whatsnew/v0.23.0.txt
Outdated
@@ -837,6 +837,8 @@ Indexing | |||
- Bug in :func:`IntervalIndex.symmetric_difference` where the symmetric difference with a non-``IntervalIndex`` did not raise (:issue:`18475`) | |||
- Bug in :class:`IntervalIndex` where set operations that returned an empty ``IntervalIndex`` had the wrong dtype (:issue:`19101`) | |||
- Bug in :meth:`DataFrame.drop_duplicates` where no ``KeyError`` is raised when passing in columns that don't exist on the ``DataFrame`` (issue:`19726`) | |||
- Bug in :func:`Index.union` and :func:`Index.intersection` where name of the `Index` of the result was not computed correctly for certain cases (:issue:`9943`, :issue:`9862`) | |||
- Bug in :func:`Index.difference` when taking difference of set and itself as type was not preserved. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should "set" be "index"?
@TomAugspurger With respect to your question about #10186, the issue comes down to whether we use |
@Dr-Irv as I said above. this needs to be greatly downscoped. You are changing way too many things and not using the current code conventions. So pls break this in to separate PR's. |
@jreback wrote
My latest commit (fb3a9db) removes the changes relative to the bug in |
pandas/tests/indexes/test_base.py
Outdated
@@ -707,6 +707,98 @@ def test_intersect_str_dates(self): | |||
|
|||
assert len(res) == 0 | |||
|
|||
def create_empty_index(self, id): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is this for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need a way to have an empty index that is independent of the individual Index
class that is tested in test_corner_union
. Alternatively, I could just call id.drop(id)
each time I need to create an empty index for the test. IMHO (and I can be convinced otherwise), having the calls to _create_empty_index
in the test makes it clear that an empty index is being used in the test.
# GH 9943 9862 | ||
# Test unions with various name combinations | ||
# Do not test MultiIndex | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls use parameterize
pandas/core/indexes/base.py
Outdated
return self | ||
|
||
def _union_corner_case(self, other): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok this PR looks a lot better. I still don't like this hlper function. can we refactor to not use it
@jreback with respect to the helper function |
@@ -300,6 +301,12 @@ def itemsize(self): | |||
# Size of the items in categories, not codes. | |||
return self.values.itemsize | |||
|
|||
def _wrap_setop_result(self, other, result): | |||
name = get_op_result_name(self, other) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use _shallow_copy here, we never call _simple_new directly. (another reason why the super method should use _shallow_copy
, but that's for new PR)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you move this method to Index.base? does it break anything?
@@ -1136,6 +1137,11 @@ def union(self, other): | |||
y : Index or DatetimeIndex | |||
""" | |||
self._assert_can_do_setop(other) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you just call the super .union()
here (after any conversion isneeded)? then you can inline the union_corner bizness
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
pandas/core/indexes/range.py
Outdated
return self | ||
if len(self) == 0: | ||
return other | ||
is_corner_case, corner_result = self._union_corner_case(other) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
pandas/core/indexes/timedeltas.py
Outdated
@@ -545,6 +546,11 @@ def union(self, other): | |||
y : Index or TimedeltaIndex | |||
""" | |||
self._assert_can_do_setop(other) | |||
|
|||
is_corner_case, corner_result = self._union_corner_case(other) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and here
pandas/tests/indexes/test_base.py
Outdated
|
||
skip_index_keys = ['tuples', 'repeats'] | ||
for key, id in self.indices.items(): | ||
if key not in skip_index_keys: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use continue
here
pandas/tests/indexes/test_base.py
Outdated
('copy', 'A', 'empty', 'B', None), | ||
('copy', 'A', 'empty', None, None), | ||
('copy', None, 'copy', 'B', None), | ||
('copy', None, 'copy', None, None), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do the empties in a separate test from the copies, ok on parameterize the names though (on both tests)
name = self.name if self.name == other.name else None | ||
return self.__class__(result, name=name) | ||
def _wrap_setop_result(self, other, result): | ||
return self._constructor(result, name=get_op_result_name(self, other)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would shallowcopy work here? Constructor isn’t used very often in the indexes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jbrockmendel repeating (part of) reply from a few months ago:
the issue comes down to whether we use
self._shallow_copy()
to return the result of set operations. When I tried usingself._shallow_copy()
inIndex._wrap_setop_result()
, and looked at what was happening, thenCategoricalIndex
was the problem. So it was easier to leave the existing implementation ofIndex._wrap_setop_result
. If we were to handle #10186, then I could experiment more with changing the implementation ofIndex._wrap_setop_result
.
This looks like a nice cleanup, thanks @Dr-Irv. |
@Dr-Irv this got a bit lost, can you rebase |
Hello @Dr-Irv! Thanks for updating the PR.
|
@jreback This is ready. The failure was due to a timeout on travis. If you want me to do something about that, let me know. |
Merged master with #23441. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. questions on some tests changes that were needed.
pandas/tests/indexes/test_base.py
Outdated
@@ -715,7 +715,10 @@ def test_empty_fancy_raises(self, attr): | |||
# FutureWarning from non-tuple sequence of nd indexing | |||
@pytest.mark.filterwarnings("ignore::FutureWarning") | |||
def test_getitem_error(self, indices, itm): | |||
with pytest.raises(IndexError): | |||
error_type = IndexError | |||
if isinstance(indices, RangeIndex) and (itm == 'no_int'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
grr, why did this need changing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One of the changes I did to testing was to add RangeIndex
to the index types used for all of the base index tests. See the change in pandas/tests/indexes/conftest.py . When I did that, it revealed some differences in how RangeIndex
works versus the other index types. For the above test, when you pass a string to a RangeIndex
, pandas returns a ValueError
as opposed to an IndexError
, so the change was necessary.
Alternatively, we could change the behavior of RangeIndex
to return IndexError
when a string is passed to get_item
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if this fix is straightforward, pls let's just do this here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one was straightforward.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great
pandas/tests/indexes/test_base.py
Outdated
@@ -2517,6 +2581,10 @@ def test_generated_op_names(opname, indices): | |||
# pd.Index.__rsub__ does not exist; though the method does exist | |||
# for subclasses. see GH#19723 | |||
return | |||
if (isinstance(index, RangeIndex) and | |||
opname in ['add', 'radd', 'sub', 'rsub', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above. Adding RangeIndex
to pandas/tests/indexes/conftest.py revealed that the implementation of the operators for RangeIndex
is done differently. Namely, for RangeIndex
, the op methods are bound to one function with a parameter that includes the opname, but for the other index types, there are direct methods for each of the operators.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment as above. would like to fix rather than special case a test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was a little tricky, but I think I figured it out right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you can fix those would be great. ping on green.
pandas/tests/indexes/test_base.py
Outdated
@@ -715,7 +715,10 @@ def test_empty_fancy_raises(self, attr): | |||
# FutureWarning from non-tuple sequence of nd indexing | |||
@pytest.mark.filterwarnings("ignore::FutureWarning") | |||
def test_getitem_error(self, indices, itm): | |||
with pytest.raises(IndexError): | |||
error_type = IndexError | |||
if isinstance(indices, RangeIndex) and (itm == 'no_int'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if this fix is straightforward, pls let's just do this here
pandas/tests/indexes/test_base.py
Outdated
@@ -2517,6 +2581,10 @@ def test_generated_op_names(opname, indices): | |||
# pd.Index.__rsub__ does not exist; though the method does exist | |||
# for subclasses. see GH#19723 | |||
return | |||
if (isinstance(index, RangeIndex) and | |||
opname in ['add', 'radd', 'sub', 'rsub', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment as above. would like to fix rather than special case a test.
lgtm. @Dr-Irv ping on green. |
@jreback all green |
@gfyoung Back when I created the PR, I thought this closed #9943 because #9943 was split into #9862 and #9885 . But then I see that @jreback edited the original description to xref #9943. #9943 and #9862 cross reference each other, so I thought putting both refs in the whatsnew is what made sense. But I'll do whatever you guys think is best. |
Thanks @Dr-Irv ! |
xref #9943
closes #9862
index.intersection(empty_index)==index.difference(index)
git diff upstream/master -u -- "*.py" | flake8 --diff
Also fixes bug in
Index.difference
when computingindex.difference(index)