Skip to content

BUG: names on union and intersection for Index were inconsistent (GH9943 GH9862) #19849

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 18 commits into from
Nov 6, 2018

Conversation

Dr-Irv
Copy link
Contributor

@Dr-Irv Dr-Irv commented Feb 22, 2018

xref #9943
closes #9862

  • tests added / passed
    • Added tests to check names on unions when Index has different values
    • Fixed test for ranges with unions
    • Added tests for all index types for corner cases for union, difference
    • Added tests so that index.intersection(empty_index)==index.difference(index)
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

Also fixes bug in Index.difference when computing index.difference(index)

@gfyoung gfyoung added Bug Indexing Related to indexing on series/frames, not to indexes themselves Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Feb 23, 2018
@gfyoung
Copy link
Member

gfyoung commented Feb 23, 2018

@Dr-Irv @jreback : Are you sure we can close #9862 with this PR? Seems to be the case that it can given some of the discussion that took place. However, just want to double-check.

name = self.name
return name

def _get_intersection_name(self, other):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docstring

@@ -796,6 +796,7 @@ Indexing
- Bug in :func:`IntervalIndex.symmetric_difference` where the symmetric difference with a non-``IntervalIndex`` did not raise (:issue:`18475`)
- Bug in :class:`IntervalIndex` where set operations that returned an empty ``IntervalIndex`` had the wrong dtype (:issue:`19101`)
- Bug in :meth:`DataFrame.drop_duplicates` where no ``KeyError`` is raised when passing in columns that don't exist on the ``DataFrame`` (issue:`19726`)
- Bug in :func:`Index.union` where resulting names were not computed correrctly (:issue:`9943`, :issue:`9862`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correrctly -> correctly

@Dr-Irv
Copy link
Contributor Author

Dr-Irv commented Feb 23, 2018

@gfyoung Re your question above about closing #9862, that was my sense based on the comments and my analysis of the open checkmarks at the top.

BTW, I'm working on those bugs identified in testing. Seems like I didn't have 'numexpr' installed on my local machine so those errors weren't picked up when I ran pytest locally.

@codecov
Copy link

codecov bot commented Feb 24, 2018

Codecov Report

❗ No coverage uploaded for pull request base (master@6a5c34c). Click here to learn what that means.
The diff coverage is 96.15%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master   #19849   +/-   ##
=========================================
  Coverage          ?   92.24%           
=========================================
  Files             ?      161           
  Lines             ?    51193           
  Branches          ?        0           
=========================================
  Hits              ?    47225           
  Misses            ?     3968           
  Partials          ?        0
Flag Coverage Δ
#multiple 90.63% <96.15%> (?)
#single 42.29% <59.61%> (?)
Impacted Files Coverage Δ
pandas/core/indexes/category.py 97.87% <100%> (ø)
pandas/core/indexes/period.py 92.17% <100%> (ø)
pandas/core/indexes/interval.py 94.66% <100%> (ø)
pandas/core/indexes/numeric.py 97.32% <100%> (ø)
pandas/core/indexes/base.py 96.45% <100%> (ø)
pandas/core/indexes/range.py 97.32% <100%> (ø)
pandas/core/indexes/timedeltas.py 89.88% <83.33%> (ø)
pandas/core/indexes/datetimes.py 96.13% <88.88%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6a5c34c...fa7311a. Read the comment docs.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so rather than add all of these tests here, they should go in pandas/tests/indexes/test_base.py which generically tests all index types. further should move the existing setops tests there as well (could be a pre- or post- PR)

name = None
if self.name != name:
return self._shallow_copy(name=name)
return self.name if self.name == other.name else None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this logic already exists in pandas.core.ops.get_op_result_name I would just use this as a function and not as a method

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A function added just a few days ago! Will make that change.

return a new object if we are resetting the name
"""
name = self._get_setop_name(other)
if self.name != name:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think might be able to dispense with this entirely, and just always do

from pandas.core.ops import get_op_result_name

....

return self._shallow_copy(name=get_op_result_name(self, other))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do this in _wrap_setup_result

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if I eliminate self.get_setop_name_object and use _wrap_setop_result, other tests fail, because those tests assume the corner cases return the same object.

@@ -729,7 +729,15 @@ def test_union(self):
union = Index([]).union(first)
assert union is first

# preserve names
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make new tests

@@ -751,37 +759,66 @@ def test_union(self):
first = Index(list('ab'))
second = Index(list('ab'), name='B')
union = first.union(second)
expected = Index(list('ab'), name='B')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make new tests

tm.assert_index_equal(union, expected)

first = Index([], name='A')
second = Index(list('ab'))
union = first.union(second)
expected = Index(list('ab'), name='A')
expected = Index(list('ab'), name=None)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback Not sure what you mean by "make new tests". The issue with the existing tests is that they were incorrect. So I had to fix them. For example,

first = Index([], name='A')
second = Index(list('ab'))
union = first.union(second)
expected = Index(list('ab'), name='A')

is incorrect, as if one index has a name, and the other has no name, the result should be None, based on your comment #9943 (comment)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant that these tests are getting very long, so split up a bit.

We also need more general testing, all of these tests should be using a generically created index (all of this machinery already exists ia .create_index())

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've split the test_union test, and now have the corner cases tested for all indexes

@@ -837,6 +837,7 @@ Indexing
- Bug in :func:`IntervalIndex.symmetric_difference` where the symmetric difference with a non-``IntervalIndex`` did not raise (:issue:`18475`)
- Bug in :class:`IntervalIndex` where set operations that returned an empty ``IntervalIndex`` had the wrong dtype (:issue:`19101`)
- Bug in :meth:`DataFrame.drop_duplicates` where no ``KeyError`` is raised when passing in columns that don't exist on the ``DataFrame`` (issue:`19726`)
- Bug in :func:`Index.union` where resulting names were not computed correctly for certain cases (:issue:`9943`, :issue:`9862`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the only user facing change? e.g. intersection as well? 'resulting names' is not vey clear, maybe 'the name of the Index of the union'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now changed as you suggested. intersection was computing names correctly despite the original example reported in #9943.

Given 2 indexes, give a consensus name meaning
we take the not None one, or None if the names differ.
Return a new object if we are resetting the name
Given 2 indexes, give a setop name and object, meaning
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead why can't we just use _wrap_setop_result? instead of having 2 nearly identical funtions

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback I think this is needed (and I'll rename it and document it accordingly) in the case you are computing i1.union(i2), where i1.equals(i2)==True. In this case, you don't want to use _shallow_copy() because you want to return i1, unless the names of i1 and i2 are different. There are some tests for joins where if you are doing a join and the index is the same, then you can just return the index, rather than returning a brand new index, which is what _shallow_copy() would do.

I'll also check the other Index subclasses to see if they are doing this right. For example, for RangeIndex, it is not handling the names correctly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, I think we do this elsewhere as well (e.g. when you reindex to yourself). can you add docmentation about this to the setops themselves. @TomAugspurger @jorisvandenbossche what do you think about this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a strong preference, but it'd be good to settle on consistent behavior between union and intersection (I think). What's the proposal here?

  1. named & same name -> name
  2. named & other name -> None?
  3. named & unnamed -> name?
  4. unnamed & named -> name?
  5. unamed & unnamed -> None

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TomAugspurger not what I was talking about, the current logic is correct (and shouldn't change), what you have is right. This was more about whether if we have identical objects we should shallow copy or not.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad.

I don't have a strong preference, but I would tend towards always shallow copying? I hope people aren't relying on (idx | idx) is idx being true.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TomAugspurger With respect to your naming, that is not what will happen. In the cases of named and unnamed, the result will be None. That's because if you do

j1 = pd.Index([1,2], name='j1')
j2 = pd.Index([], name='j2')
j3 = pd.Index([], name='j3')

then j1.union(j2).union(j3) returns Int64Index([1, 2], dtype='int64', name='j3'), while just changing the order to j3.union(j1).union(j2) returns Int64Index([1, 2], dtype='int64', name='j2'). So the behavior will be to only preserve the names if they are equal, and the result of these two union examples will have None as the name.

@jreback I can change to shallow copying, but I'll have to modify some other tests that are depending on the result not being a shallow copy.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I think its better to _shallow_copy these, yes I understand some tests may change because of this (IOW they will no longer be is, just .equals)

def _wrap_union_result(self, other, result):
name = self.name if self.name == other.name else None
def _wrap_setop_result(self, other, result):
name = get_op_result_name(self, other)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can remove this one

name = self.name if self.name == other.name else None
return self.__class__(result, name=name)
def _wrap_setop_result(self, other, result):
return self.__class__(result, name=get_op_result_name(self, other))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use self._shallow_copy this propagates meta-data, allows you to remove some sublcass implementations of this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback I've done some testing with self.shallow_copy() and it broke things with respect to CategoricalIndex, and then the discussion in #10186 becomes relevant, as we'd need to decide what to do when taking the union of two categorical indexes that have different categories. So I'm going to leave this implementation as it was.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thats fine

tm.assert_index_equal(union, expected)

first = Index([], name='A')
second = Index(list('ab'))
union = first.union(second)
expected = Index(list('ab'), name='A')
expected = Index(list('ab'), name=None)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant that these tests are getting very long, so split up a bit.

We also need more general testing, all of these tests should be using a generically created index (all of this machinery already exists ia .create_index())

@@ -837,6 +837,8 @@ Indexing
- Bug in :func:`IntervalIndex.symmetric_difference` where the symmetric difference with a non-``IntervalIndex`` did not raise (:issue:`18475`)
- Bug in :class:`IntervalIndex` where set operations that returned an empty ``IntervalIndex`` had the wrong dtype (:issue:`19101`)
- Bug in :meth:`DataFrame.drop_duplicates` where no ``KeyError`` is raised when passing in columns that don't exist on the ``DataFrame`` (issue:`19726`)
- Bug in :func:`Index.union` and :func:`Index.intersection` where name of the `Index` of the result was not computed correctly for certain cases (:issue:`9943`, :issue:`9862`)
- Bug in :func:`Index.difference` when taking difference of set and itself as type was not preserved.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add this PR number is the issue, e.g. (:issue:`19849`)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should "set" be "index"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made both of those changes.

@@ -2441,6 +2454,13 @@ def intersection(self, other):
taken.name = None
return taken

def _create_empty_index(self, name):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't create new API, this is just self._shallow_copy([], name=name)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback So the issue here is that self._shallow_copy([], name=name) doesn't work correctly for RangeIndex. I can modify RangeIndex._self_shallow_copy to allow [] as an argument. For an index of strings (objects) or an index of dtype int64, self._shallow_copy([], name=name) doesn't preserve the dtype. And I haven't even looked at all the types yet. So now the changes get more involved to handle this corner case of creating an empty index, which is why I created the create_empty_index. The idea here is to have an empty index that preserves all the other attributes (dtype, categories, range step, etc.) of the class of self. self._shallow_copy([], name=name) does not do that without a lot of modifications. If you want me to do those modifications, I can look into that, but they might be pretty substantial.

@@ -716,6 +723,13 @@ def insert(self, loc, item):
codes = np.concatenate((codes[:loc], code, codes[loc:]))
return self._create_from_codes(codes)

def _create_empty_index(self, name):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls try for minimal changes.

return self

def _union_corner_case(self, other):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a reason you are creatijng this as a separate method? why not inline it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for inlining it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback @TomAugspurger The subclasses for DateTimeIndex, RangeIndex and TimeDeltaIndex need to call this method from their union implementations. Should I inline it there too and have repeated code?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok this PR looks a lot better. I still don't like this hlper function. can we refactor to not use it

@@ -2733,7 +2733,7 @@ def intersection(self, other):
other_tuples = other._ndarray_values
uniq_tuples = sorted(set(self_tuples) & set(other_tuples))
if len(uniq_tuples) == 0:
return MultiIndex(levels=[[]] * self.nlevels,
return MultiIndex(levels=self.levels,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are you changing this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback IMHO, if the resulting intersection is empty, the levels should be preserved. Here's a simple example:

mi = pd.MultiIndex.from_product([['a','b','c'],[1,2]], names=['ll','nn'])
print(mi.drop(mi))
print(mi.intersection([]))

This produces (with v0.22.0):

MultiIndex(levels=[['a', 'b', 'c'], [1, 2]],
           labels=[[], []],
           names=['ll', 'nn'])
MultiIndex(levels=[[], []],
           labels=[[], []],
           names=['ll', 'nn'])

Doing this change as I documented it makes the set algebra consistent.

Copy link
Contributor

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm finding this all a bit hard to follow as well, since you have to hack around other issues in pandas.

FWIW, fixing #10186 is on my medium-term TODO list, but probably not for 0.23.0. If that was solved, how much would the changes here be simplified? What other inconsistencies would have to be worked around?

@@ -837,6 +837,8 @@ Indexing
- Bug in :func:`IntervalIndex.symmetric_difference` where the symmetric difference with a non-``IntervalIndex`` did not raise (:issue:`18475`)
- Bug in :class:`IntervalIndex` where set operations that returned an empty ``IntervalIndex`` had the wrong dtype (:issue:`19101`)
- Bug in :meth:`DataFrame.drop_duplicates` where no ``KeyError`` is raised when passing in columns that don't exist on the ``DataFrame`` (issue:`19726`)
- Bug in :func:`Index.union` and :func:`Index.intersection` where name of the `Index` of the result was not computed correctly for certain cases (:issue:`9943`, :issue:`9862`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

double-tick Index.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

return self

def _union_corner_case(self, other):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for inlining it.

@@ -837,6 +837,8 @@ Indexing
- Bug in :func:`IntervalIndex.symmetric_difference` where the symmetric difference with a non-``IntervalIndex`` did not raise (:issue:`18475`)
- Bug in :class:`IntervalIndex` where set operations that returned an empty ``IntervalIndex`` had the wrong dtype (:issue:`19101`)
- Bug in :meth:`DataFrame.drop_duplicates` where no ``KeyError`` is raised when passing in columns that don't exist on the ``DataFrame`` (issue:`19726`)
- Bug in :func:`Index.union` and :func:`Index.intersection` where name of the `Index` of the result was not computed correctly for certain cases (:issue:`9943`, :issue:`9862`)
- Bug in :func:`Index.difference` when taking difference of set and itself as type was not preserved.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should "set" be "index"?

@Dr-Irv
Copy link
Contributor Author

Dr-Irv commented Mar 2, 2018

@TomAugspurger With respect to your question about #10186, the issue comes down to whether we use self._shallow_copy() to return the result of set operations. When I tried using self._shallow_copy() in Index._wrap_setop_result(), and looked at what was happening, then CategoricalIndex was the problem. So it was easier to leave the existing implementation of Index._wrap_setop_result. If you were to handle #10186, then I could experiment more with changing the implementation of Index._wrap_setop_result.

@jreback
Copy link
Contributor

jreback commented Mar 7, 2018

@Dr-Irv as I said above. this needs to be greatly downscoped. You are changing way too many things and not using the current code conventions. So pls break this in to separate PR's.

@Dr-Irv
Copy link
Contributor Author

Dr-Irv commented Mar 7, 2018

@jreback wrote

as I said above. this needs to be greatly downscoped. You are changing way too many things and not using the current code conventions. So pls break this in to separate PR's.

My latest commit (fb3a9db) removes the changes relative to the bug in Index.difference, and I created a separate Issue for that (#20040), so we can discuss there the implementation choices.

@@ -707,6 +707,98 @@ def test_intersect_str_dates(self):

assert len(res) == 0

def create_empty_index(self, id):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is this for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need a way to have an empty index that is independent of the individual Index class that is tested in test_corner_union. Alternatively, I could just call id.drop(id) each time I need to create an empty index for the test. IMHO (and I can be convinced otherwise), having the calls to _create_empty_index in the test makes it clear that an empty index is being used in the test.

# GH 9943 9862
# Test unions with various name combinations
# Do not test MultiIndex

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls use parameterize

return self

def _union_corner_case(self, other):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok this PR looks a lot better. I still don't like this hlper function. can we refactor to not use it

@Dr-Irv
Copy link
Contributor Author

Dr-Irv commented Mar 7, 2018

@jreback with respect to the helper function Index._union_corner_case, the function is called from the implementations of union in DateTimeIndex, RangeIndex and TimeDeltaIndex. I can remove it by just repeating the code in those implementations. If you want the repeated code, let me know.

@@ -300,6 +301,12 @@ def itemsize(self):
# Size of the items in categories, not codes.
return self.values.itemsize

def _wrap_setop_result(self, other, result):
name = get_op_result_name(self, other)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use _shallow_copy here, we never call _simple_new directly. (another reason why the super method should use _shallow_copy, but that's for new PR)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you move this method to Index.base? does it break anything?

@@ -1136,6 +1137,11 @@ def union(self, other):
y : Index or DatetimeIndex
"""
self._assert_can_do_setop(other)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you just call the super .union() here (after any conversion isneeded)? then you can inline the union_corner bizness

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

return self
if len(self) == 0:
return other
is_corner_case, corner_result = self._union_corner_case(other)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

@@ -545,6 +546,11 @@ def union(self, other):
y : Index or TimedeltaIndex
"""
self._assert_can_do_setop(other)

is_corner_case, corner_result = self._union_corner_case(other)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and here


skip_index_keys = ['tuples', 'repeats']
for key, id in self.indices.items():
if key not in skip_index_keys:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use continue here

('copy', 'A', 'empty', 'B', None),
('copy', 'A', 'empty', None, None),
('copy', None, 'copy', 'B', None),
('copy', None, 'copy', None, None),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do the empties in a separate test from the copies, ok on parameterize the names though (on both tests)

name = self.name if self.name == other.name else None
return self.__class__(result, name=name)
def _wrap_setop_result(self, other, result):
return self._constructor(result, name=get_op_result_name(self, other))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would shallowcopy work here? Constructor isn’t used very often in the indexes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jbrockmendel repeating (part of) reply from a few months ago:

the issue comes down to whether we use self._shallow_copy() to return the result of set operations. When I tried using self._shallow_copy() in Index._wrap_setop_result(), and looked at what was happening, then CategoricalIndex was the problem. So it was easier to leave the existing implementation of Index._wrap_setop_result. If we were to handle #10186, then I could experiment more with changing the implementation of Index._wrap_setop_result.

@jbrockmendel
Copy link
Member

This looks like a nice cleanup, thanks @Dr-Irv.

@jreback
Copy link
Contributor

jreback commented Nov 1, 2018

@Dr-Irv this got a bit lost, can you rebase

@pep8speaks
Copy link

Hello @Dr-Irv! Thanks for updating the PR.

@Dr-Irv
Copy link
Contributor Author

Dr-Irv commented Nov 1, 2018

@jreback This is ready. The failure was due to a timeout on travis. If you want me to do something about that, let me know.

@gfyoung
Copy link
Member

gfyoung commented Nov 1, 2018

The failure was due to a timeout on travis.

@Dr-Irv : I think that's being handled in #23441 .

@TomAugspurger
Copy link
Contributor

Merged master with #23441.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. questions on some tests changes that were needed.

@@ -715,7 +715,10 @@ def test_empty_fancy_raises(self, attr):
# FutureWarning from non-tuple sequence of nd indexing
@pytest.mark.filterwarnings("ignore::FutureWarning")
def test_getitem_error(self, indices, itm):
with pytest.raises(IndexError):
error_type = IndexError
if isinstance(indices, RangeIndex) and (itm == 'no_int'):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

grr, why did this need changing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the changes I did to testing was to add RangeIndex to the index types used for all of the base index tests. See the change in pandas/tests/indexes/conftest.py . When I did that, it revealed some differences in how RangeIndex works versus the other index types. For the above test, when you pass a string to a RangeIndex, pandas returns a ValueError as opposed to an IndexError, so the change was necessary.

Alternatively, we could change the behavior of RangeIndex to return IndexError when a string is passed to get_item.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this fix is straightforward, pls let's just do this here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one was straightforward.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great

@@ -2517,6 +2581,10 @@ def test_generated_op_names(opname, indices):
# pd.Index.__rsub__ does not exist; though the method does exist
# for subclasses. see GH#19723
return
if (isinstance(index, RangeIndex) and
opname in ['add', 'radd', 'sub', 'rsub',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above. Adding RangeIndex to pandas/tests/indexes/conftest.py revealed that the implementation of the operators for RangeIndex is done differently. Namely, for RangeIndex, the op methods are bound to one function with a parameter that includes the opname, but for the other index types, there are direct methods for each of the operators.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment as above. would like to fix rather than special case a test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was a little tricky, but I think I figured it out right.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you can fix those would be great. ping on green.

@@ -715,7 +715,10 @@ def test_empty_fancy_raises(self, attr):
# FutureWarning from non-tuple sequence of nd indexing
@pytest.mark.filterwarnings("ignore::FutureWarning")
def test_getitem_error(self, indices, itm):
with pytest.raises(IndexError):
error_type = IndexError
if isinstance(indices, RangeIndex) and (itm == 'no_int'):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this fix is straightforward, pls let's just do this here

@@ -2517,6 +2581,10 @@ def test_generated_op_names(opname, indices):
# pd.Index.__rsub__ does not exist; though the method does exist
# for subclasses. see GH#19723
return
if (isinstance(index, RangeIndex) and
opname in ['add', 'radd', 'sub', 'rsub',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment as above. would like to fix rather than special case a test.

@jreback
Copy link
Contributor

jreback commented Nov 6, 2018

lgtm. @Dr-Irv ping on green.

@Dr-Irv
Copy link
Contributor Author

Dr-Irv commented Nov 6, 2018

@jreback all green

@gfyoung
Copy link
Member

gfyoung commented Nov 6, 2018

@Dr-Irv @jreback : Is this PR meant to also close #9943? (or are you just "xref"-ing it) . I only ask because you mention the issue number in the whatsnew but are not closing it.

@Dr-Irv
Copy link
Contributor Author

Dr-Irv commented Nov 6, 2018

@gfyoung Back when I created the PR, I thought this closed #9943 because #9943 was split into #9862 and #9885 . But then I see that @jreback edited the original description to xref #9943. #9943 and #9862 cross reference each other, so I thought putting both refs in the whatsnew is what made sense. But I'll do whatever you guys think is best.

@gfyoung
Copy link
Member

gfyoung commented Nov 6, 2018

Alright then. I'll merge, as is, and let @jreback decide what to do with #9943.

@gfyoung gfyoung merged commit 810826d into pandas-dev:master Nov 6, 2018
@gfyoung
Copy link
Member

gfyoung commented Nov 6, 2018

Thanks @Dr-Irv !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Indexing Related to indexing on series/frames, not to indexes themselves Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: losing Index/Series names master issue
7 participants