BUG: Index.difference of itself doesn't preserve type #20062

Dr-Irv · 2018-03-08T21:52:36Z

closes BUG: Index.difference and Index.intersection doesn't preserve type of Index for some Index subclasses for corner cases #20040
tests added / passed
- tests/indexes/test_base.py:test_difference_type
- tests/indexes/test_base.py:test_intersection_difference
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

The fix relies on Index._shallow_copy([]), so most of the changes are about getting that call to work correctly.

The underlying idea: when the result of Index.difference is an empty index, the result should preserve the type and attributes of the original index. In addition, for MultiIndex, when the result of intersection is an empty index, the levels are preserved.
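
The idea can be illustrated with a toy model. The classes below are hypothetical stand-ins written only for this illustration, not pandas code: `ToyIndex` and `ToyDatetimeIndex` mimic the principle that an empty result is built through `_shallow_copy([])`, so the subclass and its attributes (here `name` and an assumed `tz` attribute) survive.

```python
class ToyIndex:
    """Hypothetical stand-in for an Index; not the pandas implementation."""

    def __init__(self, values, name=None):
        self.values = list(values)
        self.name = name

    def _attributes(self):
        # Attributes that must survive a shallow copy.
        return {"name": self.name}

    def _shallow_copy(self, values):
        # Build a new object of the same class with the same attributes.
        return type(self)(values, **self._attributes())

    def difference(self, other):
        other_values = set(other.values)
        remaining = [v for v in self.values if v not in other_values]
        # Even when `remaining` is empty, go through _shallow_copy so the
        # result keeps the subclass and its attributes.
        return self._shallow_copy(remaining)


class ToyDatetimeIndex(ToyIndex):
    """Subclass with one extra attribute (assumed here to be a timezone)."""

    def __init__(self, values, name=None, tz=None):
        super().__init__(values, name=name)
        self.tz = tz

    def _attributes(self):
        attrs = super()._attributes()
        attrs["tz"] = self.tz
        return attrs


idx = ToyDatetimeIndex(["2018-03-08", "2018-03-09"], name="when", tz="UTC")
empty = idx.difference(idx)
print(type(empty).__name__, empty.values, empty.name, empty.tz)
```

In this sketch the difference of an index with itself is empty but still a `ToyDatetimeIndex` carrying the original `name` and `tz`, which is the behavior the PR aims for with the real index classes.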

codecov · 2018-03-09T03:26:28Z

Codecov Report

❗ No coverage uploaded for pull request base (master@8d82cce).
The diff coverage is 100%.

@@            Coverage Diff            @@
##             master   #20062   +/-   ##
=========================================
  Coverage          ?   91.78%           
=========================================
  Files             ?      152           
  Lines             ?    49179           
  Branches          ?        0           
=========================================
  Hits              ?    45138           
  Misses            ?     4041           
  Partials          ?        0

| Flag | Coverage Δ |
|---|---|
| #multiple | `90.16% <100%> (?)` |
| #single | `41.87% <75%> (?)` |

| Impacted Files | Coverage Δ |
|---|---|
| pandas/core/indexes/base.py | `96.67% <100%> (ø)` |
| pandas/core/indexes/multi.py | `95.05% <100%> (ø)` |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8d82cce...3df7ce6.

jreback

lgtm. does this allow one to construct an empty Index that we could not before?

jreback · 2018-03-09T11:21:45Z

pandas/core/indexes/base.py

@@ -458,7 +458,7 @@ def _simple_new(cls, values, name=None, dtype=None, **kwargs):
        Must be careful not to recurse.
        """
        if not hasattr(values, 'dtype'):
-            if values is None and dtype is not None:
+            if (values is None or len(values) == 0) and dtype is not None:


can you change this to not len(values)? It is more idiomatic.

can you update this
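
A small sketch of the condition under discussion, using a hypothetical helper named `is_missing_or_empty` (it is not a pandas function). It checks `len()` instead of plain truthiness; one reason to prefer that, as far as I know, is that array-like values such as multi-element NumPy arrays raise an error when evaluated as a bare boolean.

```python
def is_missing_or_empty(values):
    """Return True when `values` is None or has length zero.

    Hypothetical helper for illustration only.
    """
    return values is None or not len(values)


checks = {
    "none": is_missing_or_empty(None),
    "empty_list": is_missing_or_empty([]),
    "empty_tuple": is_missing_or_empty(()),
    "non_empty": is_missing_or_empty([1, 2]),
}
print(checks)
```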

jreback · 2018-03-09T11:23:04Z

pandas/tests/indexes/test_base.py

@@ -1034,6 +1034,29 @@ def test_symmetric_difference(self):
        assert tm.equalContents(result, expected)
        assert result.name == 'new_name'

+    def test_difference_type(self):


can you add a test which tries to construct all indices as empty

@jreback for a test to construct all indices as empty, is this a test of the public APIs (which have to differ based on the specific index class), or a test of the private Index._shallow_copy([])?

For example, pd.RangeIndex([]) currently fails with TypeError: RangeIndex(...) must be called with integers, list was passed for start, which is correct.

Right, you would skip that one. I think there is already a test that constructs empty indexes (pretty sure), but maybe just point it out and make sure it IS testing everything.

jreback · 2018-03-09T11:24:46Z

pandas/tests/indexes/test_base.py

+        # If taking difference of a set and itself, it
+        # needs to preserve the type of the index
+        skip_index_keys = ['repeats']
+        for key, id in self.indices.items():


can you use idx rather than id here and below

jreback · 2018-03-09T11:26:00Z

pandas/tests/indexes/test_base.py

+        # Test that the intersection of an index with an
+        # empty index produces the same index as the difference
+        # of an index with itself.  Test for all types
+        skip_index_keys = ['repeats']


For a followup PR (or you can change it here if you want to update where needed): a function that yields the key, idx pairs while filtering on certain types would be great (like what you are doing, but in a module-level function).
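
One possible shape for such a helper is sketched below. The name `iter_indices` and its `skip_keys` / `skip_types` parameters are hypothetical, chosen only for this illustration, and plain Python containers stand in for the index objects held in `self.indices`.

```python
def iter_indices(indices, skip_keys=(), skip_types=()):
    """Yield (key, idx) pairs, skipping unwanted keys and types.

    Hypothetical module-level helper; `indices` is any mapping of
    name -> index-like object.
    """
    skip_types = tuple(skip_types)  # isinstance needs a tuple of types
    for key, idx in indices.items():
        if key in skip_keys or isinstance(idx, skip_types):
            continue
        yield key, idx


# Stand-in data: lists and a tuple instead of real index objects.
indices = {
    "strIndex": ["a", "b"],
    "repeats": [0, 0, 1],
    "tuples": (1, 2),
}

kept = list(iter_indices(indices, skip_keys=["repeats"], skip_types=[tuple]))
print(kept)
```

A test body could then loop with `for key, idx in iter_indices(self.indices, skip_keys=['repeats']):` instead of repeating the filtering logic in each test.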

jreback

pls rebase and update according to comments.

jreback · 2018-03-13T23:22:59Z

pandas/core/indexes/base.py

@@ -458,7 +458,7 @@ def _simple_new(cls, values, name=None, dtype=None, **kwargs):
        Must be careful not to recurse.
        """
        if not hasattr(values, 'dtype'):
-            if values is None and dtype is not None:
+            if (values is None or len(values) == 0) and dtype is not None:


can you update this

jreback · 2018-03-13T23:23:45Z

pandas/tests/indexes/test_base.py

@@ -1034,6 +1034,29 @@ def test_symmetric_difference(self):
        assert tm.equalContents(result, expected)
        assert result.name == 'new_name'

+    def test_difference_type(self):


Right, you would skip that one. I think there is already a test that constructs empty indexes (pretty sure), but maybe just point it out and make sure it IS testing everything.

Dr-Irv · 2018-03-14T18:28:32Z

@jreback the latest push (20b8d47) is rebased on master, adds a test for constructing empty indexes, and implements the yield idea for looping through the index types, along with the other change you asked for.

jreback · 2018-03-16T22:05:53Z

thanks @Dr-Irv

not very happy with pandas/tests/indexes/test_base.py but that's another issue.

jreback requested changes Mar 9, 2018

jreback added the Indexing and Dtype Conversions labels Mar 9, 2018

jreback requested changes Mar 13, 2018

jreback mentioned this pull request Mar 14, 2018

BUG: TypeError when calling unique on empty MultiIndex #20308

Closed

Dr-Irv force-pushed the issue20040 branch from 39edcd3 to 20b8d47 Compare March 14, 2018 18:19

BUG: Index.difference of itself doesn't preserve type

3df7ce6

Dr-Irv force-pushed the issue20040 branch from 20b8d47 to 3df7ce6 Compare March 14, 2018 20:11

jreback added this to the 0.23.0 milestone Mar 16, 2018

jreback approved these changes Mar 16, 2018

jreback merged commit 083ebac into pandas-dev:master Mar 16, 2018

Dr-Irv deleted the issue20040 branch March 16, 2018 22:10
