Skip to content

BUG: Index.difference of itself doesn't preserve type #20062

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Mar 16, 2018

Conversation

Dr-Irv
Copy link
Contributor

@Dr-Irv Dr-Irv commented Mar 8, 2018

Uses Index._shallow_copy([]), which means fixes are related to getting that to work right.

Fundamental concept is that if the result of Index.difference is an empty index, then result should preserve type and attributes of the object. In addition, for MultiIndex, if result of intersection is an empty index, then the levels are preserved.

@codecov
Copy link

codecov bot commented Mar 9, 2018

Codecov Report

❗ No coverage uploaded for pull request base (master@8d82cce). Click here to learn what that means.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff            @@
##             master   #20062   +/-   ##
=========================================
  Coverage          ?   91.78%           
=========================================
  Files             ?      152           
  Lines             ?    49179           
  Branches          ?        0           
=========================================
  Hits              ?    45138           
  Misses            ?     4041           
  Partials          ?        0
Flag Coverage Δ
#multiple 90.16% <100%> (?)
#single 41.87% <75%> (?)
Impacted Files Coverage Δ
pandas/core/indexes/base.py 96.67% <100%> (ø)
pandas/core/indexes/multi.py 95.05% <100%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8d82cce...3df7ce6. Read the comment docs.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. does this allow one to construct an empty Index that we could not before?

@@ -458,7 +458,7 @@ def _simple_new(cls, values, name=None, dtype=None, **kwargs):
Must be careful not to recurse.
"""
if not hasattr(values, 'dtype'):
if values is None and dtype is not None:
if (values is None or len(values) == 0) and dtype is not None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you change to
not len(values) as more idiomatic

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same below

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you update this

@@ -1034,6 +1034,29 @@ def test_symmetric_difference(self):
assert tm.equalContents(result, expected)
assert result.name == 'new_name'

def test_difference_type(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a test which tries to construct all indices as empty

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback for a test to construct all indices as empty, is this a test of the public API's (which have to differ based on the specific index class), or a test of the private Index._shallow_copy([]) ?

For example, pd.RangeIndex([]) currently fails with TypeError: RangeIndex(...) must be called with integers, list was passed for start, which is correct.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right you would skip that one. I think there is already a test of construct empty (pretty sure), but make just point it out and make it IS testing everything.

# If taking difference of a set and itself, it
# needs to preserve the type of the index
skip_index_keys = ['repeats']
for key, id in self.indices.items():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you use idx rather than id here and below

# Test that the intersection of an index with an
# empty index produces the same index as the difference
# of an index with itself. Test for all types
skip_index_keys = ['repeats']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for followup PR (or can change here if want to update where needed). a function to return yeld the key, idx while filtering on certain types would be great. (like what you are doing but in a module level function)

@jreback jreback added Indexing Related to indexing on series/frames, not to indexes themselves Dtype Conversions Unexpected or buggy dtype conversions labels Mar 9, 2018
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls rebase and update according to comments.

@@ -458,7 +458,7 @@ def _simple_new(cls, values, name=None, dtype=None, **kwargs):
Must be careful not to recurse.
"""
if not hasattr(values, 'dtype'):
if values is None and dtype is not None:
if (values is None or len(values) == 0) and dtype is not None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you update this

@@ -1034,6 +1034,29 @@ def test_symmetric_difference(self):
assert tm.equalContents(result, expected)
assert result.name == 'new_name'

def test_difference_type(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right you would skip that one. I think there is already a test of construct empty (pretty sure), but make just point it out and make it IS testing everything.

@Dr-Irv
Copy link
Contributor Author

Dr-Irv commented Mar 14, 2018

@jreback latest push (20b8d47) has rebase with master, added a test for constructing empty indexes, and did the yield idea for looping through the index types, as well as the other change you asked for.

@jreback jreback added this to the 0.23.0 milestone Mar 16, 2018
@jreback jreback merged commit 083ebac into pandas-dev:master Mar 16, 2018
@jreback
Copy link
Contributor

jreback commented Mar 16, 2018

thanks @Dr-Irv

not very happy with pandas/tests/indexes/test_base.py but that's another issue.

@Dr-Irv Dr-Irv deleted the issue20040 branch March 16, 2018 22:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Dtype Conversions Unexpected or buggy dtype conversions Indexing Related to indexing on series/frames, not to indexes themselves
Projects
None yet
2 participants