-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: Index.difference of itself doesn't preserve type #20062
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report
@@ Coverage Diff @@
## master #20062 +/- ##
=========================================
Coverage ? 91.78%
=========================================
Files ? 152
Lines ? 49179
Branches ? 0
=========================================
Hits ? 45138
Misses ? 4041
Partials ? 0
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. does this allow one to construct an empty Index that we could not before?
pandas/core/indexes/base.py
Outdated
@@ -458,7 +458,7 @@ def _simple_new(cls, values, name=None, dtype=None, **kwargs): | |||
Must be careful not to recurse. | |||
""" | |||
if not hasattr(values, 'dtype'): | |||
if values is None and dtype is not None: | |||
if (values is None or len(values) == 0) and dtype is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you change to
not len(values)
as more idiomatic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you update this
@@ -1034,6 +1034,29 @@ def test_symmetric_difference(self): | |||
assert tm.equalContents(result, expected) | |||
assert result.name == 'new_name' | |||
|
|||
def test_difference_type(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a test which tries to construct all indices as empty
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback for a test to construct all indices as empty, is this a test of the public API's (which have to differ based on the specific index class), or a test of the private Index._shallow_copy([])
?
For example, pd.RangeIndex([])
currently fails with TypeError: RangeIndex(...) must be called with integers, list was passed for start
, which is correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right you would skip that one. I think there is already a test of construct empty (pretty sure), but make just point it out and make it IS testing everything.
pandas/tests/indexes/test_base.py
Outdated
# If taking difference of a set and itself, it | ||
# needs to preserve the type of the index | ||
skip_index_keys = ['repeats'] | ||
for key, id in self.indices.items(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you use idx rather than id here and below
# Test that the intersection of an index with an | ||
# empty index produces the same index as the difference | ||
# of an index with itself. Test for all types | ||
skip_index_keys = ['repeats'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for followup PR (or can change here if want to update where needed). a function to return yeld the key, idx while filtering on certain types would be great. (like what you are doing but in a module level function)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls rebase and update according to comments.
pandas/core/indexes/base.py
Outdated
@@ -458,7 +458,7 @@ def _simple_new(cls, values, name=None, dtype=None, **kwargs): | |||
Must be careful not to recurse. | |||
""" | |||
if not hasattr(values, 'dtype'): | |||
if values is None and dtype is not None: | |||
if (values is None or len(values) == 0) and dtype is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you update this
@@ -1034,6 +1034,29 @@ def test_symmetric_difference(self): | |||
assert tm.equalContents(result, expected) | |||
assert result.name == 'new_name' | |||
|
|||
def test_difference_type(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right you would skip that one. I think there is already a test of construct empty (pretty sure), but make just point it out and make it IS testing everything.
thanks @Dr-Irv not very happy with |
git diff upstream/master -u -- "*.py" | flake8 --diff
Uses
Index._shallow_copy([])
, which means fixes are related to getting that to work right.Fundamental concept is that if the result of
Index.difference
is an empty index, then result should preserve type and attributes of the object. In addition, forMultiIndex
, if result of intersection is an empty index, then the levels are preserved.