-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Fix passing empty label to df drop #21515
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hello @alimcmaster1! Thanks for updating the PR. Cheers ! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on June 21, 2018 at 08:11 Hours UTC |
Test failure in: test_multilevel.py looking into this |
pandas/core/generic.py
Outdated
# Check if label doesn't exist along axis | ||
if len(labels): | ||
labels_missing = (~np.array([label in axis | ||
for label in labels])).any() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(axis.get_indexer_for(labels) == -1).any()
?
pandas/core/generic.py
Outdated
if errors == 'raise' and indexer.all(): | ||
raise KeyError('{} not found in axis'.format(labels)) | ||
# Check if label doesn't exist along axis | ||
if len(labels): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You shouldn't need to check if the list is (non-)empty. The rule "if any label is missing" is fine (trivially false) with no labels.
pandas/core/indexes/base.py
Outdated
""" | ||
arr_dtype = 'object' if self.dtype == 'object' else None | ||
labels = com._index_labels_to_array(labels, dtype=arr_dtype) | ||
indexer = self.get_indexer(labels) | ||
mask = indexer == -1 | ||
if mask.any(): | ||
if mask.any() and len(mask): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again: this shouldn't be needed.
([1, 1, 2], [], [1, 1, 2]), | ||
([1, 2, 3], [2], [1, 3]), | ||
([1, 1, 3], [1], [3]), | ||
]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason not to parametrize separately index
and drop_labels
(and construct expected_index
as [l for l in index if not l in drop_labels]
)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree will do
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is already better, but what I meant was something like
@pytest.mark.parametrize('index', [[1, 2, 3], [1, 1, 2], [1, 2, 2]])
@pytest.mark.parametrize('drop_labels', [[], [1]])
|
||
@pytest.mark.parametrize('index, drop_labels, error_key', [ | ||
([1, 2, 3], [1, 4], 'not contained in axis'), | ||
([1, 2, 2], [1, 4], 'not found in axis'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest instead to make the two errors the same (e.g. fix the one in Index.drop()
in pandas/core/indexes/base.py
). There is no real reason why the wording should differ.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that makes sense
Thanks @toobaz updated as per your comments and fixed that test. ( When |
OK, I see the problem, |
# GH 21494 | ||
expected_index = [i for i in index if i not in drop_labels] | ||
df = pd.DataFrame(index=index).drop(drop_labels) | ||
assert (df.index.values == expected_index).all() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
alway do a tm.assert_*
comparison.
this test is in /series/
but you are testing /dataframe
can you move. its ok to leave if you make this a series (and add to /frame/
as well)
|
||
|
||
@pytest.mark.parametrize('index, drop_labels, error_key', [ | ||
([1, 2, 3], [1, 4], 'not found in axis'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the message doesn't need to be a parameter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls add a whatsnew, 0.23.2 is ok
Codecov Report
@@ Coverage Diff @@
## master #21515 +/- ##
==========================================
+ Coverage 91.92% 91.92% +<.01%
==========================================
Files 153 153
Lines 49564 49560 -4
==========================================
- Hits 45560 45558 -2
+ Misses 4004 4002 -2
Continue to review full report at Codecov.
|
pandas/core/generic.py
Outdated
|
||
else: | ||
labels = _ensure_object(com._index_labels_to_array(labels)) | ||
if level is not None: | ||
if not isinstance(axis, MultiIndex): | ||
raise AssertionError('axis must be a MultiIndex') | ||
indexer = ~axis.get_level_values(level).isin(labels) | ||
# GH 18561 MultiIndex.drop should raise if label is absent |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
blank line before the comment
pandas/core/generic.py
Outdated
except AttributeError: | ||
pass | ||
result = dropped | ||
result = self.reindex(**{axis_name: new_axis}) | ||
|
||
else: | ||
labels = _ensure_object(com._index_labels_to_array(labels)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
comment to indicate this is the non-unique case
pandas/tests/frame/test_indexing.py
Outdated
@@ -3515,3 +3515,24 @@ def test_functions_no_warnings(self): | |||
with tm.assert_produces_warning(False): | |||
df['group'] = pd.cut(df.value, range(0, 105, 10), right=False, | |||
labels=labels) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move with other drop tests in test_axis_select_reindex
|
||
|
||
@pytest.mark.parametrize('index, drop_labels', [ | ||
([1, 2, 3], []), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move to test_base with other test_drop tests
Thanks @jreback updated as per your comments, also when building the docs I noticed we didn't have the whatsnew entry for 0.23.2 in whatsnew.rst, this intended or want me to add? |
pandas/tests/indexes/test_base.py
Outdated
@@ -1565,6 +1565,79 @@ def test_drop_tuple(self, values, to_drop): | |||
for drop_me in to_drop[1], [to_drop[1]]: | |||
pytest.raises(KeyError, removed.drop, drop_me) | |||
|
|||
def test_drop_unique_and_non_unique_index(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any way to parameterize some of this (e.g. you can break into 2 tests if that makes it easier)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I told you to move them here, but on 2nd thought, there they belong in test_alter_index (where the tests were originally). sorry about that.
test_base is for generically testing Index/Series types and not doing that here so kind of superfluous to put the tests here.
Np @jreback moved them to original file and added parameterization |
@toobaz pls approve & merge when satisfied. lgtm. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great, just two minor comments on tests
([1, 1, 2], []), | ||
([1, 2, 3], [2]), | ||
([1, 1, 3], [1]), | ||
]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think (see my previous comment) I would rather write this as:
@pytest.mark.parametrize('index', [[1, 2, 3], [1, 1, 2]])
@pytest.mark.parametrize('drop_labels', [[], [1]])
Do you have any reason to explictly parametrize the combinations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with you, your approach seems more compact. Let me update
@pytest.mark.parametrize('index, drop_labels', [ | ||
([1, 2, 3], [1, 4]), | ||
([1, 2, 2], [1, 4]), | ||
]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly as above, I don't think you need to parametrize drop_labels
.
@jreback , what do you think? Do we have a general rule concerning making parameter combinations more explicit/more compact?
# bad axis | ||
(range(3), list('abc'), ('a',), | ||
0, KeyError, 'not found in axis'), | ||
(range(3), list('abc'), 'one', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jreback that't much neater than my attempt!
@toobaz updated as per your comments, thanks |
([1, 1, 2], []), | ||
([1, 2, 3], [2]), | ||
([1, 1, 3], [1]), | ||
]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with you, your approach seems more compact. Let me update
OK, then here the same applies ;-)
tm.assert_frame_equal(frame, pd.DataFrame(index=expected_index)) | ||
|
||
@pytest.mark.parametrize('index', [[1, 2, 3], [1, 2, 2]]) | ||
@pytest.mark.parametrize('drop_labels', [[1, 4]]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please also include an entirely missing list (e.g. [4, 5]). So that we at least give this decorator a reason to exist ;-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@toobaz thanks updated :)
@alimcmaster1 thanks! |
thanks for the help @toobaz |
-Drop method in indexes/base.py, docs say KeyError should only be raised if none of labels are found in selected axis. However
pd.DataFrame(index=[1,2,3]).drop([1, 4])
throws.-Makes behaviour consistent for .drop() across unique/non-unique indexes.
Both the below will now raise a KeyError
pd.DataFrame(index=[1,2,3]).drop([1, 4])
pd.DataFrame(index=[1,1,3]).drop([1, 4])
-Remove unused var
indexer
and_axis