Fix passing empty label to df drop #21515

alimcmaster1 · 2018-06-17T19:19:30Z

- Closes #21494 (Passing empty label list to df.drop() errors when index is non-unique)
- Tests added / passed

- For the drop method in indexes/base.py, the docs say a KeyError should only be raised if none of the labels are found in the selected axis. However, pd.DataFrame(index=[1,2,3]).drop([1, 4]) raises.

- Makes the behaviour of .drop() consistent across unique and non-unique indexes. Both of the calls below will now raise a KeyError:

pd.DataFrame(index=[1,2,3]).drop([1, 4])
pd.DataFrame(index=[1,1,3]).drop([1, 4])

- Removes the unused variables indexer and _axis
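The behaviour described above can be checked with a small sketch. It assumes a pandas version that includes this fix; `drop_raises_keyerror` is a hypothetical helper written for illustration, not part of pandas.

```python
import pandas as pd


def drop_raises_keyerror(index, labels):
    """Return True if DataFrame.drop raises KeyError for these labels."""
    try:
        pd.DataFrame(index=index).drop(labels)
    except KeyError:
        return True
    return False


# Label 4 is absent, so both the unique and the non-unique index should raise.
unique_raises = drop_raises_keyerror([1, 2, 3], [1, 4])
non_unique_raises = drop_raises_keyerror([1, 1, 3], [1, 4])

# An empty label list has nothing missing, so it should not raise (GH 21494).
empty_raises = drop_raises_keyerror([1, 1, 3], [])
```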

pep8speaks · 2018-06-17T19:19:36Z

Hello @alimcmaster1! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on June 21, 2018 at 08:11 Hours UTC

alimcmaster1 · 2018-06-17T22:00:32Z

Test failure in test_multilevel.py; looking into this.

toobaz · 2018-06-17T22:11:08Z

pandas/core/generic.py

+            # Check if label doesn't exist along axis
+            if len(labels):
+                labels_missing = (~np.array([label in axis
+                                             for label in labels])).any()


(axis.get_indexer_for(labels) == -1).any() ?
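As a rough illustration of the suggestion above, the two checks can be compared side by side on a non-unique index. The function names are hypothetical; only `Index.get_indexer_for` and the expression from the diff come from the thread.

```python
import numpy as np
import pandas as pd


def labels_missing_loop(axis, labels):
    # Form from the diff: one membership test per label.
    return bool((~np.array([label in axis for label in labels])).any())


def labels_missing_indexer(axis, labels):
    # Suggested form: get_indexer_for gives -1 for labels absent from the axis.
    return bool((axis.get_indexer_for(labels) == -1).any())


axis = pd.Index([1, 1, 3])  # non-unique, as in the linked issue
```

One practical difference: `np.array([])` defaults to a float dtype, so applying `~` to it fails for an empty label list, whereas comparing an integer indexer with -1 always yields a boolean array.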

toobaz · 2018-06-17T22:12:07Z

pandas/core/generic.py

-            if errors == 'raise' and indexer.all():
-                raise KeyError('{} not found in axis'.format(labels))
+            # Check if label doesn't exist along axis
+            if len(labels):


You shouldn't need to check if the list is (non-)empty. The rule "if any label is missing" is fine (trivially false) with no labels.
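The point about the rule being trivially false for no labels can be seen in plain Python, independent of pandas. This is only a sketch of the reasoning; `any_label_missing` is a hypothetical name.

```python
def any_label_missing(axis_labels, labels):
    # any() over an empty iterable is False, so no length check is needed.
    return any(label not in axis_labels for label in labels)


no_labels = any_label_missing([1, 1, 3], [])
one_missing = any_label_missing([1, 1, 3], [1, 4])
all_present = any_label_missing([1, 1, 3], [1, 3])
```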

toobaz · 2018-06-17T22:12:57Z

pandas/core/indexes/base.py

        """
        arr_dtype = 'object' if self.dtype == 'object' else None
        labels = com._index_labels_to_array(labels, dtype=arr_dtype)
        indexer = self.get_indexer(labels)
        mask = indexer == -1
-        if mask.any():
+        if mask.any() and len(mask):


Again: this shouldn't be needed.
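The same holds for the NumPy mask in this hunk: `.any()` on an empty boolean array is already False, so the extra `len(mask)` test adds nothing. A minimal sketch, with `has_missing` as a hypothetical stand-in for the check inside Index.drop():

```python
import numpy as np


def has_missing(indexer):
    # -1 marks a label that the indexer lookup could not find.
    mask = np.asarray(indexer) == -1
    return bool(mask.any())


empty_case = has_missing([])        # no labels requested
missing_case = has_missing([0, -1])  # second label not found
present_case = has_missing([0, 2])   # every label found
```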

toobaz · 2018-06-17T22:21:00Z

pandas/tests/series/indexing/test_alter_index.py

+    ([1, 1, 2], [], [1, 1, 2]),
+    ([1, 2, 3], [2], [1, 3]),
+    ([1, 1, 3], [1], [3]),
+])


Any reason not to parametrize separately index and drop_labels (and construct expected_index as [l for l in index if not l in drop_labels])?

Agree will do

This is already better, but what I meant was something like

@pytest.mark.parametrize('index', [[1, 2, 3], [1, 1, 2], [1, 2, 2]])
@pytest.mark.parametrize('drop_labels', [[], [1]])
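A possible shape for the test suggested above, with stacked decorators and a computed expected index. The test name is hypothetical, and `pandas.testing.assert_index_equal` is used here as the public counterpart of the internal `tm` helpers; the test actually merged may differ.

```python
import pandas as pd
import pytest


# Stacked decorators make pytest run every (index, drop_labels) pair:
# 3 indexes x 2 label lists = 6 cases.
@pytest.mark.parametrize('index', [[1, 2, 3], [1, 1, 2], [1, 2, 2]])
@pytest.mark.parametrize('drop_labels', [[], [1]])
def test_drop_labels_present(index, drop_labels):
    # Derive the expected index instead of spelling it out per case.
    expected_index = [i for i in index if i not in drop_labels]
    result = pd.DataFrame(index=index).drop(drop_labels)
    pd.testing.assert_index_equal(result.index, pd.Index(expected_index))
```

The trade-off discussed in the thread is that the combinations are no longer listed explicitly, in exchange for a shorter parameter table.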

toobaz · 2018-06-17T22:24:40Z

pandas/tests/series/indexing/test_alter_index.py

+
+@pytest.mark.parametrize('index, drop_labels, error_key', [
+    ([1, 2, 3], [1, 4], 'not contained in axis'),
+    ([1, 2, 2], [1, 4], 'not found in axis'),


I suggest instead to make the two errors the same (e.g. fix the one in Index.drop() in pandas/core/indexes/base.py). There is no real reason why the wording should differ.

Yes that makes sense

alimcmaster1 · 2018-06-18T08:16:53Z

Thanks @toobaz, updated as per your comments and fixed that test. (When level is not None is true we still need to call indexer.all().)

toobaz · 2018-06-18T08:43:01Z

When level is not None is true we still need to call indexer.all()

OK, I see the problem: pd.MultiIndex.from_product([[1, 2]]*2).drop([3], level=0) does not actually raise (#18561). OK to fix it elsewhere, but please refer to that issue in a comment.
(I think the indexer.all() only solves some specific case by chance.)

jreback · 2018-06-18T10:26:42Z

pandas/tests/series/indexing/test_alter_index.py

+    # GH 21494
+    expected_index = [i for i in index if i not in drop_labels]
+    df = pd.DataFrame(index=index).drop(drop_labels)
+    assert (df.index.values == expected_index).all()


Always do a tm.assert_* comparison.

This test is in /series/ but you are testing a DataFrame; can you move it? It's OK to leave it here if you make this a Series test (and add one to /frame/ as well).
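One reason to prefer an assert helper over an element-wise `.all()` check is that the helper also compares metadata such as dtype. A small sketch, using the public `pandas.testing.assert_index_equal` in place of the internal `tm` module (an assumption; the pandas test suite uses `tm`):

```python
import pandas as pd

result = pd.Index([1, 2])          # integer index
expected = pd.Index([1.0, 2.0])    # float index with the same values

# The element-wise comparison only looks at values, so it passes.
elementwise_ok = bool((result.values == expected.values).all())

# The assert helper also checks dtype and therefore flags the mismatch.
try:
    pd.testing.assert_index_equal(result, expected)
    strict_ok = True
except AssertionError:
    strict_ok = False
```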

jreback · 2018-06-18T10:27:14Z

pandas/tests/series/indexing/test_alter_index.py

+
+
+@pytest.mark.parametrize('index, drop_labels, error_key', [
+    ([1, 2, 3], [1, 4], 'not found in axis'),


the message doesn't need to be a parameter

jreback

pls add a whatsnew, 0.23.2 is ok

codecov · 2018-06-18T23:02:14Z

Codecov Report

Merging #21515 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #21515      +/-   ##
==========================================
+ Coverage   91.92%   91.92%   +<.01%     
==========================================
  Files         153      153              
  Lines       49564    49560       -4     
==========================================
- Hits        45560    45558       -2     
+ Misses       4004     4002       -2

Flag	Coverage Δ
#multiple	`90.32% <100%> (ø)`	⬆️
#single	`41.8% <25%> (-0.01%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/indexes/multi.py	`94.97% <ø> (-0.01%)`	⬇️
pandas/core/indexes/base.py	`96.62% <100%> (ø)`	⬆️
pandas/core/generic.py	`96.22% <100%> (+0.09%)`	⬆️
pandas/core/sorting.py	`98.19% <0%> (-0.01%)`	⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update f91a704...454cb7e.

jreback · 2018-06-19T11:02:42Z

pandas/core/generic.py


        else:
            labels = _ensure_object(com._index_labels_to_array(labels))
            if level is not None:
                if not isinstance(axis, MultiIndex):
                    raise AssertionError('axis must be a MultiIndex')
                indexer = ~axis.get_level_values(level).isin(labels)
+                # GH 18561 MultiIndex.drop should raise if label is absent


blank line before the comment

jreback · 2018-06-19T11:02:58Z

pandas/core/generic.py

-            except AttributeError:
-                pass
-            result = dropped
+            result = self.reindex(**{axis_name: new_axis})

        else:
            labels = _ensure_object(com._index_labels_to_array(labels))


comment to indicate this is the non-unique case

jreback · 2018-06-19T11:04:53Z

pandas/tests/frame/test_indexing.py

@@ -3515,3 +3515,24 @@ def test_functions_no_warnings(self):
        with tm.assert_produces_warning(False):
            df['group'] = pd.cut(df.value, range(0, 105, 10), right=False,
                                 labels=labels)
+


move with other drop tests in test_axis_select_reindex

jreback · 2018-06-19T11:05:41Z

pandas/tests/series/indexing/test_alter_index.py

+
+
+@pytest.mark.parametrize('index, drop_labels', [
+    ([1, 2, 3], []),


move to test_base with other test_drop tests

alimcmaster1 · 2018-06-19T19:26:54Z

Thanks @jreback, updated as per your comments. Also, when building the docs I noticed we didn't have the whatsnew entry for 0.23.2 in whatsnew.rst; is this intended, or do you want me to add it?

jreback · 2018-06-19T20:43:05Z

pandas/tests/indexes/test_base.py

@@ -1565,6 +1565,79 @@ def test_drop_tuple(self, values, to_drop):
        for drop_me in to_drop[1], [to_drop[1]]:
            pytest.raises(KeyError, removed.drop, drop_me)

+    def test_drop_unique_and_non_unique_index(self):


any way to parameterize some of this (e.g. you can break into 2 tests if that makes it easier)

I told you to move them here, but on 2nd thought, they belong in test_alter_index (where the tests were originally). Sorry about that.

test_base is for generically testing Index/Series types, and we are not doing that here, so it is kind of superfluous to put the tests there.

alimcmaster1 · 2018-06-19T23:57:00Z

Np @jreback moved them to original file and added parameterization

jreback · 2018-06-20T10:21:09Z

@toobaz pls approve & merge when satisfied. lgtm.

toobaz

Looks great, just two minor comments on tests

toobaz · 2018-06-20T12:58:59Z

pandas/tests/frame/test_axis_select_reindex.py

+        ([1, 1, 2], []),
+        ([1, 2, 3], [2]),
+        ([1, 1, 3], [1]),
+    ])


I still think (see my previous comment) I would rather write this as:

@pytest.mark.parametrize('index', [[1, 2, 3], [1, 1, 2]])
@pytest.mark.parametrize('drop_labels', [[], [1]])

Do you have any reason to explicitly parametrize the combinations?

Agree with you, your approach seems more compact. Let me update

toobaz · 2018-06-20T13:03:57Z

pandas/tests/frame/test_axis_select_reindex.py

+    @pytest.mark.parametrize('index, drop_labels', [
+        ([1, 2, 3], [1, 4]),
+        ([1, 2, 2], [1, 4]),
+    ])


Similarly as above, I don't think you need to parametrize drop_labels.

@jreback , what do you think? Do we have a general rule concerning making parameter combinations more explicit/more compact?

alimcmaster1 · 2018-06-20T22:08:18Z

pandas/tests/series/indexing/test_alter_index.py

+        # bad axis
+        (range(3), list('abc'), ('a',),
+         0, KeyError, 'not found in axis'),
+        (range(3), list('abc'), 'one',


Thanks @jreback, that's much neater than my attempt!

alimcmaster1 · 2018-06-20T22:09:44Z

@toobaz updated as per your comments, thanks

toobaz · 2018-06-20T23:16:16Z

pandas/tests/series/indexing/test_alter_index.py

+    ([1, 1, 2], []),
+    ([1, 2, 3], [2]),
+    ([1, 1, 3], [1]),
+])


Agree with you, your approach seems more compact. Let me update

OK, then here the same applies ;-)

toobaz · 2018-06-20T23:18:35Z

pandas/tests/frame/test_axis_select_reindex.py

+        tm.assert_frame_equal(frame, pd.DataFrame(index=expected_index))
+
+    @pytest.mark.parametrize('index', [[1, 2, 3], [1, 2, 2]])
+    @pytest.mark.parametrize('drop_labels', [[1, 4]])


Please also include an entirely missing list (e.g. [4, 5]). So that we at least give this decorator a reason to exist ;-)

@toobaz thanks updated :)
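With the entirely missing list added, the error-case test might look roughly like this. The test name is hypothetical, and the 'not found in axis' wording is the unified message discussed earlier in this thread; it is assumed to be unchanged in the pandas version under test.

```python
import pandas as pd
import pytest


@pytest.mark.parametrize('index', [[1, 2, 3], [1, 2, 2]])
@pytest.mark.parametrize('drop_labels', [[1, 4], [4, 5]])
def test_drop_labels_missing(index, drop_labels):
    # [1, 4] is partly missing and [4, 5] entirely missing; both should
    # raise for unique and non-unique indexes alike.
    with pytest.raises(KeyError, match='not found in axis'):
        pd.DataFrame(index=index).drop(drop_labels)
```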

toobaz · 2018-06-21T08:13:29Z

@alimcmaster1 thanks!

alimcmaster1 · 2018-06-21T08:38:10Z

thanks for the help @toobaz

Closes #21494 (cherry picked from commit f4fba9e)

Closes pandas-dev#21494

Fix passing empty label to df drop

35742fc

alimcmaster1 added 2 commits June 17, 2018 20:23

Pep8

832a50b

Pep8

bb80ded

toobaz suggested changes Jun 17, 2018

Update per toobaz comments

a81c74a

jreback requested changes Jun 18, 2018

jreback added the Indexing (Related to indexing on series/frames, not to indexes themselves) and MultiIndex labels Jun 18, 2018

jreback requested changes Jun 18, 2018

Update per comment jreback

394b384

Pep8

93bbf05

jreback requested changes Jun 19, 2018

Update per comment jreback

1b832d0

jreback requested changes Jun 19, 2018

Update per comment jreback

8119f72

jreback added 2 commits June 20, 2018 06:15

Merge branch 'master' into PR_TOOL_MERGE_PR_21515

9d5a42f

clean tests

01f6a9c

jreback added this to the 0.23.2 milestone Jun 20, 2018

jreback approved these changes Jun 20, 2018

toobaz reviewed Jun 20, 2018

alimcmaster1 added 2 commits June 20, 2018 23:03

Parameterize test cases

13b36c2

Merge remote-tracking branch 'origin/df-drop-errors-2' into df-drop-e…

d5f67e1

…rrors-2

alimcmaster1 commented Jun 20, 2018

toobaz reviewed Jun 20, 2018

Parameterize test cases

1ffc0d4

toobaz approved these changes Jun 21, 2018

Merge branch 'master' into df-drop-errors-2

454cb7e

toobaz merged commit f4fba9e into pandas-dev:master Jun 21, 2018

alimcmaster1 deleted the df-drop-errors-2 branch June 21, 2018 08:38

jorisvandenbossche added Needs Backport and removed Needs Backport labels Jun 29, 2018

jorisvandenbossche pushed a commit that referenced this pull request Jun 29, 2018

BUG: Fix passing empty label to df drop (#21515)

83d51cd

Closes #21494 (cherry picked from commit f4fba9e)

jorisvandenbossche pushed a commit that referenced this pull request Jul 2, 2018

BUG: Fix passing empty label to df drop (#21515)

a2199d2

Closes #21494 (cherry picked from commit f4fba9e)

Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018

BUG: Fix passing empty label to df drop (pandas-dev#21515)

fb8cdef

Closes pandas-dev#21494


