BUG: Index.drop raising Error when Index has duplicates #38070

Merged 13 commits on Dec 2, 2020
2 changes: 2 additions & 0 deletions doc/source/whatsnew/v1.2.0.rst
@@ -636,6 +636,7 @@ MultiIndex
 - Bug in :meth:`DataFrame.reset_index` with ``NaT`` values in index raises ``ValueError`` with message ``"cannot convert float NaN to integer"`` (:issue:`36541`)
 - Bug in :meth:`DataFrame.combine_first` when used with :class:`MultiIndex` containing string and ``NaN`` values raises ``TypeError`` (:issue:`36562`)
 - Bug in :meth:`MultiIndex.drop` dropped ``NaN`` values when non existing key was given as input (:issue:`18853`)
+- Bug in :meth:`MultiIndex.drop` dropping more values than expected when index has duplicates and is not sorted (:issue:`33494`)

 I/O
 ^^^
@@ -758,6 +759,7 @@ Other
 - Passing an array with 2 or more dimensions to the :class:`Series` constructor now raises the more specific ``ValueError``, from a bare ``Exception`` previously (:issue:`35744`)
 - Bug in ``accessor.DirNamesMixin``, where ``dir(obj)`` wouldn't show attributes defined on the instance (:issue:`37173`).
 - Bug in :meth:`Series.nunique` with ``dropna=True`` was returning incorrect results when both ``NA`` and ``None`` missing values were present (:issue:`37566`)
+- Bug in :meth:`Index.drop` raising ``InvalidIndexError`` when index has duplicates (:issue:`38051`)

 .. ---------------------------------------------------------------------------

5 changes: 4 additions & 1 deletion pandas/core/indexes/base.py
@@ -5508,7 +5508,10 @@ def drop(self, labels, errors: str_t = "raise"):
         """
         arr_dtype = "object" if self.dtype == "object" else None
         labels = com.index_labels_to_array(labels, dtype=arr_dtype)
-        indexer = self.get_indexer(labels)
+        if self.is_unique:
+            indexer = self.get_indexer(labels)
+        else:
+            indexer, _ = self.get_indexer_non_unique(labels)
Member commented:

get_indexer_for?

Member Author replied:

I did not know this function, thanks very much.
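For readers unfamiliar with the three lookup methods discussed in this thread, the following is a minimal sketch of how they behave on an index with duplicates. It assumes pandas 1.1 or later, where `InvalidIndexError` is importable from `pandas.errors`; the sample index and labels are illustrative only.

```python
import pandas as pd
from pandas.errors import InvalidIndexError

index = pd.Index([0, 1, 0, 1])  # contains duplicates, so is_unique is False

# get_indexer requires a unique index and raises on this one.
try:
    index.get_indexer([0])
    raised = False
except InvalidIndexError:
    raised = True

# get_indexer_non_unique returns a pair (indexer, missing);
# the indexer lists every position that matches a requested label.
indexer, missing = index.get_indexer_non_unique([0])

# get_indexer_for works for unique and non-unique indexes alike
# and returns only the indexer.
positions = index.get_indexer_for([0])

# Labels that are not present are reported as -1, which is what the
# `mask = indexer == -1` check in drop relies on.
not_found = index.get_indexer_for([5])
```

Using `get_indexer_for` would let the `is_unique` branch in the hunk above collapse into a single call, which appears to be the point of the reviewer's question.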

         mask = indexer == -1
         if mask.any():
             if errors != "ignore":

3 changes: 2 additions & 1 deletion pandas/core/indexes/multi.py
@@ -2169,7 +2169,8 @@ def drop(self, codes, level=None, errors="raise"):
                 if isinstance(loc, int):
                     inds.append(loc)
                 elif isinstance(loc, slice):
-                    inds.extend(range(loc.start, loc.stop))
+                    step = loc.step if loc.step is not None else 1
+                    inds.extend(range(loc.start, loc.stop, step))
                 elif com.is_bool_indexer(loc):
                     if self.lexsort_depth == 0:
                         warnings.warn(
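The effect of honoring the slice step can be seen with plain Python, independent of pandas. This is a sketch only: the slice value below is hypothetical and chosen to mirror an index whose matching rows sit at positions 0 and 2.

```python
# Hypothetical location result: matches at positions 0 and 2 only.
loc = slice(0, 3, 2)

# Old behaviour: the step is ignored, so position 1 is collected too.
without_step = list(range(loc.start, loc.stop))

# New behaviour: fall back to 1 when the slice carries no explicit step.
step = loc.step if loc.step is not None else 1
with_step = list(range(loc.start, loc.stop, step))

# A slice without a step keeps working through the fallback.
plain = slice(1, 3)
plain_step = plain.step if plain.step is not None else 1
plain_positions = list(range(plain.start, plain.stop, plain_step))
```

Here `without_step` contains the extra position 1, which corresponds to the "dropping more values than expected" symptom described in the whatsnew entry.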
8 changes: 8 additions & 0 deletions pandas/tests/indexes/multi/test_drop.py
@@ -147,3 +147,11 @@ def test_drop_with_nan_in_index(nulls_fixture):
     msg = r"labels \[Timestamp\('2001-01-01 00:00:00'\)\] not found in level"
     with pytest.raises(KeyError, match=msg):
         mi.drop(pd.Timestamp("2001"), level="date")
+
+
+def test_drop_with_non_monotonic_duplicates():
+    # GH#33494
+    mi = MultiIndex.from_tuples([(1, 2), (2, 3), (1, 2)])
+    result = mi.drop((1, 2))
+    expected = MultiIndex.from_tuples([(2, 3)])
+    tm.assert_index_equal(result, expected)
7 changes: 7 additions & 0 deletions pandas/tests/indexes/test_base.py
@@ -1496,6 +1496,13 @@ def test_drop_tuple(self, values, to_drop):
             with pytest.raises(KeyError, match=msg):
                 removed.drop(drop_me)

+    def test_drop_with_duplicates_in_index(self):
Member commented:

You could use the index fixture and do something like:

def test_drop_with_duplicates(self, index):
    if len(index) == 0:
        return
    index = index.repeat(2)  # ensure duplicates
    res = index.drop(index[0])


Member Author replied:

Better? Is there a way to construct expected without drop?

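One possible way to build `expected` without calling `drop` is sketched below. It is an assumption, not necessarily what the PR settled on: deduplicate first, repeat each label twice, and slice off the first two entries. The string index stands in for the `index` fixture and is purely illustrative.

```python
import pandas as pd

# Stand-in for the parametrized `index` fixture (hypothetical sample data).
index = pd.Index(["a", "b", "c", "a"])

# After unique().repeat(2), every label appears exactly twice, back to back,
# so the two copies of index[0] occupy positions 0 and 1.
index = index.unique().repeat(2)

# Expected value built by slicing, with no call to drop.
expected = index[2:]

result = index.drop(index[0])
```

Calling `unique()` first matters: if the fixture already holds duplicates of `index[0]` further along, a plain `repeat(2)` followed by `index[2:]` would leave those later copies in `expected`.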

+        # GH38051
+        index = Index([0, 1, 0, 1])
+        result = index.drop(0)
+        expected = Index([1, 1])
+        tm.assert_index_equal(result, expected)
+
     @pytest.mark.parametrize(
         "attr",
         [