Commit 440fc8d

NoahTheDuke authored and harisbal committed
BUG: drop_duplicates not raising KeyError on missing key (pandas-dev#19730)
1 parent e7e1712 commit 440fc8d

3 files changed, +22 -0 lines changed

doc/source/whatsnew/v0.23.0.txt

+2
@@ -795,6 +795,8 @@ Indexing
 - Bug in :class:`IntervalIndex` where empty and purely NA data was constructed inconsistently depending on the construction method (:issue:`18421`)
 - Bug in :func:`IntervalIndex.symmetric_difference` where the symmetric difference with a non-``IntervalIndex`` did not raise (:issue:`18475`)
 - Bug in :class:`IntervalIndex` where set operations that returned an empty ``IntervalIndex`` had the wrong dtype (:issue:`19101`)
+- Bug in :meth:`DataFrame.drop_duplicates` where no ``KeyError`` is raised when passing in columns that don't exist on the ``DataFrame`` (:issue:`19726`)
+
 
 MultiIndex
 ^^^^^^^^^^
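For users, the visible change is that a misspelled column name in `subset` now fails loudly. A minimal sketch, assuming pandas 0.23.0 or later (the release this entry is filed under); the frame mirrors the one used in the new test:

```python
import pandas as pd

df = pd.DataFrame({"A": [0, 0, 1], "B": [0, 0, 1], "C": [0, 0, 1]})

# A correctly spelled subset behaves as before: row 1 duplicates row 0.
deduped = df.drop_duplicates(subset=["A"])
print(deduped.index.tolist())  # [0, 2]

# A misspelled column ("a" instead of "A") raises KeyError (pandas >= 0.23.0).
raised = False
try:
    df.drop_duplicates(subset=["a"])
except KeyError as exc:
    raised = True
    print("KeyError:", exc)
```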

pandas/core/frame.py

+7
@@ -3655,6 +3655,13 @@ def f(vals):
               isinstance(subset, tuple) and subset in self.columns):
             subset = subset,
 
+        # Verify all columns in subset exist in the queried dataframe
+        # Otherwise, raise a KeyError, same as if you try to __getitem__ with a
+        # key that doesn't exist.
+        diff = Index(subset).difference(self.columns)
+        if not diff.empty:
+            raise KeyError(diff)
+
         vals = (col.values for name, col in self.iteritems()
                 if name in subset)
         labels, shape = map(list, zip(*map(f, vals)))
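The new check can be read on its own. The sketch below restates it as a standalone function; `validate_subset` is a hypothetical name, not a pandas API, and the scalar-to-tuple normalization approximates the surrounding `elif`, whose opening lines fall outside this hunk (it assumes Python 3, so `str` stands in for the string-type check). Because `drop_duplicates` calls `duplicated` internally, a single check here covers both methods.

```python
import numpy as np
import pandas as pd


def validate_subset(columns, subset):
    """Hypothetical helper mirroring the check added to DataFrame.duplicated.

    Normalizes a scalar label into a one-element tuple, then raises KeyError
    listing every label in ``subset`` that is not one of ``columns``.
    """
    if subset is None:
        return tuple(columns)
    # A string, a non-iterable, or a tuple that is itself a column label
    # is treated as a single label (assumed to match the surrounding elif).
    if (not np.iterable(subset) or isinstance(subset, str)
            or (isinstance(subset, tuple) and subset in columns)):
        subset = (subset,)
    # Index.difference returns the labels in subset that are absent from columns.
    missing = pd.Index(subset).difference(columns)
    if not missing.empty:
        raise KeyError(missing)
    return tuple(subset)
```

`Index.difference` collects every missing label at once rather than stopping at the first, so the resulting `KeyError` names all misspelled columns in one go.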

pandas/tests/frame/test_analytics.py

+13
@@ -1492,6 +1492,19 @@ def test_drop_duplicates(self):
         for keep in ['first', 'last', False]:
             assert df.duplicated(keep=keep).sum() == 0
 
+    @pytest.mark.parametrize('subset', ['a', ['a'], ['a', 'B']])
+    def test_duplicated_with_misspelled_column_name(self, subset):
+        # GH 19730
+        df = pd.DataFrame({'A': [0, 0, 1],
+                           'B': [0, 0, 1],
+                           'C': [0, 0, 1]})
+
+        with pytest.raises(KeyError):
+            df.duplicated(subset)
+
+        with pytest.raises(KeyError):
+            df.drop_duplicates(subset)
+
     def test_drop_duplicates_with_duplicate_column_names(self):
         # GH17836
         df = DataFrame([
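To exercise the same cases without pytest, for instance in a REPL against an installed pandas, the loop below calls both methods with each misspelled subset and records the exception text. It assumes pandas 0.23.0 or later; the exact message (here the `repr` of an `Index` of missing labels) is an implementation detail that may differ between versions.

```python
import pandas as pd

# Same frame and parametrized subsets as the new test; each contains "a".
df = pd.DataFrame({"A": [0, 0, 1], "B": [0, 0, 1], "C": [0, 0, 1]})
bad_subsets = ["a", ["a"], ["a", "B"]]

# Maps (subset, method name) -> exception text, or None if nothing was raised.
results = {}
for subset in bad_subsets:
    for method in (df.duplicated, df.drop_duplicates):
        key = (repr(subset), method.__name__)
        try:
            method(subset)
        except KeyError as exc:
            results[key] = str(exc)
        else:
            results[key] = None

for key, message in results.items():
    print(key, "->", message)
```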
