
Commit a2ff3f0

Licht-T authored and jreback committed
BUG: Fix wrong column selection in drop_duplicates when duplicate column names (#17879)
1 parent 00f23ca commit a2ff3f0


3 files changed (+18 -1 lines)

doc/source/whatsnew/v0.21.0.txt (+1)

@@ -1008,6 +1008,7 @@ Reshaping
 - Bug in :func:`concat` where order of result index was unpredictable if it contained non-comparable elements (:issue:`17344`)
 - Fixes regression when sorting by multiple columns on a ``datetime64`` dtype ``Series`` with ``NaT`` values (:issue:`16836`)
 - Bug in :func:`pivot_table` where the result's columns did not preserve the categorical dtype of ``columns`` when ``dropna`` was ``False`` (:issue:`17842`)
+- Bug in ``DataFrame.drop_duplicates`` where dropping with non-unique column names raised a ``ValueError`` (:issue:`17836`)
 
 Numeric
 ^^^^^^^

pandas/core/frame.py (+2 -1)

@@ -3556,7 +3556,8 @@ def f(vals):
               isinstance(subset, tuple) and subset in self.columns):
             subset = subset,
 
-        vals = (self[col].values for col in subset)
+        vals = (col.values for name, col in self.iteritems()
+                if name in subset)
         labels, shape = map(list, zip(*map(f, vals)))
 
         ids = get_group_index(labels, shape, sort=False, xnull=False)
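The fix changes which objects the `vals` generator yields. In pandas, selecting a duplicated label such as `df['a']` returns a DataFrame rather than a Series, so `.values` is two-dimensional, while the factorizing helper `f` is meant to receive one column at a time. The pure-Python toy model below (no pandas required) sketches the difference under a simplifying assumption: a frame is a list of `(name, column)` pairs, and label lookup returns every column sharing a name. The names `lookup`, `old_vals` and `new_vals` are hypothetical, for illustration only.

```python
from typing import List, Sequence, Tuple, Union

Column = List[int]
# Hypothetical stand-in for a DataFrame: ordered (name, column) pairs.
# Names may repeat, as with columns=['a', 'a', 'b'] in the new test.
Frame = List[Tuple[str, Column]]


def lookup(frame: Frame, name: str) -> Union[Column, List[Column]]:
    """Mimic label-based selection such as ``self[col]``.

    A unique name yields one 1-D column; a duplicated name yields a 2-D
    block (a list of columns), much as pandas returns a DataFrame.
    """
    matches = [col for n, col in frame if n == name]
    if not matches:
        raise KeyError(name)
    return matches[0] if len(matches) == 1 else matches


def old_vals(frame: Frame, subset: Sequence[str]) -> list:
    # Pre-fix pattern: one item per *name* in subset, possibly a 2-D block.
    return [lookup(frame, name) for name in subset]


def new_vals(frame: Frame, subset: Sequence[str]) -> list:
    # Post-fix pattern: walk the real columns and keep each one whose name
    # is in subset, so every item is a single 1-D column.
    return [col for name, col in frame if name in subset]


frame: Frame = [("a", [1, 3, 3]), ("a", [2, 4, 4]), ("b", [5, 6, 7])]

print(old_vals(frame, ("a",)))  # [[[1, 3, 3], [2, 4, 4]]]: one 2-D item
print(new_vals(frame, ("a",)))  # [[1, 3, 3], [2, 4, 4]]: two 1-D items
```

Two properties of the new pattern follow directly from the diff as written: columns come out in frame order rather than `subset` order (harmless for duplicate detection, since only the combined row key matters), and a name in `subset` that matches no column is silently skipped rather than raising.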

pandas/tests/frame/test_analytics.py (+15)

@@ -1394,6 +1394,21 @@ def test_drop_duplicates(self):
         for keep in ['first', 'last', False]:
             assert df.duplicated(keep=keep).sum() == 0
 
+    def test_drop_duplicates_with_duplicate_column_names(self):
+        # GH17836
+        df = DataFrame([
+            [1, 2, 5],
+            [3, 4, 6],
+            [3, 4, 7]
+        ], columns=['a', 'a', 'b'])
+
+        result0 = df.drop_duplicates()
+        tm.assert_frame_equal(result0, df)
+
+        result1 = df.drop_duplicates('a')
+        expected1 = df[:2]
+        tm.assert_frame_equal(result1, expected1)
+
     def test_drop_duplicates_for_take_all(self):
         df = DataFrame({'AAA': ['foo', 'bar', 'baz', 'bar',
                                 'foo', 'bar', 'qux', 'foo'],
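The expectations in the new test can be checked against a small pure-Python sketch of the pipeline in the `frame.py` hunk: select every column whose name is in `subset`, factorize each into integer codes, fold the codes into one key per row, and flag rows whose key was already seen. This is a simplified stand-in for pandas' `factorize`, `get_group_index` and duplicate flagging, not their actual implementation; the function names, the list-of-pairs frame, and the restriction to `keep='first'` are assumptions for illustration.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Column = List[int]
# Hypothetical stand-in for a DataFrame: ordered (name, column) pairs.
Frame = List[Tuple[str, Column]]


def factorize(values: Sequence[int]) -> Tuple[List[int], int]:
    """Map each value to a dense code in order of first appearance."""
    codes: Dict[int, int] = {}
    labels = [codes.setdefault(v, len(codes)) for v in values]
    return labels, len(codes)


def group_ids(labels: List[List[int]], shape: List[int]) -> List[int]:
    """Fold per-column codes into one integer per row (mixed radix, C order)."""
    ids = [0] * len(labels[0])
    for codes, size in zip(labels, shape):
        ids = [i * size + c for i, c in zip(ids, codes)]
    return ids


def duplicated_first(frame: Frame, subset: Sequence[str]) -> List[bool]:
    """Flag rows whose key was already seen earlier (like keep='first')."""
    # Post-fix selection: every column whose name is in subset, one at a time.
    cols = [col for name, col in frame if name in subset]
    labels, shape = map(list, zip(*map(factorize, cols)))
    seen = set()
    flags = []
    for gid in group_ids(labels, shape):
        flags.append(gid in seen)
        seen.add(gid)
    return flags


def drop_duplicates(frame: Frame, subset: Optional[Sequence[str]] = None) -> List[int]:
    """Return the positions of the rows that keep='first' would retain."""
    if subset is None:
        subset = [name for name, _ in frame]  # default: consider all columns
    return [i for i, dup in enumerate(duplicated_first(frame, subset)) if not dup]


# Same data as test_drop_duplicates_with_duplicate_column_names.
frame: Frame = [("a", [1, 3, 3]), ("a", [2, 4, 4]), ("b", [5, 6, 7])]

print(drop_duplicates(frame))          # [0, 1, 2]: every row kept (result0)
print(drop_duplicates(frame, ("a",)))  # [0, 1]: the rows of df[:2] (result1)
```

With `subset=('a',)`, both `a` columns contribute codes, so rows 1 and 2 (`[3, 4]` in each) share a key and row 2 is dropped, matching `expected1 = df[:2]`. Python integers cannot overflow, so the sketch skips the overflow handling a fixed-width int64 encoding would need.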
