
Commit 2247b0c

erikcsstangirala authored and committed
BUG: wide_to_long should check for unique id vars (pandas-dev#16382) (pandas-dev#16403)
* BUG: wide_to_long should check for unique id vars (pandas-dev#16382)
* Fix uncaught lint error
* Add whatsnew note (bug fix)
1 parent ba15ce6 commit 2247b0c

3 files changed, +16 -0 lines changed


doc/source/whatsnew/v0.20.2.txt

+2
@@ -82,8 +82,10 @@ Reshaping
 ^^^^^^^^^
 
 - Bug in ``DataFrame.stack`` with unsorted levels in MultiIndex columns (:issue:`16323`)
+- Bug in ``pd.wide_to_long()`` where no error was raised when ``i`` was not a unique identifier (:issue:`16382`)
 - Bug in ``Series.isin(..)`` with a list of tuples (:issue:`16394`)
 
+
 Numeric
 ^^^^^^^
 - Bug in ``.interpolate()``, where ``limit_direction`` was not respected when ``limit=None`` (default) was passed (:issue:`16282`)

pandas/core/reshape/reshape.py

+3
@@ -1046,6 +1046,9 @@ def melt_stub(df, stub, i, j, value_vars, sep):
     else:
         i = list(i)
 
+    if df[i].duplicated().any():
+        raise ValueError("the id variables need to uniquely identify each row")
+
     value_vars = list(map(lambda stub:
                           get_var_names(df, stub, sep, suffix), stubnames))
 
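
The guard added in this hunk relies on `DataFrame.duplicated()`, which flags every row whose values in the selected columns repeat an earlier row. Below is a minimal sketch of the same check outside of `wide_to_long`. The helper name `check_unique_ids` and the sample frame are assumptions for illustration and are not part of this commit; the `isinstance` test is a simplification of how `wide_to_long` normalises `i`.

```python
import pandas as pd


def check_unique_ids(df, i):
    """Raise ValueError if the columns in `i` do not uniquely identify rows.

    Hypothetical helper that mirrors the guard added in this commit.
    """
    # Accept a single column label or a list of labels (simplified assumption).
    id_cols = list(i) if isinstance(i, (list, tuple)) else [i]
    # duplicated() marks rows whose id values already appeared in an earlier row.
    if df[id_cols].duplicated().any():
        raise ValueError("the id variables need to uniquely identify each row")


# Illustrative data, not taken from the pandas test suite.
df = pd.DataFrame({
    'famid': [1, 1, 2],
    'birth': [1, 2, 1],
    'ht1': [2.8, 2.9, 2.2],
})

# Passes silently: each (famid, birth) pair occurs exactly once.
check_unique_ids(df, ['famid', 'birth'])

# Fails: famid == 1 appears in two rows, so famid alone is not a unique id.
try:
    check_unique_ids(df, 'famid')
    unique_error = None
except ValueError as err:
    unique_error = str(err)
```

Raising early here seems preferable to letting the reshape continue, since the later steps would otherwise run on ambiguous ids without any signal to the caller.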

pandas/tests/reshape/test_reshape.py

+11
@@ -976,3 +976,14 @@ def test_multiple_id_columns(self):
         exp_frame = exp_frame.set_index(['famid', 'birth', 'age'])[['ht']]
         long_frame = wide_to_long(df, 'ht', i=['famid', 'birth'], j='age')
         tm.assert_frame_equal(long_frame, exp_frame)
+
+    def test_non_unique_idvars(self):
+        # GH16382
+        # Raise an error message if non unique id vars (i) are passed
+        df = pd.DataFrame({
+            'A_A1': [1, 2, 3, 4, 5],
+            'B_B1': [1, 2, 3, 4, 5],
+            'x': [1, 1, 1, 1, 1]
+        })
+        with pytest.raises(ValueError):
+            wide_to_long(df, ['A_A', 'B_B'], i='x', j='colname')
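
From a caller's point of view, the new test implies that a frame like the one above needs a row-level identifier before it can be reshaped. The sketch below reuses the test data and shows one possible way to satisfy the check. The `id` column is a hypothetical addition for illustration and is not something this commit prescribes.

```python
import pandas as pd

# Same data as in test_non_unique_idvars.
df = pd.DataFrame({
    'A_A1': [1, 2, 3, 4, 5],
    'B_B1': [1, 2, 3, 4, 5],
    'x': [1, 1, 1, 1, 1],
})

# 'x' is constant, so it cannot identify rows; with this fix the call raises.
try:
    pd.wide_to_long(df, ['A_A', 'B_B'], i='x', j='colname')
    raised = False
except ValueError:
    raised = True

# Assumed workaround: add a unique per-row identifier (hypothetical 'id' column)
# and use it as the id variable instead.
df['id'] = range(len(df))
long_frame = pd.wide_to_long(df, ['A_A', 'B_B'], i='id', j='colname')
```

With a unique `id`, each wide row maps to its own row in `long_frame`, and `x` is carried along as an ordinary column.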
