Commit a5bd34b

fix: #59950 handle duplicate column names in dataframe queries
- Fixed an issue where `DataFrame.query()` would throw an unexpected error
- The error was caused by `self.dtypes[k]`
- Adjusted the behavior to match the behavior prior to pandas version
- Added tests to ensure that `DataFrame.query()` works as expected
1 parent 139def2 commit a5bd34b
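
The commit message attributes the error to `self.dtypes[k]`. A minimal sketch of why label-based lookup is fragile when column names repeat (the frame and variable names below are illustrative, not taken from the commit): indexing `dtypes` by a duplicated label returns a `Series` of dtypes rather than a single dtype.

```python
import pandas as pd

# Illustrative frame: build with unique names, then introduce a duplicate label.
df = pd.DataFrame({"A": [1, 2, 3], "B": [0.5, 1.5, 2.5], "C": [1, 2, 3]})
df.columns = ["A", "A", "C"]

# Label lookup on a duplicated name selects every matching entry,
# so the result is a Series of dtypes, not one dtype.
by_duplicate_label = df.dtypes["A"]

# Label lookup on a unique name returns a single dtype object.
by_unique_label = df.dtypes["C"]

print(type(by_duplicate_label).__name__, len(by_duplicate_label))
print(by_unique_label)
```

Passing the first result as the `dtype=` argument of a `Series` constructor is not meaningful, which is consistent with the `TypeError` described in the whatsnew entry.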

3 files changed, +22 -2 lines changed

doc/source/whatsnew/v3.0.0.rst

@@ -704,6 +704,7 @@ Other
 - Bug in :meth:`DataFrame.apply` where passing ``engine="numba"`` ignored ``args`` passed to the applied function (:issue:`58712`)
 - Bug in :meth:`DataFrame.eval` and :meth:`DataFrame.query` which caused an exception when using NumPy attributes via ``@`` notation, e.g., ``df.eval("@np.floor(a)")``. (:issue:`58041`)
 - Bug in :meth:`DataFrame.eval` and :meth:`DataFrame.query` which did not allow to use ``tan`` function. (:issue:`55091`)
+- Bug in :meth:`DataFrame.query` where using duplicate column names led to a ``TypeError``. (:issue:`59950`)
 - Bug in :meth:`DataFrame.query` which raised an exception or produced incorrect results when expressions contained backtick-quoted column names containing the hash character ``#``, backticks, or characters that fall outside the ASCII range (U+0001..U+007F). (:issue:`59285`) (:issue:`49633`)
 - Bug in :meth:`DataFrame.sort_index` when passing ``axis="columns"`` and ``ignore_index=True`` and ``ascending=False`` not returning a :class:`RangeIndex` columns (:issue:`57293`)
 - Bug in :meth:`DataFrame.transform` that was returning the wrong order unless the index was monotonically increasing. (:issue:`57069`)

pandas/core/generic.py

@@ -603,9 +603,9 @@ def _get_cleaned_column_resolvers(self) -> dict[Hashable, Series]:
         dtypes = self.dtypes
         return {
             clean_column_name(k): Series(
-                v, copy=False, index=self.index, name=k, dtype=dtypes[k]
+                v, copy=False, index=self.index, name=k, dtype=dtype
             ).__finalize__(self)
-            for k, v in zip(self.columns, self._iter_column_arrays())
+            for k, v, dtype in zip(self.columns, self._iter_column_arrays(), dtypes)
             if not isinstance(k, int)
         }
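
The change above pairs each column with its dtype by position instead of looking the dtype up by label. The following is a rough sketch of that pairing using only public pandas API; the frame and the `pairs` name are assumptions for illustration, and the private `_iter_column_arrays()` used in the diff is deliberately not called here.

```python
import pandas as pd

# Illustrative frame with a duplicated column label and differing dtypes.
df = pd.DataFrame({"A": [1, 2, 3], "B": [0.5, 1.5, 2.5], "C": [1, 2, 3]})
df.columns = ["A", "A", "C"]

# Iterating over df.dtypes yields one dtype per column, in column order,
# so zip() matches names and dtypes positionally even when names repeat.
pairs = [(name, str(dtype)) for name, dtype in zip(df.columns, df.dtypes)]

for name, dtype in pairs:
    print(name, dtype)
```

Because the pairing never indexes by label, each of the two `A` columns keeps its own dtype.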

pandas/tests/frame/test_query_eval.py

@@ -159,6 +159,25 @@ def test_query_empty_string(self):
         with pytest.raises(ValueError, match=msg):
             df.query("")

+    def test_query_duplicate_column_name(self, engine, parser):
+        df = DataFrame(
+            {
+                "A": range(3),
+                "B": range(3),
+                "C": range(3)
+            }
+        ).rename(columns={"B": "A"})
+
+        res = df.query('C == 1', engine=engine, parser=parser)
+
+        expect = DataFrame(
+            [[1, 1, 1]],
+            columns=["A", "A", "C"],
+            index=[1]
+        )
+
+        tm.assert_frame_equal(res, expect)
+
     def test_eval_resolvers_as_list(self):
         # GH 14095
         df = DataFrame(
