Commit 6631202

miguelcsx and mroeschke authored
BUG: fix #59950 handle duplicate column names in dataframe queries (#59971)
fix: #59950 handle duplicate column names in dataframe queries

- Fixed an issue where `DataFrame.query()` would throw an unexpected error
- The error was caused by `self.dtypes[k]`
- Adjusted the behavior to match the behavior prior to pandas version
- Added tests to ensure that `DataFrame.query()` works as expected

Co-authored-by: Matthew Roeschke <[email protected]>
1 parent bec2dbc commit 6631202

3 files changed: +22 -2 lines changed


doc/source/whatsnew/v3.0.0.rst

+1
@@ -771,6 +771,7 @@ Other
 - Bug in :meth:`DataFrame.apply` where passing ``engine="numba"`` ignored ``args`` passed to the applied function (:issue:`58712`)
 - Bug in :meth:`DataFrame.eval` and :meth:`DataFrame.query` which caused an exception when using NumPy attributes via ``@`` notation, e.g., ``df.eval("@np.floor(a)")``. (:issue:`58041`)
 - Bug in :meth:`DataFrame.eval` and :meth:`DataFrame.query` which did not allow to use ``tan`` function. (:issue:`55091`)
+- Bug in :meth:`DataFrame.query` where using duplicate column names led to a ``TypeError``. (:issue:`59950`)
 - Bug in :meth:`DataFrame.query` which raised an exception or produced incorrect results when expressions contained backtick-quoted column names containing the hash character ``#``, backticks, or characters that fall outside the ASCII range (U+0001..U+007F). (:issue:`59285`) (:issue:`49633`)
 - Bug in :meth:`DataFrame.shift` where passing a ``freq`` on a DataFrame with no columns did not shift the index correctly. (:issue:`60102`)
 - Bug in :meth:`DataFrame.sort_index` when passing ``axis="columns"`` and ``ignore_index=True`` and ``ascending=False`` not returning a :class:`RangeIndex` columns (:issue:`57293`)

pandas/core/generic.py

+2-2
@@ -603,9 +603,9 @@ def _get_cleaned_column_resolvers(self) -> dict[Hashable, Series]:
         dtypes = self.dtypes
         return {
             clean_column_name(k): Series(
-                v, copy=False, index=self.index, name=k, dtype=dtypes[k]
+                v, copy=False, index=self.index, name=k, dtype=dtype
             ).__finalize__(self)
-            for k, v in zip(self.columns, self._iter_column_arrays())
+            for k, v, dtype in zip(self.columns, self._iter_column_arrays(), dtypes)
             if not isinstance(k, int)
         }
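
The change above replaces a label-based dtype lookup with positional pairing. The sketch below illustrates why that matters when column labels repeat. It is a standalone illustration using only public pandas API, not the library's internal code; the frame mirrors the one in the new test, and the variable names `by_label` and `pairs` are made up for this example.

```python
import pandas as pd

# Frame with a duplicated column label, built the same way as in the new test.
df = pd.DataFrame(
    {"A": range(3), "B": range(3), "C": range(3)}
).rename(columns={"B": "A"})

dtypes = df.dtypes  # Series indexed by column label: "A", "A", "C"

# Label-based lookup: "A" matches two entries, so the result is a
# Series of dtypes rather than a single dtype object.
by_label = dtypes["A"]
print(type(by_label).__name__, len(by_label))

# Positional pairing: iterating a Series yields its values, so zip()
# hands back exactly one dtype per column, duplicates included.
pairs = list(zip(df.columns, dtypes))
for label, dtype in pairs:
    print(label, dtype)
```

Passing a Series of dtypes where a single dtype is expected is consistent with the `TypeError` named in the whatsnew entry, although the exact failure path inside `Series(...)` is not shown on this page.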

pandas/tests/frame/test_query_eval.py

+19
@@ -159,6 +159,25 @@ def test_query_empty_string(self):
         with pytest.raises(ValueError, match=msg):
             df.query("")
 
+    def test_query_duplicate_column_name(self, engine, parser):
+        df = DataFrame(
+            {
+                "A": range(3),
+                "B": range(3),
+                "C": range(3)
+            }
+        ).rename(columns={"B": "A"})
+
+        res = df.query('C == 1', engine=engine, parser=parser)
+
+        expect = DataFrame(
+            [[1, 1, 1]],
+            columns=["A", "A", "C"],
+            index=[1]
+        )
+
+        tm.assert_frame_equal(res, expect)
+
     def test_eval_resolvers_as_list(self):
         # GH 14095
         df = DataFrame(
