BUG: fix #59950 handle duplicate column names in dataframe queries #59971

Merged: 7 commits, Nov 5, 2024

Conversation

miguelcsx
Contributor

Issue Description

Calling DataFrame.query() on a DataFrame with duplicate column names raises an unexpected TypeError because of a change introduced in pandas 2.2.1. The root cause is that self.dtypes[k] returns a Series for a duplicated column name instead of a single dtype, which makes the query evaluation fail.
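A minimal sketch of the lookup behavior described above, using a made-up three-column frame (the data and variable names are illustrative and not taken from the PR). It only inspects `df.dtypes` and does not call `query()`, since the outcome of that call depends on the installed pandas version.

```python
import pandas as pd

# Illustrative frame: the label "a" appears twice, "b" once.
df = pd.DataFrame([[1, 2, 3]], columns=["a", "a", "b"])

# df.dtypes is a Series indexed by column label, so a label lookup
# on a duplicated name returns every match as a Series.
dup_lookup = df.dtypes["a"]

# A unique label returns a single dtype object.
unique_lookup = df.dtypes["b"]

dup_is_series = isinstance(dup_lookup, pd.Series)
unique_is_series = isinstance(unique_lookup, pd.Series)

print(type(dup_lookup).__name__, len(dup_lookup))
print(type(unique_lookup).__name__)
```

Code that passes `self.dtypes[k]` along as if it were always a single dtype therefore breaks as soon as `k` is a duplicated label.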

Changes Made

  • Fixed an issue where DataFrame.query() raised an unexpected TypeError on DataFrames with duplicate column names

  • The error was caused by self.dtypes[k] returning a Series for duplicate column names

  • Restored the behavior from before the change that introduced the error

  • Added tests to ensure that DataFrame.query() works as expected
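One way to avoid the ambiguous label lookup is to pair column labels with dtypes by position. The sketch below illustrates that idea only; it is an assumption about a possible approach, not the code merged in this PR, and the frame and the name `pairs` are hypothetical.

```python
import pandas as pd

# Illustrative frame with a duplicated label and mixed dtypes.
df = pd.DataFrame({"x": [1], "y": [2.5], "z": [3]})
df.columns = ["a", "a", "b"]

# Iterating a Series yields its values, so zipping columns with dtypes
# gives exactly one dtype per column, even when labels repeat.
pairs = list(zip(df.columns, df.dtypes))

for label, dtype in pairs:
    print(label, dtype)
```

Each entry in `pairs` holds a single dtype, so the duplicated label "a" no longer produces a Series.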

PR Checklist

@miguelcsx miguelcsx force-pushed the fix/dataframe-query branch 2 times, most recently from 68259e1 to 2637a24 Compare October 4, 2024 20:06
@miguelcsx miguelcsx marked this pull request as draft October 4, 2024 21:23
@miguelcsx miguelcsx force-pushed the fix/dataframe-query branch from 2637a24 to a5bd34b Compare October 5, 2024 03:01
@miguelcsx miguelcsx marked this pull request as ready for review October 5, 2024 03:02
@miguelcsx miguelcsx changed the title fix: #59950 handle duplicate column names in dataframe queries BUG: fix #59950 handle duplicate column names in dataframe queries Oct 5, 2024
@mroeschke mroeschke added the expressions pd.eval, query label Nov 4, 2024
@mroeschke mroeschke added this to the 3.0 milestone Nov 5, 2024
@mroeschke mroeschke merged commit 6631202 into pandas-dev:main Nov 5, 2024
50 of 51 checks passed
@mroeschke
Member

Thanks @miguelcsx

Labels
expressions pd.eval, query
Development

Successfully merging this pull request may close these issues.

BUG: DataFrame.query() throws error when df has duplicate column names
2 participants