REGR?: read_sql no longer supports duplicate column names #53117
Comments
To illustrate with just the construction of the DataFrame:
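(The original snippet here isn't preserved in this excerpt; below is a minimal sketch of the difference, using made-up example data.)

```python
import pandas as pd

columns = ["a", "b", "a"]
records = [(1, 10, 2), (2, 20, 3)]
arrays = [[1, 2], [10, 20], [2, 3]]

# from_records keeps duplicate labels, because columns are assigned positionally
pd.DataFrame.from_records(records, columns=columns)
#    a   b  a
# 0  1  10  2
# 1  2  20  3

# the dict-based construction collapses the duplicate "a" key, silently
# dropping one of the result columns
pd.DataFrame(dict(zip(columns, arrays)))
#    a   b
# 0  2  10
# 1  3  20
```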
I assume we could relatively easily fix this by using a different constructor for the arrays, like:
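(The snippet from the original comment isn't preserved here; the following is only a sketch of one possible approach, assuming we keep the per-column arrays but attach the labels via an explicit `Index` rather than keying a dict on them.)

```python
import numpy as np
import pandas as pd

columns = ["a", "b", "a"]
arrays = [np.array([1, 2]), np.array([10, 20]), np.array([2, 3])]

# concatenate the columns positionally (preserving each array's dtype), then
# set the (possibly duplicated) labels afterwards
df = pd.concat([pd.Series(arr) for arr in arrays], axis=1)
df.columns = pd.Index(columns)
df
#    a   b  a
# 0  1  10  2
# 1  2  20  3
```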
Further, the functionality to specify the dtype backend is probably something that could be moved into …
Would it be possible/acceptable to just raise on duplicate column names? I'd make the case for disallowing that wherever possible (if people have a dataframe with duplicate row labels and take a transpose, then OK, they'll get duplicates, but in IO functions I'd have thought it acceptable to prohibit).
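(A hypothetical illustration of such a check, not actual pandas code; the helper name is invented.)

```python
def _raise_on_duplicate_columns(columns):
    # hypothetical guard: refuse duplicate result-set labels
    # instead of silently collapsing them
    seen = set()
    dupes = sorted({c for c in columns if c in seen or seen.add(c)})
    if dupes:
        raise ValueError(f"Duplicate column names in SQL result set: {dupes}")
```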
Can discuss this, but should fix for 2.0.2.
Thanks for solving this. Relevant discussion: … Meaning, it would be best if `pd.read_sql("SELECT a, b, a +1 as a FROM test_table;", eng)` in the dummy reproducer were consistent with

```python
exe = eng.execute("SELECT a, b, a +1 as a FROM test_table;")
pd.DataFrame(exe)
```

which in pandas 2.0.1 returns the result with the duplicate columns intact.
Probably caused by #50048, and probably related to #52437 (another change in behaviour that might have been caused by the same PR). In essence, the change is from using `DataFrame.from_records` to processing the records into a list of column arrays and then using `DataFrame(dict(zip(columns, arrays)))`. That has slightly different behaviour.

Dummy reproducer for the `read_sql` change (a sketch follows below): with pandas 1.5 this returns a frame with all three result columns (both `a` columns preserved), while with pandas 2.0 it returns a frame in which the duplicate `a` column has been silently dropped.
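(The original reproducer snippet and its outputs aren't preserved in this excerpt; below is a minimal sketch, assuming an in-memory SQLite database via SQLAlchemy and a small `test_table` with integer columns `a` and `b`.)

```python
import pandas as pd
import sqlalchemy

eng = sqlalchemy.create_engine("sqlite://")
pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]}).to_sql(
    "test_table", eng, index=False
)

# the aliased expression makes the result set contain the label "a" twice
result = pd.read_sql("SELECT a, b, a + 1 as a FROM test_table;", eng)
print(result)
# pandas 1.5: three columns (a, b, a); pandas 2.0: the duplicate "a" is collapsed
```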
I don't know how much we want to support duplicate column names in read_sql, but it is a change in behaviour, and the new behaviour of just silently ignoring it / dropping some data also isn't ideal IMO.
cc @phofl