-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH: Initial implementation of support for SQLAlchemy 2.0 in read_sql etc (WIP) #44794
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Currently tested with SQLAlchemy 1.4 using the 'future=True' flag to an engine.
2 thoughts:
|
As far as I'm aware, all the changes to make pandas compatible with SQLAlchemy 1.4 have already been done - and the only bits that are left are to deal with the problems when |
Oh sorry, I meant to say does SQLAlchemy have breaking changes in 2.0 vs 1.4 that wouldn't allow pandas to support 1.4 and +2.0? Trying to gauge whether supporting 2.0 would also necessitate bumping the min supported SQLAlchemy in pandas due to the breaking changes? |
Ah sorry, I misunderstood you. As far as I am aware there are no breaking changes between 1.4 and 2.0 - the main aim of 1.4 was to provide a nice 'stepping-stone' to 2.0 and deal with the deprecations etc. |
Gotcha. Regarding 1), happy to have a separate PR refactor for the Regarding 2), you can use an |
@@ -23,6 +23,7 @@ | |||
import warnings | |||
|
|||
import numpy as np | |||
from sqlalchemy.sql.expression import text |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this will break without sqlalchemy installed, see pandasSQL_builder where import_optional_dependency is used.
Maybe a better option is to move the conversion to text
into SQLDatabase.execute, since that will always use SQLAlchemy.
Alternatively, it looks like _convert_params is never called with sqlalchemy=True, so maybe just remove that code path?
This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this. |
Yes, this is still being worked on - but has been delayed because of Christmas and illness. |
Thanks for the pull request, but it appears we have a duplicate effort in #40791 and I think we plan to make a hard SQLAlchemy 2.0 conversion when pandas 2.0 is the next release (maybe end of the year) #44823 (comment) Going to close since I don't think this will be actionable until the pandas 2.0 horizon, but welcome a contribution once that timeline comes! |
Fixes #40460
This is an early-stage draft PR that I would love some feedback on. It is an initial implementation of support for SQLAlchemy v2.0. This hasn't actually been released yet, but v1.40 has been released and it provides 2.0-style behaviour when the
future=True
flag is used when creating an engine. Current pandas fails when a 'future-style' engine is passed to any of theread_sql
(or similar) functions.This PR fixes that, but there are some potential problems that I would like some advice on:
The new SQLAlchemy API requires you to explicitly begin and end all connections/transactions, so we have to keep the connections open where relevant, and close them when finished. This means that the
execute
function insql.py
(the raw function, not the method of any of the classes in there) does anisinstance
check on the database class, and behaves differently depending on whether the class isSQLiteDatabase
(which is just using the sqlite module, not SQLAlchemy) orSQLDatabase
(which uses SQLAlchemy). For the latter we need to explicitly start/end a connection/transaction. This transaction handling logic can't be inside theSQLDatabase.execute
method as in other locations this method is used we need to keep the connection open longer than the life of theexecute
method run, so that we can actually get the data out. This feels like a code smell, and I don't like making the two classes diverge - but I can't think of a better way of doing this without radically changing how the classes behave - which I'm not keen to do. Any thoughts would be very welcome.I want to test this with engines that have
future=False
and also ones that havefuture=True
. Ideally I'd like to parametrise the tests intest_sql.py
so that they can be run twice - once for False and once for True. However, thecreate_engine
calls are in theconnect
method of the classes (eg.TestSQLApi.connect
) and I can't work out if there is a way to parametrise this through pytest. Any ideas?In general any thoughts would be very welcome: I'm happy to work on this more to get it into a state where pandas can accept it.