ENH: Initial implementation of support for SQLAlchemy 2.0 in read_sql etc (WIP) #44794

robintw · 2021-12-06T21:07:31Z

This is an early-stage draft PR that I would love some feedback on. It is an initial implementation of support for SQLAlchemy v2.0. This hasn't actually been released yet, but v1.40 has been released and it provides 2.0-style behaviour when the future=True flag is used when creating an engine. Current pandas fails when a 'future-style' engine is passed to any of the read_sql (or similar) functions.

This PR fixes that, but there are some potential problems that I would like some advice on:

The new SQLAlchemy API requires you to explicitly begin and end all connections/transactions, so we have to keep the connections open where relevant, and close them when finished. This means that the execute function in sql.py (the raw function, not the method of any of the classes in there) does an isinstance check on the database class, and behaves differently depending on whether the class is SQLiteDatabase (which is just using the sqlite module, not SQLAlchemy) or SQLDatabase (which uses SQLAlchemy). For the latter we need to explicitly start/end a connection/transaction. This transaction handling logic can't be inside the SQLDatabase.execute method as in other locations this method is used we need to keep the connection open longer than the life of the execute method run, so that we can actually get the data out. This feels like a code smell, and I don't like making the two classes diverge - but I can't think of a better way of doing this without radically changing how the classes behave - which I'm not keen to do. Any thoughts would be very welcome.
I want to test this with engines that have future=False and also ones that have future=True. Ideally I'd like to parametrise the tests in test_sql.py so that they can be run twice - once for False and once for True. However, the create_engine calls are in the connect method of the classes (eg. TestSQLApi.connect) and I can't work out if there is a way to parametrise this through pytest. Any ideas?

In general any thoughts would be very welcome: I'm happy to work on this more to get it into a state where pandas can accept it.

closes #xxxx
tests added / passed
Ensure all linting tests pass, see here for how to run them
whatsnew entry

Currently tested with SQLAlchemy 1.4 using the 'future=True' flag to an engine.

mroeschke · 2021-12-08T01:48:05Z

2 thoughts:

Does SQLAlchemy have any breaking changes with 1.4? The minimum version of SQLAlchemy was just bumped to 1.4 and will be so in the pandas release v1.4 soon. If there, may be better to wait until after the 1.4 release to work towards this.
Towards the approach, think it's best if work aligned with the migration guide: https://docs.sqlalchemy.org/en/14/changelog/migration_20.html, tackling the SQLAlchemy 2.0 changes that are compatible with 1.4 first before the future flag changes.

robintw · 2021-12-08T11:19:17Z

As far as I'm aware, all the changes to make pandas compatible with SQLAlchemy 1.4 have already been done - and the only bits that are left are to deal with the problems when future=True is set. Certainly, I've not had any problems running SQLAlchemy 1.4 with pandas when future=False.

mroeschke · 2021-12-08T18:23:16Z

Oh sorry, I meant to say does SQLAlchemy have breaking changes in 2.0 vs 1.4 that wouldn't allow pandas to support 1.4 and +2.0?

Trying to gauge whether supporting 2.0 would also necessitate bumping the min supported SQLAlchemy in pandas due to the breaking changes?

robintw · 2021-12-08T19:10:47Z

Ah sorry, I misunderstood you.

As far as I am aware there are no breaking changes between 1.4 and 2.0 - the main aim of 1.4 was to provide a nice 'stepping-stone' to 2.0 and deal with the deprecations etc.

mroeschke · 2021-12-08T21:32:06Z

Gotcha.

Regarding 1), happy to have a separate PR refactor for the excute logic to set up transactions logically for SQLAlechemy and other engines.

Regarding 2), you can use an autouse=True pytest fixture for the sqlalchemy tests in another separate PR. But shouldn't the goal be to have the future flag set to True?

mvirts · 2021-12-09T21:37:57Z

pandas/io/sql.py

@@ -23,6 +23,7 @@
 import warnings

 import numpy as np
+from sqlalchemy.sql.expression import text


I think this will break without sqlalchemy installed, see pandasSQL_builder where import_optional_dependency is used.

Maybe a better option is to move the conversion to text into SQLDatabase.execute, since that will always use SQLAlchemy.

Alternatively, it looks like _convert_params is never called with sqlalchemy=True, so maybe just remove that code path?

github-actions · 2022-01-09T03:09:56Z

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

robintw · 2022-01-09T19:11:41Z

Yes, this is still being worked on - but has been delayed because of Christmas and illness.

mroeschke · 2022-02-06T22:59:08Z

Thanks for the pull request, but it appears we have a duplicate effort in #40791 and I think we plan to make a hard SQLAlchemy 2.0 conversion when pandas 2.0 is the next release (maybe end of the year) #44823 (comment)

Going to close since I don't think this will be actionable until the pandas 2.0 horizon, but welcome a contribution once that timeline comes!

robintw added 2 commits December 6, 2021 20:57

Initial implementation of support for SQLAlchemy 2.0.

c01b207

Currently tested with SQLAlchemy 1.4 using the 'future=True' flag to an engine.

Run pre-commit hooks

921bcd4

mvirts reviewed Dec 9, 2021

View reviewed changes

IanMayo mentioned this pull request Jan 5, 2022

Proof of Concept for Pepys in Jupyter debrief/pepys-import#1078

Open

7 tasks

github-actions bot added the Stale label Jan 9, 2022

mroeschke closed this Feb 6, 2022

robintw mentioned this pull request Feb 7, 2022

WIP: Add Jupyter notebook analysis examples debrief/pepys-import#1096

Draft

8 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ENH: Initial implementation of support for SQLAlchemy 2.0 in read_sql etc (WIP) #44794

ENH: Initial implementation of support for SQLAlchemy 2.0 in read_sql etc (WIP) #44794

robintw commented Dec 6, 2021 •

edited

Loading

mroeschke commented Dec 8, 2021

robintw commented Dec 8, 2021

mroeschke commented Dec 8, 2021

robintw commented Dec 8, 2021

mroeschke commented Dec 8, 2021 •

edited

Loading

mvirts Dec 9, 2021

github-actions bot commented Jan 9, 2022

robintw commented Jan 9, 2022

mroeschke commented Feb 6, 2022

ENH: Initial implementation of support for SQLAlchemy 2.0 in read_sql etc (WIP) #44794

ENH: Initial implementation of support for SQLAlchemy 2.0 in read_sql etc (WIP) #44794

Conversation

robintw commented Dec 6, 2021 • edited Loading

mroeschke commented Dec 8, 2021

robintw commented Dec 8, 2021

mroeschke commented Dec 8, 2021

robintw commented Dec 8, 2021

mroeschke commented Dec 8, 2021 • edited Loading

mvirts Dec 9, 2021

Choose a reason for hiding this comment

github-actions bot commented Jan 9, 2022

robintw commented Jan 9, 2022

mroeschke commented Feb 6, 2022

robintw commented Dec 6, 2021 •

edited

Loading

mroeschke commented Dec 8, 2021 •

edited

Loading