Skip to content

ENH: Initial implementation of support for SQLAlchemy 2.0 in read_sql etc (WIP) #44794

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

robintw
Copy link
Contributor

@robintw robintw commented Dec 6, 2021

Fixes #40460

This is an early-stage draft PR that I would love some feedback on. It is an initial implementation of support for SQLAlchemy v2.0. This hasn't actually been released yet, but v1.40 has been released and it provides 2.0-style behaviour when the future=True flag is used when creating an engine. Current pandas fails when a 'future-style' engine is passed to any of the read_sql (or similar) functions.

This PR fixes that, but there are some potential problems that I would like some advice on:

  1. The new SQLAlchemy API requires you to explicitly begin and end all connections/transactions, so we have to keep the connections open where relevant, and close them when finished. This means that the execute function in sql.py (the raw function, not the method of any of the classes in there) does an isinstance check on the database class, and behaves differently depending on whether the class is SQLiteDatabase (which is just using the sqlite module, not SQLAlchemy) or SQLDatabase (which uses SQLAlchemy). For the latter we need to explicitly start/end a connection/transaction. This transaction handling logic can't be inside the SQLDatabase.execute method as in other locations this method is used we need to keep the connection open longer than the life of the execute method run, so that we can actually get the data out. This feels like a code smell, and I don't like making the two classes diverge - but I can't think of a better way of doing this without radically changing how the classes behave - which I'm not keen to do. Any thoughts would be very welcome.

  2. I want to test this with engines that have future=False and also ones that have future=True. Ideally I'd like to parametrise the tests in test_sql.py so that they can be run twice - once for False and once for True. However, the create_engine calls are in the connect method of the classes (eg. TestSQLApi.connect) and I can't work out if there is a way to parametrise this through pytest. Any ideas?

In general any thoughts would be very welcome: I'm happy to work on this more to get it into a state where pandas can accept it.

  • closes #xxxx
  • tests added / passed
  • Ensure all linting tests pass, see here for how to run them
  • whatsnew entry

Currently tested with SQLAlchemy 1.4 using the 'future=True' flag to an engine.
@mroeschke
Copy link
Member

2 thoughts:

  1. Does SQLAlchemy have any breaking changes with 1.4? The minimum version of SQLAlchemy was just bumped to 1.4 and will be so in the pandas release v1.4 soon. If there, may be better to wait until after the 1.4 release to work towards this.
  2. Towards the approach, think it's best if work aligned with the migration guide: https://docs.sqlalchemy.org/en/14/changelog/migration_20.html, tackling the SQLAlchemy 2.0 changes that are compatible with 1.4 first before the future flag changes.

@robintw
Copy link
Contributor Author

robintw commented Dec 8, 2021

As far as I'm aware, all the changes to make pandas compatible with SQLAlchemy 1.4 have already been done - and the only bits that are left are to deal with the problems when future=True is set. Certainly, I've not had any problems running SQLAlchemy 1.4 with pandas when future=False.

@mroeschke
Copy link
Member

Oh sorry, I meant to say does SQLAlchemy have breaking changes in 2.0 vs 1.4 that wouldn't allow pandas to support 1.4 and +2.0?

Trying to gauge whether supporting 2.0 would also necessitate bumping the min supported SQLAlchemy in pandas due to the breaking changes?

@robintw
Copy link
Contributor Author

robintw commented Dec 8, 2021

Ah sorry, I misunderstood you.

As far as I am aware there are no breaking changes between 1.4 and 2.0 - the main aim of 1.4 was to provide a nice 'stepping-stone' to 2.0 and deal with the deprecations etc.

@mroeschke
Copy link
Member

mroeschke commented Dec 8, 2021

Gotcha.

Regarding 1), happy to have a separate PR refactor for the excute logic to set up transactions logically for SQLAlechemy and other engines.

Regarding 2), you can use an autouse=True pytest fixture for the sqlalchemy tests in another separate PR. But shouldn't the goal be to have the future flag set to True?

@@ -23,6 +23,7 @@
import warnings

import numpy as np
from sqlalchemy.sql.expression import text
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will break without sqlalchemy installed, see pandasSQL_builder where import_optional_dependency is used.

Maybe a better option is to move the conversion to text into SQLDatabase.execute, since that will always use SQLAlchemy.

Alternatively, it looks like _convert_params is never called with sqlalchemy=True, so maybe just remove that code path?

@github-actions
Copy link
Contributor

github-actions bot commented Jan 9, 2022

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Jan 9, 2022
@robintw
Copy link
Contributor Author

robintw commented Jan 9, 2022

Yes, this is still being worked on - but has been delayed because of Christmas and illness.

@mroeschke
Copy link
Member

Thanks for the pull request, but it appears we have a duplicate effort in #40791 and I think we plan to make a hard SQLAlchemy 2.0 conversion when pandas 2.0 is the next release (maybe end of the year) #44823 (comment)

Going to close since I don't think this will be actionable until the pandas 2.0 horizon, but welcome a contribution once that timeline comes!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

QST: Support of SQLAlchemy 2.0 in Pandas
3 participants