-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Refactor sqlalchemy code in pandas.io.sql to help prepare for sqlalchemy 2.0. #49531
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Thanks for the followup, but could you pair down the changes even more (1 PR for 1 targeted change). It will greatly help the review process. It appears this PR is tackling 3 things which would be better if they were 3 individual PRs
|
Yes, I think I can do that. For the first part, where I make SQLDatabase accept Connections only, I might have to disable the tests which pass an Engine, but that will be a much smaller change than the test refactor. |
I pulled the engine disposal and test refactoring out of this PR. I think the engine disposal can wait until that last PR. But part of my motivation for going with this approach was so that I could put the engine disposal into the |
@mroeschke , I've addressed all the comments. After this PR, the next step would be to refactor the test classes, since the ones involving the sqlalchemy engines would no longer be necessary. In a later PR, I would dispose of the sqlalchemy engine, if it needed to be created, within the new function |
|
||
|
||
@contextmanager | ||
def _sqlalchemy_con(connectable, need_transaction: bool): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this be reused in your followup PR? I have a slight preference of just folding this into pandasSQL_builder
for now until there's a need to share this connection logic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only plan for _sqlalchemy_con
to be called by pandasSQL_builder
and not by anything else. I split it out to reduce the amount of indentation and keep the functions small. There is an additional try/finally block and some additional if/else blocks in lines 772-800 here: https://github.com/cdcadman/pandas/blob/sql_fixes/pandas/io/sql.py . I think I can fold _sqlalchemy_con
into pandasSQL_builder
if you still prefer that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay gotcha. I think it'll be fine to leave as a separate function with more code incoming.
However, could you replace import sqlalchemy
here with sqlalchemy = import_optional_dependnecy(..., errors="raise")
? Just so that if someone else tries using this function in the future it's known that sqlalchemy is required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I made the change.
Awesome, thanks for the progress here @cdcadman |
…emy 2.0. (pandas-dev#49531) * DOC: Clarify behavior of DataFrame.to_sql * CLN: Make SQLDatabase only accept a sqlalchemy Connection. Co-authored-by: Chuck Cadman <[email protected]>
…emy 2.0. (pandas-dev#49531) * DOC: Clarify behavior of DataFrame.to_sql * CLN: Make SQLDatabase only accept a sqlalchemy Connection. Co-authored-by: Chuck Cadman <[email protected]>
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.I am splitting this out of #48576 , because it is a major refactor of the code, with the goal of making
SQLDatabase
only accept aConnection
and not anEngine
. sqlalchemy 2.0 restricts the methods that are available to them, which makes it harder to write code that works with both. For example,Connection.connect()
creates a branched connection in sqlalchemy 1.x, but is removed in 2.0, but this is called in SQLDatabase.check_case_sensitive().I also added some clarification on how transactions work in
DataFrame.to_sql
, based on this example, run against pandas 1.5.1: