-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Fixing multi method for to_sql for non-oracle databases #57311
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't have to have dialect specific logic in sql.py
. All ops should be generically applicable to the db backend
Totally fair not to accept dialect specific changes, but with the previous commit, ALL queries using the method="multi" argument resolve to insertion via single row. Is it worth breaking this functionality just for the sake of the Oracle connector? |
I think it's reasonable to revert #51648 that broke this for all |
Yes, I definitely can make that PR, should I keep it in this PR and just change it or make a new one? |
Sure you can revert the code change in this PR. It would be good to add a unit test as well so it doesn't break again (
Yeah in the docstring of |
pandas/tests/io/test_sql.py
Outdated
@@ -4331,3 +4330,35 @@ def test_xsqlite_if_exists(sqlite_buildin): | |||
(5, "E"), | |||
] | |||
drop_table(table_name, sqlite_buildin) | |||
|
|||
|
|||
def test_execution_of_multi(mysql_pymysql_engine): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test should test a public API like
df = DataFrame(...)
df.to_sql(..., method="multi")
result = pd.read_sql(...)
tm.assert_frame_equal(df, result)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When you say this, do you mean that I should test that the data inserted matches the data read? The test that I performed shows that the statement executed is multi-value. Without a multi-value insert, all the data is still inserted, it is just executed as multiple SQL statements. Here I show that 1 SQL statement contains multiple rows.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK I see the difficulty. I would still say we don't want to add this complex of a unit test since it can become hard to debug/maintain in the future.
Let's just remove this unit test for now. I think it will take some reorganizing of the sql code to make testing this easier
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the complexity in the monkeypatching or in the regex of the SQL statement? Because perhaps I can find some other way to analyze the statement that is not using regex?
I added a similar unit test on an internal repo which writes to a PrestoDB. There, however, I justed added a timeout because one was exponentially longer than the other.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I simplified the monkeypatch -- I think this is a pretty durable test now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason to not use the sqlalchemy_connectable
fixture for this?
Also I agree with @mroeschke - this test gets into the internals of both pandas implementation and event listening in SQLAlchemy. I don't think there is value in forcing developers / maintainers to have that detailed of knowledge of both.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@WillAyd The sqlalchemy_connectable would work well.
@mroeschke So is the move to not do a unit test here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I would still recommend not having a unit test for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does it look now?
Looking good but please run the pre-commit checks to fix style issues: https://pandas.pydata.org/docs/dev/development/contributing_codebase.html#pre-commit |
Thanks @kassett |
…oracle databases
…r non-oracle databases) (#57466) Backport PR #57311: Fixing multi method for to_sql for non-oracle databases Co-authored-by: Samuel Chai <[email protected]>
…7311) * Fixing multi method for to_sql for non-oracle databases * Simplifying the if statement * adding a doc * Adding unit test * Update doc/source/whatsnew/v2.2.1.rst Co-authored-by: Matthew Roeschke <[email protected]> * Reverted formatting in test_sql * Simplifying unit test * Removing unit test * remove trailing whitespaces * Removing trailing whitespace * fixing alpahbetical sorting --------- Co-authored-by: Matthew Roeschke <[email protected]>
doc/source/whatsnew/v2.2.1.rst
file if fixing a bug or adding a new feature.