-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
QST: Deadlock when using df.to_sql for none-related tables #36542
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
these functions are just wrappers around the db itself w/o a full reproducible example your best resource is SO |
FWIW, it appears the to_sql operation (via sqlalchemy?) is requesting information from the information schema for table objects. Specifically for an MSSQL server I see this firing: SELECT [INFORMATION_SCHEMA].[TABLES].[TABLE_NAME] This query, for whatever reason, is showing that it is blocked with a "LCK_M_S" lock, waiting on some ETL process I have running right now, on totally unrelated tables. |
@keithkmyers , fyi there is more info about this for SQL Server here:
I'm running through the same issue. It seems like this should be worked around from Pandas side. |
I'm in the same boat. Trying to implement a new ETL process using Python / Pandas, but my destination is MS SQL (on-prem 2014). When inserting one table, all database get locked, with queries in INFORMATION_SCHEMA :( Idk if it's happens only with MS SQL or is something by-design in pandas... Any news @match-gabeflores @keithkmyers ?? |
Hi, Same issue on ELT process (on Load part). To load I use Only workaround is a retry |
I posted a workaround here, as well as more details on the root cause. Use a table name that is all lower case (use Use
|
Mentioning the proper version of the ODBC driver in the connection string has solved this for me. I don't know how it solved it but the error is gone.
|
I have a multithreaded ETL that occasionally fails due to an issue I can't understand and would like an explanation.
I have asked the following question but did not get a clear answer.
Basically my ETL uses the load method defined below in different instances, each instance on a different thread, and occasionally I get a deadlock error.
This is the error log:
As you can see, when two threads use the
DataFrame.to_sql()
method at roughly the same time this could cause a deadlock.The deadlock could occur on any of the classes.
I know I can call load again until it succeeds, but I want to understand why this happens in the first place.
I tried debugging this issue first through MySQL DB but when I run
SHOW ENGINE INNODB STATUS
no deadlocks show up, which leads me here.My question is why would two none-related tables could have a deadlock?
what does pandas df.to_sql() does that could cause this?
I assumed df.to_sql() simply executes an INSERT statement but when I checked the logs I saw it does many more things I do not understand:
SELECT * FROM mysql.general_log WHERE event_time >= "2020-09-17 09:48:27" AND event_time <= "2020-09-17 09:48:29" ORDER BY event_time ASC;
Result:
The text was updated successfully, but these errors were encountered: