Client hung after end of results during heavy load #285
Comments
Hi @encryptio, can you give us a program reliably reproducing this? |
I've been trying to make a minimal reproducer, and in doing so tried to find a series of calls that caused it. That's not done yet, but I did find something interesting after creating a fake mysql driver that logs all the calls made to it and passes them on to All lockups occur when |
I was able to fix it with this patch: It changes the logic of Similar logic probably needs to be applied to the |
I've spent several hours trying to build a small reproducer, but haven't had success yet. I have other priorities for the moment, and hope to come back to this starting next week. |
@encryptio thank you! Your analysis definitely helps, maybe we can even get rid of more than one issue at once. I also don't have the time right now to really dig into it. |
I think my patch is leaking connections when the deadlock error occurs. Not sure why, but at least the user code continues. |
Probably because the net connection is not closed. You only set rows.mc to nil, but rows.mc.netConn is still open. |
I've been looking into this further; closing the netConn explicitly should fix the connections leaking issue, but that doesn't seem right either: If this happens in a transaction (which is the case in my code), the transaction shouldn't be forfeited; it's a non-fatal error, and you can continue to send (different) queries to the transaction (IMO, that's a misdesign of the MySQL protocol, but alas.) Why is When the deadlock error is returned from tx.Query/tx.Exec directly (that is, not after row iteration starts) it does allow tx.Rollback to reuse the connection later. Unfortunately, I can't reproduce the leak anymore with my patch applied (with no version changes, afaik), yet the current master branch of the mysql driver still locks up. |
Same problem, but my code only reads from MySQL; it never writes. It blocks on:
which is
|
Can one of you provide us a small sample program with which this deadlock occurs? |
Are there any messages in MySQL's error log? |
Hi! I have a similar problem in production, and after a couple of days of debugging I realized that everything is just as @encryptio says. In our case, however, the error codes are 1885 and 1028 (related to query execution timeouts). Here is minimal code to reproduce the bug: https://gist.github.com/ikkeps/058e090b561add81a910 (it uses set max_statement_time=... to force MySQL to throw the error. Yes, a simple db.Exec is not really reliable here, since the client can reconnect at any time, but I did not find a better way to do it.) Also, I did not see any messages in the MySQL log except many of
|
@ikkeps if you use an older MySQL version on your server, that query can easily exceed a 1 second timeout. It takes 12 seconds on one of our machines. |
@arnehormann that was the point (also, max_statement_time is in milliseconds). The driver hangs when the server returns some error (I am not sure about other errors, but with timeout-related errors it almost always does). It hangs while trying to read all remaining rows on Close() (with the same stack as in @encryptio's description), but as far as I can see MySQL does not return any additional rows, so the driver waits almost forever (until MySQL drops the connection on wait_timeout). We use MySQL 5.6.21-70.1-log (Percona Server 70.1), if this helps. |
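To illustrate the hang described above: in the MySQL text protocol a result set ends with either an EOF packet (first byte 0xfe, shorter than 9 bytes) or an ERR packet (first byte 0xff). A drain loop that waits only for EOF will block after an ERR packet, because the server sends nothing more. The following is a minimal sketch of a drain loop that stops on both, not the driver's actual code; `drainRows`, `next`, and the error variables are hypothetical names, and `next` stands in for reading one packet from the connection.

```go
package main

import "errors"

// First-byte markers of the packets that can terminate a text result set.
const (
	headerEOF = 0xfe
	headerERR = 0xff
)

var (
	errServer    = errors.New("server returned an ERR packet")
	errMalformed = errors.New("empty packet")
)

// drainRows discards row packets returned by next until the result set ends.
// It returns the number of rows discarded. next is a hypothetical stand-in
// for reading one packet payload from the network connection.
func drainRows(next func() ([]byte, error)) (int, error) {
	rows := 0
	for {
		pkt, err := next()
		if err != nil {
			return rows, err
		}
		if len(pkt) == 0 {
			return rows, errMalformed
		}
		switch {
		case pkt[0] == headerERR:
			// After an ERR packet no further packets follow, so reading
			// again here would block until the server drops the connection.
			return rows, errServer
		case pkt[0] == headerEOF && len(pkt) < 9:
			// Normal end of the result set.
			return rows, nil
		}
		rows++
	}
}
```

Whether the connection can be reused after the ERR case (for example inside a transaction) is a separate question that this sketch does not address.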
@ikkeps ok. Apparently I don't have an instance supporting |
This may fix the hang: https://gist.github.com/methane/6c6e2db8464c0579a7af |
Please notify us in case this problem still persists. |
At first glance this does not fix the bug (at least for the max_statement_time case, which could still be unrelated to the original bug described). I ran Wireshark and, as far as I can see, the driver hangs when an error occurs after some rows have been read successfully and the driver then tries to Close() the rows (reading rows until EOF). It is still almost the same stack as in the original error.
|
Neither I nor Travis CI has MySQL 5.7 installed. |
I'm running many concurrent transactions (half read, half write, often with statements which cause large write locks on rows/index pages) with tx_isolation=serializable, and rerunning transactions where any statement returns a *MySQLError with Number == ER_LOCK_DEADLOCK (1213). There are very often deadlock errors (~0.015 retries per completed transaction), which is expected under the kind of queries and load I'm putting on the system.
This code runs fine for a while, but occasionally (more often in heavy load), a client connection will get locked up waiting for an extra packet at the end of a result list that the server doesn't think needs to be sent. Unfortunately, this happens a hell of a lot under heavy load (once every couple of minutes at 100 transactions per second sustained).
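For readers unfamiliar with the retry pattern described above, here is a minimal sketch of it, not the reporter's actual code. To keep it compilable without a database, `MySQLError` is a local stand-in that mirrors the `Number` and `Message` fields of the driver's error type; `isDeadlock` and `runWithRetry` are hypothetical helper names, and the attempt limit is an assumption.

```go
package main

import "fmt"

// MySQLError is a local stand-in for the driver's *mysql.MySQLError so that
// the sketch compiles on its own; real code would import the driver instead.
type MySQLError struct {
	Number  uint16
	Message string
}

func (e *MySQLError) Error() string {
	return fmt.Sprintf("Error %d: %s", e.Number, e.Message)
}

// erLockDeadlock is the MySQL error number for ER_LOCK_DEADLOCK.
const erLockDeadlock = 1213

// isDeadlock reports whether err is a MySQLError with Number 1213.
func isDeadlock(err error) bool {
	myErr, ok := err.(*MySQLError)
	return ok && myErr.Number == erLockDeadlock
}

// runWithRetry (hypothetical helper) calls txFunc until it succeeds, fails
// with a non-deadlock error, or has been called maxAttempts times. It returns
// the number of calls made and the last error.
func runWithRetry(maxAttempts int, txFunc func() error) (int, error) {
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		err = txFunc()
		if err == nil || !isDeadlock(err) {
			return attempt, err
		}
	}
	return maxAttempts, err
}
```

In real use, `txFunc` would begin a transaction, run its statements, and commit, rolling back before it returns an error so that the next attempt starts clean.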
The client is blocked (forever) on:
But the server thinks the connection is idle:
It's not specific to the scheduler transactions either; it appears to happen randomly, roughly in proportion to what I expect the number of rows locked in each transaction is.
I suspect that either the mysqld implementation or the Go MySQL driver has a bug somewhere in its protocol handling, but I don't know where to go from here.
Using go-sql-driver/mysql as of 9543750 (Merge pull request #280 from hyandell/master). I can make this happen reliably with MySQL 5.5.37 and 10.0.14-MariaDB, and with Go 1.2.2 and Go 1.3.3, and have not found a version of anything on which it does not lock up.