Skip to content

Subsequent call to PingContext after context timeout errors with bad connection. #858

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
arvenil opened this issue Sep 21, 2018 · 6 comments · Fixed by #862
Closed

Subsequent call to PingContext after context timeout errors with bad connection. #858

arvenil opened this issue Sep 21, 2018 · 6 comments · Fixed by #862

Comments

@arvenil
Copy link
Contributor

arvenil commented Sep 21, 2018

Issue description

After first call to PingContext quits with context deadline exceeded the subsequent call will return bad connection.

With plenty of debug lines I've tracked down the issue a bit. Firstly, timing is really important here and e.g. with time.Millisecond I hit the error but with time.Nanosecond each call correctly returns context deadline exceeded. So please be aware that you may need to adjust those values to be able to reproduce the issue.

When I used time.Nanosecond the context deadline exceeded is returned by sql package before connection is taken from the pool https://github.com/golang/go/blob/ce58a39fca067a19c505220c0c907ccf32793427/src/database/sql/sql.go#L1086-L1091
However with time.Millisecond, it has enough time to pass those checks and timeout is reached in this library

mysql/connection.go

Lines 603 to 606 in 99ff426

select {
default:
case <-ctx.Done():
return ctx.Err()

As of now, the context deadline exceeded error is returned and connection is returned to the pool.
Second call to PingContext now picks up the previously used connection and tries to send data but fails and returns
return errBadConnNoWrite

I've tracked it a bit, but I'm still not sure about the reason. I was able to "fix" the issue by adding mc.cleanup() before returning context deadline exceeded.

// Ping implements driver.Pinger interface
func (mc *mysqlConn) Ping(ctx context.Context) (err error) {
	if mc.closed.IsSet() {
		errLog.Print(ErrInvalidConn)
		return driver.ErrBadConn
	}

	if err = mc.watchCancel(ctx); err != nil {
		mc.cleanup() // This line fixes the issue but it leaves bad connection in the pool
		return
	}
	defer mc.finish()

	if err = mc.writeCommandPacket(comPing); err != nil {
		return
	}

	return mc.readResultOK()
}

However this leads to two calls to DB.conn() instead of one as this bad connection was left in the pool https://github.com/golang/go/blob/ce58a39fca067a19c505220c0c907ccf32793427/src/database/sql/sql.go#L730-L731

Another hack I've tried was to return driver.ErrBadConn instead context deadline exceeded. This takes out the connection out of the pool (ctx timeout doesn't remove connection from the pool) but we loose information about the real error.

And of course this raises a question - if context timeouts should we assume we can't re-use that connection anymore?

Example code

package main

import (
	"context"
	"database/sql"
	"log"
	"time"

	_ "github.com/go-sql-driver/mysql"
)

func main() {
	db, err := sql.Open("mysql", "root@/")
	if err != nil {
		panic(err)
	}
	db.SetMaxOpenConns(1)
	db.SetMaxIdleConns(1)

	for {
		ctx := context.Background()
		ctx, cancel := context.WithTimeout(ctx, time.Millisecond)
		err := db.PingContext(ctx)
		if err != nil {
			log.Println(err)
		}
		cancel()
		time.Sleep(time.Second)
	}
}

Error log

# go run ./main.go
2018/09/21 16:08:15 context deadline exceeded
2018/09/21 16:08:16 bad connection
2018/09/21 16:08:17 context deadline exceeded
2018/09/21 16:08:18 bad connection
2018/09/21 16:08:19 context deadline exceeded
2018/09/21 16:08:20 bad connection
@arvenil
Copy link
Contributor Author

arvenil commented Sep 21, 2018

I've only tested this on PingContext but other *Context calls may be affected as well.

arvenil added a commit to percona/mysqld_exporter that referenced this issue Sep 25, 2018
arvenil added a commit to percona/mysqld_exporter that referenced this issue Sep 25, 2018
@methane
Copy link
Member

methane commented Sep 26, 2018

And of course this raises a question - if context timeouts should we assume we can't re-use that connection anymore?

If timeout happened before sending any packets, it's safe.
If timeout happened after sending some packet and before it's response, the connection is dirty and never reused.

@shogo82148
Copy link
Contributor

Can I have your Go version?

I implemented SessionResetter #779 ,
so database/sql will not return a bad connection to the connection pool.
It works from Go 1.10.

@arvenil
Copy link
Contributor Author

arvenil commented Sep 27, 2018

@shogo82148

$ go version
go version go1.10.3 darwin/amd64

@arvenil
Copy link
Contributor Author

arvenil commented Sep 27, 2018

@methane @shogo82148 Could you try to run example code and post results here? Would be really useful to me if you could confirm the behavior.

package main

import (
	"context"
	"database/sql"
	"log"
	"time"

	_ "github.com/go-sql-driver/mysql"
)

func main() {
	db, err := sql.Open("mysql", "root@/")
	if err != nil {
		panic(err)
	}
	db.SetMaxOpenConns(1)
	db.SetMaxIdleConns(1)

	for {
		ctx := context.Background()
		ctx, cancel := context.WithTimeout(ctx, time.Millisecond)
		err := db.PingContext(ctx)
		if err != nil {
			log.Println(err)
		}
		cancel()
		time.Sleep(time.Second)
	}
}

@methane
Copy link
Member

methane commented Sep 27, 2018

@shogo82148
On first Ping, timeout is happened here.

mysql/connection.go

Lines 602 to 606 in 99ff426

mc.watching = true
select {
default:
case <-ctx.Done():
return ctx.Err()

Note that mc.watching = true.

On next Ping, mc.watchCancel doesn't return err. But it closes connection. See here:

mysql/connection.go

Lines 591 to 597 in 99ff426

func (mc *mysqlConn) watchCancel(ctx context.Context) error {
if mc.watching {
// Reach here if canceled,
// so the connection is already invalid
mc.cleanup()
return nil
}

After mc.watchCancel() returned, Ping calls mc.writeCommandPacket(comPing).
In that function, mc.netConn.Write() fails with ErrClosedConn.

Since the error happened before sending any data, mc.writePacket() returns errBadConnNoWrite.


How do you think about moving mc.watching = true?

--- a/connection.go
+++ b/connection.go
@@ -599,7 +607,6 @@ func (mc *mysqlConn) watchCancel(ctx context.Context) error {
                return nil
        }

-       mc.watching = true
        select {
        default:
        case <-ctx.Done():
@@ -609,6 +616,7 @@ func (mc *mysqlConn) watchCancel(ctx context.Context) error {
                return nil
        }

+       mc.watching = true
        mc.watcher <- ctx

        return nil

methane added a commit to methane/mysql that referenced this issue Sep 28, 2018
methane added a commit to methane/mysql that referenced this issue Sep 28, 2018
methane added a commit to methane/mysql that referenced this issue Sep 28, 2018
Don't break the connection when cancelled context is passed.

Fix go-sql-driver#858
methane added a commit to methane/mysql that referenced this issue Sep 28, 2018
Don't break the connection when cancelled context is passed.

Fix go-sql-driver#858
methane added a commit that referenced this issue Oct 1, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants