client in pool timed out, causing pool to fail for all future queries #2243

Open
isaacl opened this issue Jun 20, 2020 · 6 comments
Comments

isaacl commented Jun 20, 2020

Hello! I'm using a pg-pool in a lambda environment. Last night, the pool hit an error:

2020-06-19T17:28:50.154-04:00 | START
2020-06-19T17:28:50.257-04:00 | ERROR	Invoke Error
"Error: Connection terminated due to connection timeout",
        "    at Timeout.<anonymous> (/var/task/node_modules/pg/lib/client.js:103:26)",

My pool is a static global to allow lambda to reuse across invocations. I guess pg wasn't smart enough to check idle timeout before assigning a client, and in this case it was already dead – is there a way to flush the pool on each lambda invocation startup?

Anyway, this first timeout somehow put the pool in a bad state, and every subsequent call to the pool (all made using the .query helper) instantly failed with a timeout.

  | 2020-06-19T17:29:17.492-04:00 | START
  | 2020-06-19T17:29:17.496-04:00 | ERROR	Invoke Error
Error: Connection terminated due to connection timeout

My config (pg 8.0.3, xray 3.0.1)

import pg from 'pg';
import xrayPostgres from 'aws-xray-sdk-postgres';
const { Pool } = xrayPostgres(pg)
const pool = new Pool({
      connectionTimeoutMillis: 5000,
      idleTimeoutMillis: 60000,
      ssl: ...
});

Contributor

sehrope commented Jun 20, 2020

The issue is that when the Lambda freezes and then thaws, any open TCP connections are broken, but the process is not necessarily notified that they're broken. This can lead to all sorts of issues, so the simplest advice is not to use any long-lived pooling in Lambda and to keep things stateless.

See this thread for more on this topic: #2112

There are likely some improvements that could be made to the pool / client code to handle these types of errors better, but nothing will make those connections usable again. At best, the pool will attempt to use them, hang or error out, and then get you a new connection after some timeout.

If you still want to stick with the pool interface to handle the connection management (as it's convenient for cleaning up resources), check out the "ZeroPool" mentioned here: #1938
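The stateless approach described above could be sketched roughly as follows: open one connection per invocation and always close it before returning, so nothing is left open while the Lambda is frozen. This is only an illustration. `withClient` and `createClient` are hypothetical names, not part of pg; the factory is injected so the helper can be exercised without a database.

```javascript
// Sketch: one short-lived connection per invocation, nothing kept across freezes.
// `createClient` is a hypothetical factory. With node-postgres it could be:
//   () => new Client({ connectionTimeoutMillis: 5000 })
async function withClient(createClient, work) {
  const client = createClient();
  await client.connect();
  try {
    // Run the caller's queries against the freshly opened connection.
    return await work(client);
  } finally {
    // Always close, whether `work` succeeded or threw.
    await client.end();
  }
}

// Hypothetical handler wiring (assumes `pg` is installed and `config` exists):
// exports.handler = () =>
//   withClient(() => new Client(config), (c) => c.query('SELECT 1'));
```

The trade-off is a new connection handshake on every invocation, in exchange for never reusing a socket that may have died during a freeze.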

Author

isaacl commented Jun 20, 2020

@sehrope yeah my issue is likely related to the freeze/thaw as well. But in my case, the client timed out from the internal pg timer, but then the timeout wasn't renewed for subsequent queries, causing the timeout promise to instantly resolve. So I think there's a bug here that's very specific to pg-pool.

demian85 commented Jul 2, 2020

> The issue is that when the Lambda freezes and then thaws, any open TCP connections are broken, but the process is not necessarily notified that they're broken. This can lead to all sorts of issues, so the simplest advice is not to use any long-lived pooling in Lambda and to keep things stateless.
>
> See this thread for more on this topic: #2112
>
> There are likely some improvements that could be made to the pool / client code to handle these types of errors better, but nothing will make those connections usable again. At best, the pool will attempt to use them, hang or error out, and then get you a new connection after some timeout.
>
> If you still want to stick with the pool interface to handle the connection management (as it's convenient for cleaning up resources), check out the "ZeroPool" mentioned here: #1938

How is it even possible to "break" a TCP connection without Node being notified about it? Does that make sense? I'm trying to find a better way of handling these situations in Lambda functions, but I'm not sure whether someone is working on this or whether I should patch it myself by instantiating a pool and closing it after the Lambda finishes execution. That feels like a dirty solution to me.
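For reference, the workaround mentioned here (create a pool when the invocation starts, close it when the invocation finishes) might look roughly like the wrapper below. It is a sketch under assumptions: `withFreshPool` and `createPool` are hypothetical names, and the only pool method relied on is `end()`, which pg-pool provides for shutting a pool down.

```javascript
// Sketch: wrap a Lambda handler so each invocation gets its own pool.
// `createPool` is a hypothetical factory, e.g. () => new Pool(config) with pg.
function withFreshPool(createPool, handler) {
  return async (event, context) => {
    const pool = createPool();
    try {
      // The handler receives the pool as a third argument.
      return await handler(event, context, pool);
    } finally {
      // Drain and close every client before the Lambda can be frozen.
      await pool.end();
    }
  };
}

// Hypothetical usage:
// exports.handler = withFreshPool(
//   () => new Pool(config),
//   async (event, context, pool) => (await pool.query('SELECT 1')).rows
// );
```

Because the pool never outlives an invocation, there is no idle client that could time out across a freeze / thaw cycle.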

Contributor

sehrope commented Jul 2, 2020

> How is it even possible to "break" a TCP connection without Node being notified about it? Does that make sense?

When an AWS Lambda is frozen, it's like pausing a virtual machine. When it thaws, it's like resuming a virtual machine. However, there's no event to notify the process that it was frozen or thawed. There's also no guarantee it will ever be thawed, as an entirely new process could be spawned and the old one discarded.

Memory and open local file descriptors should be fine on resume but anything connected to a remote system may not work as the remote end of the connection or anything in between (e.g. network hardware like a NAT) could have closed the connection.

The simplest approach is to make your Lambdas stateless and have resources last only as long as individual requests. Having a pool stay alive as long as some request is active would be doable, but it'd also be much more complicated and could potentially hang forever if your Lambda ever times out.
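To make the "pool stays alive as long as some request is active" idea concrete, here is a minimal reference-counting sketch. It is not something pg provides: `SharedPool`, `acquire`, and `release` are hypothetical names, and the only assumption about the pool object is that it has an `end()` method.

```javascript
// Sketch: keep one pool alive only while at least one request is using it.
// `createPool` is a hypothetical factory, e.g. () => new Pool(config) with pg.
class SharedPool {
  constructor(createPool) {
    this.createPool = createPool;
    this.pool = null;
    this.active = 0;
  }

  // Called at the start of a request: creates the pool lazily.
  acquire() {
    if (this.pool === null) {
      this.pool = this.createPool();
    }
    this.active += 1;
    return this.pool;
  }

  // Called at the end of a request: the last one out closes the pool.
  // If a request never reaches this call (for example, the Lambda times out),
  // the count never returns to zero and the pool is never closed.
  async release() {
    this.active -= 1;
    if (this.active === 0 && this.pool !== null) {
      const pool = this.pool;
      this.pool = null;
      await pool.end();
    }
  }
}
```

Even this small version has to get the bookkeeping right on every code path, which illustrates why the per-request approach is the simpler recommendation.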


demian85 commented Jul 2, 2020

Thanks for the clarification. It makes sense now, but what if I tell Postgres not to close idle connections for a very long time? Will they stay alive when the Lambda thaws? Is that even possible using parameter groups?

Contributor

sehrope commented Jul 2, 2020

That's probably a bad idea anyway (each idle connection takes up some resources, so you can't have an unlimited number of them). Even if you could, there's no way to guarantee a connection will survive a freeze / thaw. Better to avoid it entirely.

If you're concerned about slow connection times then consider adding a dedicated connection pooler like pgbouncer and have your Lambda target the pooler.
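Pointing the Lambda at a pooler is mostly a matter of connection settings. The helper below is a hedged sketch: `poolerConfig` and the `PGBOUNCER_HOST` / `PGBOUNCER_PORT` variable names are made up for illustration, and 6432 is used as the fallback because it is PgBouncer's conventional listen port. The returned object uses option names that pg's `Client` and `Pool` accept.

```javascript
// Sketch: build node-postgres connection settings that target a pooler
// (such as PgBouncer) instead of the Postgres server itself.
// The PGBOUNCER_* variable names are hypothetical.
function poolerConfig(env) {
  return {
    host: env.PGBOUNCER_HOST,
    // Fall back to PgBouncer's conventional listen port.
    port: Number(env.PGBOUNCER_PORT || 6432),
    database: env.PGDATABASE,
    user: env.PGUSER,
    password: env.PGPASSWORD,
    // Same connect timeout as the configuration in the original report.
    connectionTimeoutMillis: 5000,
  };
}

// Hypothetical usage: new Client(poolerConfig(process.env))
```

The Lambda still opens and closes a connection per invocation, but it connects to the pooler, which keeps the long-lived server connections on its side.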
