
Attempt to handle a stream being silently killed in the background #2121

Status: Closed (wanted to merge 2 commits)
17 changes: 14 additions & 3 deletions packages/pg-pool/index.js
@@ -116,11 +116,22 @@ class Pool extends EventEmitter {
const pendingItem = this._pendingQueue.shift()
if (this._idle.length) {
const idleItem = this._idle.pop()
clearTimeout(idleItem.timeoutId)
const client = idleItem.client
const idleListener = idleItem.idleListener
clearTimeout(idleItem.timeoutId)

// TODO(bmc): we need a better state representation on the client itself
// to indicate if it's connecting, idle, running a query, closed, etc...
// as right now this hacky fix will only work for the JS client
// since the native client doesn't _have_ a connection property

return this._acquireClient(client, pendingItem, idleListener, false)
// remove this client, it's died in the background - this can happen
// in aws lambdas - they go idle and the streams are closed without an error
if (client.connection && client.connection.stream.readyState !== 'open') {

Review comment:

I'm seeing some mention online that stream.readyState is deprecated/unsupported. It might be safer to do this instead:

(client.connection.stream.readable && client.connection.stream.writable)

When experimenting w/ things locally, I'm finding that these properties change as soon as you call stream.end().

Reply from the repo owner:

yeah, those properties don't exist in all versions of node that pg supports. stream.readyState does exist in all supported versions.

Reply from the repo owner:

> When experimenting w/ things locally, I'm finding that these properties change as soon as you call stream.end().

you should check if readyState changes as well (I think it does). The reason is that any official patch is gonna need to work in node 8, 10, 12, and 13. So if it's "officially deprecated" in node but still works, I'd use it (as it probably wasn't deprecated in older versions of node). Typically I do that, write a bunch of tests for the feature, and then if a test breaks on an edge release of node I either decide to drop old version support at that point or make a compatibility branch in the code that will handle both old and new versions of node.

Review comment:

Sure -- I'll give that a try after the initial experiment. Speaking of which: you can find the experiment within the knex pooling here:

https://github.com/briandamaged/knex/tree/experiment/aws-disconnections

And, here's the link to the specific commit:

briandamaged/knex@1490611

So, I think this roughly aligns w/ your implementation. (Aside from the readyState piece)

Reply from the repo owner:

yah totally, actually I don't wanna burden you w/ backwards compatibility - if you find a way to make it work in lambda w/ a version of node, i'll do my best to back-port it and "production-ify" it for this library. Mostly it's just hard for me to isolate & test since I don't have a lambda nor much experience w/ them, so me setting all that testing infrastructure up is a huge time sink. But if you figure out the root cause, I'll help any way I can getting a fix out

Review comment:

Ah -- yeah, I currently don't have a way to recreate the issue, either. I just know that Knex's current implementation relies upon Events to detect the closed connections. So, this just seems like a possible way to work around that issue.

Hopefully we'll hear some good news from this experiment! If so, then we can figure out how to formulate the concepts into something that is production-worthy. (One thought: we might even need to swap out implementations depending upon Node version. Ex: use readable / writable if the Node version supports those properties; otherwise, fall back to readyState)
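The version-fallback idea floated above could be sketched as a small helper. This is only an illustration of the discussion, not pg-pool's actual API: `isStreamOpen` is a hypothetical name, and it prefers the documented `readable`/`writable` properties when they exist, falling back to the older `readyState` string otherwise.

```javascript
// Hypothetical helper illustrating the version-fallback idea discussed above;
// `isStreamOpen` is not part of pg-pool's real implementation.
function isStreamOpen(stream) {
  // Newer Node versions expose boolean `readable`/`writable` on sockets;
  // both flip to false once the stream ends.
  if (typeof stream.readable === 'boolean' && typeof stream.writable === 'boolean') {
    return stream.readable && stream.writable
  }
  // Older Node versions: fall back to the (undocumented) readyState string.
  return stream.readyState === 'open'
}
```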

this._remove(client)

Review comment:

In the current logic: if it notices that the next-available client is already dead, then it removes it from the pool. Afterwards, it instantiates a new client to replace it.

Would there be any advantage to looping on the client eviction step? In pseudo-code:

while (this._idle.length) {
  const client = this._idle.pop();

  // ... other important details I'm ignoring for the moment ...

  if (isDead(client)) {
    this._remove(client);
  } else {
    return client;
  }
}

if (!this._isFull()) {
  return this.newClient(...);
}

throw new Error('unexpected condition');

This seems like it helps in 2 ways:

  1. It clears out the dead clients more quickly, and
  2. It avoids the need to wait for this.newClient(..) when there is already another client available.
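The loop-based eviction suggested above can be shown as a self-contained sketch. The `pool` shape (`{ idle, removed }`) and the `isDead` predicate here are stand-ins for illustration only; pg-pool's real internals differ.

```javascript
// Standalone sketch of the eviction loop suggested above. The pool shape
// ({ idle, removed }) and the isDead predicate are stand-ins, not pg-pool's API.
function acquireIdle(pool, isDead) {
  while (pool.idle.length) {
    const client = pool.idle.pop()
    if (isDead(client)) {
      pool.removed.push(client) // evict dead clients immediately
    } else {
      return client // hand out the first live client found
    }
  }
  return null // caller falls back to creating a new client
}
```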

Reply from the repo owner:

probably, yeah - good suggestions!

Comment:

hi guys, I want to test this branch, but npm says it's missing the version... tried to fork it, put the version (https://github.com/592da/node-postgres) and then compiling failed: Cannot find module 'pg'

@brianc

Reply:

Hmm... I suspect that's because this is a mono-repo that is built via Lerna. You might need to build it locally first and then link to the build artifacts. (@brianc might know another approach; I haven't actually worked with this codebase before.)

I don't remember the exact build steps for Lerna off the top of my head, but I think it's something like:

  1. Install Lerna
  2. Navigate to the root of the repo
  3. Run the command: lerna bootstrap

Reply from the repo owner:

> hi guys, I want to test this branch, but npm says its missing the version... tried to fork it, put the version (https://github.com/592da/node-postgres) and then compiling failed : Cannot find module 'pg'

if you wanna test it, it's probably easiest to just apply the diff of the 1 file I changed inside your node_modules/pg-pool folder to index.js (or just copy/paste the changes).

Comment:

thanks for your reply @briandamaged, unfortunately it didn't work, maybe because it's also imported from typeorm

Comment:

@brianc, having the same issue...

```
2020-02-26T21:38:37.302Z	da5bbf66-06b7-4c69-9c85-ab84b1b777a1	INFO	Error: Connection terminated unexpectedly
    at Connection.<anonymous> (/var/task/src/rest/webpack:/node_modules/pg/lib/client.js:255:1)
    at Object.onceWrapper (events.js:312:28)
    at Connection.emit (events.js:223:5)
    at Connection.EventEmitter.emit (domain.js:475:20)
    at Socket.<anonymous> (/var/task/src/rest/webpack:/node_modules/pg/lib/connection.js:139:1)
    at Socket.emit (events.js:228:7)
    at Socket.EventEmitter.emit (domain.js:475:20)
    at endReadableNT (_stream_readable.js:1185:12)
    at processTicksAndRejections (internal/process/task_queues.js:81:21)
```

} else {
const idleListener = idleItem.idleListener
return this._acquireClient(client, pendingItem, idleListener, false)
}
}
if (!this._isFull()) {
return this.newClient(pendingItem)
31 changes: 31 additions & 0 deletions packages/pg-pool/test/background-socket-terminiation.js
@@ -0,0 +1,31 @@
'use strict'
const net = require('net')
const co = require('co')
const expect = require('expect.js')

const describe = require('mocha').describe
const it = require('mocha').it

const Pool = require('../')

describe('a socket disconnecting in the background', () => {
it('should leave the pool in a usable state', async () => {
const pool = new Pool({ max: 1 })
// just run any arbitrary query
const client = await pool.connect()
await client.query('SELECT NOW()')
// return the client
client.release()

// now kill the socket in the background
client.connection.stream.end()

// now try to query again, it should work
await pool.query('SELECT NOW()')
await pool.query('SELECT NOW()')
await pool.query('SELECT NOW()')
await pool.query('SELECT NOW()')

Review comment:

Perhaps you should also add something like:

const client2 = await pool.connect();
try {
  // Verify that `client` was removed from the pool and replaced w/ a new client
  expect(client2).to.not.equal(client);
} finally {
  await client2.release();
}


(Otherwise, it's possible that client just magically fixed itself through some other means. So, adding this check helps to rule out that possibility)

Reply from the repo owner:

yah that's not a bad idea! I'd like to hold off on merging this until I get some confirmation it's actually fixing things in aws lambda. It also unfortunately doesn't work w/ the native bindings yet, so I'll need to somehow figure that out.

Review comment:

Sounds good.

As I mentioned in this thread, I suspect that both "pg" and "knex" were encountering the same issue within their respective pool implementations. So, we'll most likely need to implement a similar fix on the "knex" side.

await pool.end()
})
})