Feature: Memory limits #2336

Closed
pauldraper opened this issue Sep 7, 2020 · 6 comments

Comments

pauldraper commented Sep 7, 2020

I'd like a large result to produce an Error (which can be handled), rather than crashing the entire process with OutOfMemory.

This is similar in nature to limiting I/O for HTTP requests:

const express = require('express')
const bodyParser = require('body-parser')

const app = express()
app.use(bodyParser.json({ limit: '5mb' }))

I imagine an option on Connection/Pool like maxResultSize. When the result exceeds the configured number of bytes, an Error is thrown. (I understand this limit would be subject to the choice of binary and text formats... this isn't super precise, just a way to gracefully handle unexpectedly large results.)
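
For illustration, a rough sketch of how such an option might be used. The maxResultSize option and the error it would produce are purely hypothetical; nothing like this exists in pg today:

const { Pool } = require('pg')

// maxResultSize is the proposed (hypothetical) option, not a real pg setting.
const pool = new Pool({ maxResultSize: 5 * 1024 * 1024 }) // ~5 MB cap

async function run() {
  try {
    const res = await pool.query('SELECT payload FROM events')
    return res.rows
  } catch (err) {
    // Instead of the whole process dying with OutOfMemory, only the
    // offending query would reject with a handleable error.
    console.error('result too large, skipping', err)
  }
}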

This would significantly improve the stability of multi-operation processes using this library.

sehrope (Contributor) commented Sep 7, 2020

Are you concerned with a single row / field being too large or an entire result set (returned as an array) being too large?

If it's the total result set, have you considered using a cursor? That would let you operate on the rows in chunks without needing to allocate memory for the entire result set. If you have commands that could return an arbitrary / huge number of rows, it's the way to go to ensure you do not blow up the node process memory usage.
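
For reference, a minimal sketch of the cursor approach using the separately published pg-cursor package (the pool, table name, and batch size are illustrative, and it assumes a pg-cursor version with the promise-based read/close API):

const { Pool } = require('pg')
const Cursor = require('pg-cursor')

async function processAllRows(pool) {
  const client = await pool.connect()
  try {
    const cursor = client.query(new Cursor('SELECT * FROM big_table'))
    let rows = await cursor.read(100) // at most 100 rows in memory at a time
    while (rows.length > 0) {
      for (const row of rows) {
        // handle each row here without buffering the full result set
      }
      rows = await cursor.read(100)
    }
    await cursor.close()
  } finally {
    client.release()
  }
}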

pauldraper (Author) commented Sep 8, 2020

Both. Yes, I could use a cursor.

But in this case I'm not interested in the complexity of streaming the results, or even in getting large result sets to work. Success for me would simply be large result sets not crashing the process.


It's the same rationale as limiting HTTP message size. Yes, you could stream the HTTP body and carry that streaming through the rest of the app. However that is complex. Libraries can be configured to just cancel/error the operation for bodies over a certain size.

Likewise, I have a server making PostgreSQL queries. If the response is ever too large, the entire server crashes. Instead, I'd like to cancel/error the specific offending operation.

brianc (Owner) commented Sep 8, 2020

If it's the total result set, have you considered using a cursor? That would let you operate on the row in chunks without needing to allocate memory for the entire result set. If you have commands that could return an arbitrary / huge number of rows, it's the way to go to ensure you do not blow up the node process memory usage.

Yup, 100% this is the way to handle this! The postgres protocol supports this, and node is obvs built w/ streaming in mind. This library already exists as well. We use pg and regularly stream 500-600 megabytes of results down to clients w/o issue using cursors.
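
For reference, a minimal sketch of that streaming pattern with the separately published pg-query-stream package; the Express app, pool, route, and table name are assumptions, not anything from this thread:

const QueryStream = require('pg-query-stream')
const JSONStream = require('JSONStream')

// Stream a large result straight out to the HTTP response
// without buffering the whole result set in memory.
app.get('/export', async (req, res, next) => {
  const client = await pool.connect()
  const stream = client.query(new QueryStream('SELECT * FROM big_table'))
  stream.on('end', () => client.release())
  stream.on('error', (err) => {
    client.release()
    next(err)
  })
  stream.pipe(JSONStream.stringify()).pipe(res)
})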

As far as a single row taking up more memory than your server... you should fix your data in that case. 🤷

One major difference between a db client & a web server & why body parser accepts a limit is a web server needs to protect you from bad actors sending you gigabytes in a post request and DDoSing you. Generally, you control both your web server & database client, so you are just DDoSing yourself in this case. Even if the postgres client threw an error (and forcibly disconnected) the database server has already gone through the work of shoving hundreds of megabytes into the network socket.

brianc closed this as completed Sep 8, 2020

boromisp (Contributor) commented Sep 8, 2020

If you find a use case for "I want to close the connection, if the database returned more than N bytes", it might be possible to implement something like this outside the library:

this.emit('message', msg)

You would need to attach a message event handler to the Connection, sum up the msg.length for each message type you are interested in and reset the counter on readyForQuery.

If you use the pool, you could set this up in the pool's connect handler.
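
A rough sketch of that idea. It leans on the client's internal, undocumented connection object and on pg-protocol message names, so treat it as illustrative rather than supported API; the limit is arbitrary:

const LIMIT = 5 * 1024 * 1024 // arbitrary 5 MB cap

pool.on('connect', (client) => {
  let bytes = 0
  // client.connection is internal/undocumented; message names follow
  // the pg-protocol message types ('dataRow', 'readyForQuery', ...).
  client.connection.on('message', (msg) => {
    if (msg.name === 'dataRow') {
      bytes += msg.length
      if (bytes > LIMIT) {
        // Destroy the socket; the in-flight query fails with this error
        // and the pool discards the broken client.
        client.connection.stream.destroy(new Error(`result exceeded ${LIMIT} bytes`))
      }
    } else if (msg.name === 'readyForQuery') {
      bytes = 0
    }
  })
})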

pauldraper (Author) commented Sep 8, 2020

Yup, 100% this is the way to handle this! The postgres protocol supports this, and node is obvs built w/ streaming in mind.

That is the ideal way. But now everything for the entire operation has to support streaming, e.g. streaming XML or JSON.

One major difference between a db client & a web server & why body parser accepts a limit is a web server needs to protect you from bad actors sending you gigabytes in a post request and DDoSing you. Generally, you control both your web server & database client, so you are just DDoSing yourself in this case. Even if the postgres client threw an error (and forcibly disconnected) the database server has already gone through the work of shoving hundreds of megabytes into the network socket.

There are a lot of assumptions going on here.

  • Assumption 1: An HTTP client/server are controlled by different parties.
  • Assumption 2: A PostgreSQL client/server are never controlled by different parties.
  • Assumption 3: PostgreSQL network I/O is always a more pressing concern than application memory.

These assumptions do not apply universally, nor is there a fundamental reason why they should. There's more in the world than database-backed web servers.

It's very common, and in fact best practice, for I/O applications of all kinds to limit incoming payloads to prevent crashes. PostgreSQL itself (despite -- as you say -- usually being controlled by the same party as the client) has limits for incoming data.

I see no reason why a PostgreSQL client (particularly one with a runtime unable to respond to malloc() errors) can't protect itself in the same way.

This is a change that is remarkably easy to add, but would be somewhat hairy to implement via a plugin.

brianc (Owner) commented Sep 8, 2020

Please feel free to submit a pull request with proper test coverage & I'll be happy to take a look.
