
Queries are not sent until previous queries are done, nullifying all advantages of using Node #660


Closed
jusou opened this issue Oct 19, 2014 · 12 comments


@jusou

jusou commented Oct 19, 2014

This module waits to send a query until the results of previous queries arrive from PostgreSQL.

This is an issue that nullifies all advantages of using Node: the whole point of using Node is that you can easily make requests in parallel and receive answers asynchronously.

With this implementation, doing N queries results in N sequential database roundtrips, even if the queries are issued in parallel; of course, there should be more than 1 roundtrip only if subsequent queries need the output of previous ones.

For instance, if you try to insert a million objects in parallel with this library in its current state, using separate queries, you will create a million unnecessary database roundtrips.

You can see this easily with the following code:

var pg = require('postgres-bluebird');
var Promise = require('bluebird');

pg.connectAsync('/run/postgresql').spread(function(c, done) {
    return Promise.all([
        c.queryAsync("SELECT pg_sleep(10)"),
        c.queryAsync("SELECT 31337")
    ]).finally(function() {done(c);});
}).done();

Just run it through strace to get this result:

$ strace -t -s 65535 node test.js 2>&1|grep -E 'pg_sleep|31337'
[...]
22:43:22 write(10, "Q\0\0\0\30SELECT pg_sleep(10)\0", 25) = 25
<-- 10 seconds gap
22:43:32 read(10, "T\0\0\0!\0\1pg_sleep\0\0\0\0\0\0\0\0\0\10\346\0\4\377\377\377\377\0\0D\0\0\0\n\0\1\0\0\0\0C\0\0\0\rSELECT 1\0Z\0\0\0\5I", 65536) = 65
22:43:32 write(10, "Q\0\0\0\21SELECT 31337\0", 18) = 18
22:43:32 read(10, "T\0\0\0!\0\1?column?\0\0\0\0\0\0\0\0\0\0\27\0\4\377\377\377\377\0\0D\0\0\0\17\0\1\0\0\0\00531337C\0\0\0\rSELECT 1\0Z\0\0\0\5I", 65536) = 70

In case it's not obvious, this is the expected output:

$ strace -t -s 65535 node test.js 2>&1|grep -E 'pg_sleep|31337'
[...]
22:43:22 write(10, "Q\0\0\0\30SELECT pg_sleep(10)\0", 25) = 25
22:43:22 write(10, "Q\0\0\0\21SELECT 31337\0", 18) = 18
<-- 10 seconds gap AFTER sending both queries
22:43:32 read(10, "T\0\0\0!\0\1pg_sleep\0\0\0\0\0\0\0\0\0\10\346\0\4\377\377\377\377\0\0D\0\0\0\n\0\1\0\0\0\0C\0\0\0\rSELECT 1\0Z\0\0\0\5I", 65536) = 65
22:43:32 read(10, "T\0\0\0!\0\1?column?\0\0\0\0\0\0\0\0\0\0\27\0\4\377\377\377\377\0\0D\0\0\0\17\0\1\0\0\0\00531337C\0\0\0\rSELECT 1\0Z\0\0\0\5I", 65536) = 70
jusou changed the title from "Catastrophically, queries are not sent until previous queries are done" to "Queries are not sent until previous queries are done, nullifying all advantages of using Node" on Oct 19, 2014
@brianc
Owner

brianc commented Oct 19, 2014

The PostgreSQL server itself only supports one active query per client connection. Libpq will barf if you try to send a query while one is already pending. node-postgres does its best to make it easy not to send two at the same time by internally queueing each query until the previous one has been processed.

You're wrong in assuming this nullifies the advantage of node.js. If you want to issue multiple queries at the same time, you need to connect multiple clients and send the queries across them. By default the pool holds 20 clients, but you can modify this. Note: the PostgreSQL backend only supports about 500 connected clients at a time, depending on the amount of RAM your server has. The best way to do this is to use the built-in connection pool. We handle millions of requests per day with node.js & PostgreSQL; the advantages are certainly not nullified for us.

Furthermore, if all your script does is 1 million inserts, then you're not really taking full advantage of node.js. What node provides is non-blocking I/O, which lets you do other work while your queries are being processed by the PostgreSQL backend. So: issue a query, handle a web request, read a file, all delegated out to the operating system, which calls back into node only when each operation is finished.
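For example, something like this minimal sketch (using the callback-style pg.connect API of that era; the connection string is made up for illustration):

var pg = require('pg');

// Each pg.connect call checks a client out of the built-in pool, so the
// two queries below travel over separate connections and run in parallel.
function run(sql) {
  pg.connect('postgres://localhost/test', function(err, client, done) {
    if (err) throw err;
    client.query(sql, function(err, result) {
      done(); // return the client to the pool
      if (err) throw err;
      console.log(sql, '->', result.rows);
    });
  });
}

run('SELECT pg_sleep(10)');
run('SELECT 31337'); // prints immediately instead of queueing behind pg_sleep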

brianc closed this as completed Oct 19, 2014
@jusou
Author

jusou commented Oct 19, 2014

This is most likely wrong.

The PostgreSQL documentation clearly states otherwise:

<<
A simple query cycle is initiated by the frontend sending a Query message to the backend. The message includes an SQL command (or commands) expressed as a text string. The backend then sends one or more response messages depending on the contents of the query command string, and finally a ReadyForQuery response message. ReadyForQuery informs the frontend that it can safely send a new command. (It is not actually necessary for the frontend to wait for ReadyForQuery before issuing another command, but the frontend must then take responsibility for figuring out what happens if the earlier command fails and already-issued later commands succeed.)
>>

It's most likely actually the case, because otherwise PostgreSQL would be horribly designed: it would make no sense whatsoever to read additional commands while processing one just for the sake of rejecting them.

In fact, they even explicitly say this:
<<
During the processing of a query, the frontend might request cancellation of the query. The cancel request is not sent directly on the open connection to the backend for reasons of implementation efficiency: we don't want to have the backend constantly checking for new input from the frontend during query processing.
>>

Thus, the server doesn't seem to even look at the socket while processing queries, which means it can't even notice whether or not you waited for ReadyForQuery before sending further queries.
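For illustration, here is a minimal sketch (hypothetical code, not from this library) of the simple-query ('Q') message visible in the strace output above; nothing in the protocol stops the frontend from writing two of them back to back:

// Frame an SQL string as a simple-query message: a 1-byte 'Q' tag, an
// int32 length that covers itself and the body, then the SQL text
// terminated by a null byte.
function queryMessage(sql) {
  var body = Buffer.from(sql + '\0', 'utf8');
  var msg = Buffer.alloc(1 + 4 + body.length);
  msg.write('Q', 0);
  msg.writeInt32BE(4 + body.length, 1);
  body.copy(msg, 5);
  return msg;
}

// After the startup handshake, the frontend could simply write:
//   socket.write(queryMessage('SELECT pg_sleep(10)'));
//   socket.write(queryMessage('SELECT 31337'));
// and match the incoming result sets to the queries in order.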

You should probably reopen the issue unless this is somehow incorrect.

@brianc
Owner

brianc commented Oct 20, 2014

I know for a fact that libpq prevents you from sending more than 1 command at a time.

Run this code:

var Client = require('pg-native')

var client = new Client()
client.connectSync()
client.query('SELECT pg_sleep(30)', function(err) {
  console.log('done 1')
})
client.query('SELECT pg_sleep(30)', function(err) {
  if(err) throw err;
  console.log('done 2')
  client.end()
})

It seems like there is a way to dump multiple command statements into the socket without waiting for results, but I don't know if it works with parameterized queries, and more importantly: what does this actually help with? You're still going to send the same amount of data across the socket, and you have no way to properly deal with errors in earlier queries. In your example in particular, you're still going to wait for the first query to finish before you get the second query's results.

I think there is room to do something special with an explicit API for when you'd want to blast queries at the server and ignore any errors encountered earlier, but it seems like a very edge case, and since libpq doesn't support it I'm not going to add it to node-postgres directly. I think it could be done as a separate module in the way https://github.com/brianc/node-pg-query-stream is done, but it ain't gonna happen in node-postgres itself.

I stand by what I said earlier: sending a query & waiting for the response (in a non-blocking way) in no way nullifies any advantage of node.js. As a matter of fact, the PostgreSQL team made sure you can only send 1 query at a time, in a fully non-blocking way, even in their official C client.

I'd encourage you to build a separate module you can pass into client.query that will take over and pipeline a bunch of queries through the connection, if you want this functionality.

@brianc
Owner

brianc commented Oct 20, 2014

Also, you can actually chain multiple statements together in a single command, send it, and get the results:

var statements = 'INSERT INTO blah(id) VALUES(1); INSERT INTO something(age) VALUES(100);'
client.query(statements)

Still not generally a good idea due to error handling.

@jusou
Author

jusou commented Oct 20, 2014

I know for a fact that libpq prevents you from sending more than 1 command at a time.

Maybe libpq does (just like your implementation right now), but it's highly likely that you can just pass multiple commands on the socket.

And this library does not use libpq; indeed, the main reason not to use libpq is to implement proper asynchronous semantics...

The docs say it's possible, and it makes no sense for it to be impossible.

I think there is room to do something special with an explicit API for when you'd want to blast queries at the server and ignore any errors encountered earlier

There's no need to ignore errors.

Just queue every query that has been sent but not yet answered, and each time the server returns success or an error, pop the oldest one and call its callback.
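Something like this minimal sketch (hypothetical names; it assumes a connection object that emits one 'result' event per completed query, in order):

function PipelinedClient(connection) {
  this.connection = connection;
  this.pending = []; // callbacks for queries already on the wire
  var self = this;
  // The server answers queries in the order they were sent, so each
  // result (success or error) belongs to the oldest pending callback.
  connection.on('result', function(err, rows) {
    var cb = self.pending.shift();
    cb(err, rows);
  });
}

PipelinedClient.prototype.query = function(sql, cb) {
  this.connection.write(sql); // send immediately, without waiting
  this.pending.push(cb);
};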

what does this actually help with? You're still going to be sending the same amount of data across the socket

It helps with latency, obviously.

Let's say a roundtrip to the database and back takes 1 ms, and you issue 10000 inserts or updates in parallel using async.parallel, Promise.all or similar: it will then take 10 seconds just for the network roundtrips, and that 10-second wait is completely unnecessary.

Without the unnecessary roundtrips, the whole 10000 inserts would take much less than a second; your implementation just made such an application unusably slow.

This is ESSENTIAL for applications that do things like that, and useful for everything else (you might shave 10-100 ms or more per request on a typical web application if your database is on another machine and the application is properly written).

@brianc
Owner

brianc commented Oct 20, 2014

I'm happy to review any pull requests you'd like to submit. 😄

@jusou
Author

jusou commented Oct 20, 2014

Actually, it looks like a 2-line patch that removes the call to Query.submit from Client._pulseQueryQueue and adds a call to it in Client.query might be all that is needed.

EDIT: actually, that probably breaks copyFrom; more sophistication is needed for that
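Roughly this shape (a hypothetical simplification, not the actual node-postgres internals):

// Before: Client.query only enqueues, and Client._pulseQueryQueue calls
// query.submit() once the previous query finishes. After: submit as soon
// as the query is enqueued, keeping the queue so incoming results are
// still dispatched to queries in order.
Client.prototype.query = function(query) {
  this.queryQueue.push(query);
  query.submit(this.connection);
  return query;
};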

@brianc
Owner

brianc commented Oct 20, 2014

How would you support this:

pg.connect(function(err, client, done) {
  client.query('BEGIN', function(err) {
    if(err) return done(err)
    client.query('INSERT INTO account(user_id, amount) VALUES(1, 100)', function(err) {
      if(err) return done(err)
      client.query('INSERT INTO account(user_id, amount) VALUES(2, -100)', function(err) {
        if(err) return done(err)
        client.query('COMMIT', done)
      })
    })
  })
})

@jusou
Author

jusou commented Oct 20, 2014

In your example, queries are sent sequentially (i.e. you wait for each callback before issuing the next query), so there is no difference.

Anyway, see the pull request at #662

@StarpTech

Can we reopen this, since libpq in PostgreSQL 14 implemented pipeline mode? https://www.postgresql.org/docs/14/libpq-pipeline-mode.html

@brianc
Owner

brianc commented Oct 28, 2021

Yeah, maybe we can open a new issue instead? This one is older than the hills. It's a good feature though; I'd like to add it soon.

@abenhamdine
Contributor

Hi Brianc, I opened #2646
