
Queries are not sent until previous queries are done, nullifying all advantages of using Node #660


Closed
jusou opened this issue Oct 19, 2014 · 12 comments


@jusou

jusou commented Oct 19, 2014

This module waits to send a query until the results of previous queries arrive from PostgreSQL.

This is an issue that nullifies all advantages of using Node: the whole point of using Node is that you can easily make requests in parallel and receive answers asynchronously.

With this implementation, doing N queries results in N sequential database roundtrips, even if the queries are issued in parallel; of course, there should be more than 1 roundtrip only if subsequent queries need the output of previous ones.

For instance, if you try to insert a million objects in parallel with this library in its current state, using separate queries, you will create a million unnecessary database roundtrips.

You can see this easily with the following code:

var pg = require('postgres-bluebird');
var Promise = require('bluebird');

pg.connectAsync('/run/postgresql').spread(function(c, done) {
    return Promise.all([
        c.queryAsync("SELECT pg_sleep(10)"),
        c.queryAsync("SELECT 31337")
    ]).finally(function() {done(c);});
}).done();

Just run it through strace to get this result:

$ strace -t -s 65535 node test.js 2>&1|grep -E 'pg_sleep|31337'
[...]
22:43:22 write(10, "Q\0\0\0\30SELECT pg_sleep(10)\0", 25) = 25
<-- 10 seconds gap
22:43:32 read(10, "T\0\0\0!\0\1pg_sleep\0\0\0\0\0\0\0\0\0\10\346\0\4\377\377\377\377\0\0D\0\0\0\n\0\1\0\0\0\0C\0\0\0\rSELECT 1\0Z\0\0\0\5I", 65536) = 65
22:43:32 write(10, "Q\0\0\0\21SELECT 31337\0", 18) = 18
22:43:32 read(10, "T\0\0\0!\0\1?column?\0\0\0\0\0\0\0\0\0\0\27\0\4\377\377\377\377\0\0D\0\0\0\17\0\1\0\0\0\00531337C\0\0\0\rSELECT 1\0Z\0\0\0\5I", 65536) = 70

In case it's not obvious, this is the expected output:

$ strace -t -s 65535 node test.js 2>&1|grep -E 'pg_sleep|31337'
[...]
22:43:22 write(10, "Q\0\0\0\30SELECT pg_sleep(10)\0", 25) = 25
22:43:22 write(10, "Q\0\0\0\21SELECT 31337\0", 18) = 18
<-- 10 seconds gap AFTER sending both queries
22:43:32 read(10, "T\0\0\0!\0\1pg_sleep\0\0\0\0\0\0\0\0\0\10\346\0\4\377\377\377\377\0\0D\0\0\0\n\0\1\0\0\0\0C\0\0\0\rSELECT 1\0Z\0\0\0\5I", 65536) = 65
22:43:32 read(10, "T\0\0\0!\0\1?column?\0\0\0\0\0\0\0\0\0\0\27\0\4\377\377\377\377\0\0D\0\0\0\17\0\1\0\0\0\00531337C\0\0\0\rSELECT 1\0Z\0\0\0\5I", 65536) = 70
jusou changed the title from "Catastrophically, queries are not sent until previous queries are done" to "Queries are not sent until previous queries are done, nullifying all advantages of using Node" on Oct 19, 2014
@brianc
Owner

brianc commented Oct 19, 2014

The PostgreSQL server itself only supports one active query per client connection. Libpq will barf if you try to send a query while one is already pending. node-postgres does its best to make it easy not to send two at the same time by internally queueing each query until the previous one has been processed.

You're wrong in assuming this nullifies the advantage of node.js. If you want to issue multiple queries at the same time, you need to connect multiple clients and send the queries across them. By default the pool holds 20 clients, but you can modify this. Note: the PostgreSQL backend only supports about 500 connected clients at a time, depending on the amount of RAM your server has. The best way to do this is to use the built-in connection pool. We handle millions of requests per day with node.js & PostgreSQL; the advantages are certainly not nullified for us.

Furthermore, if all your script does is 1 million inserts, then you're not really taking full advantage of node.js. What node provides is non-blocking I/O, which lets you do other work while your queries are being processed by the PostgreSQL backend. So: issue a query, handle a web request, read a file, all delegated out to the operating system, which calls back into node only when each operation is finished.
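For example, something like this minimal sketch (using the callback-style pg.connect API of that era; the connection string is made up for illustration):

var pg = require('pg');

// Each pg.connect call checks a client out of the built-in pool, so the
// two queries below travel over separate connections and run in parallel.
function run(sql) {
  pg.connect('postgres://localhost/test', function(err, client, done) {
    if (err) throw err;
    client.query(sql, function(err, result) {
      done(); // return the client to the pool
      if (err) throw err;
      console.log(sql, '->', result.rows);
    });
  });
}

run('SELECT pg_sleep(10)');
run('SELECT 31337'); // prints immediately instead of queueing behind pg_sleep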

brianc closed this as completed Oct 19, 2014
@jusou
Author

jusou commented Oct 19, 2014

This is most likely wrong.

The PostgreSQL documentation clearly states otherwise:

<<
A simple query cycle is initiated by the frontend sending a Query message to the backend. The message includes an SQL command (or commands) expressed as a text string. The backend then sends one or more response messages depending on the contents of the query command string, and finally a ReadyForQuery response message. ReadyForQuery informs the frontend that it can safely send a new command. (It is not actually necessary for the frontend to wait for ReadyForQuery before issuing another command, but the frontend must then take responsibility for figuring out what happens if the earlier command fails and already-issued later commands succeed.)
>>

It's most likely actually the case, because otherwise PostgreSQL would be horribly designed: it would make no sense whatsoever to read additional commands while processing one just for the sake of rejecting them.

In fact, they even explicitly say this:
<<
During the processing of a query, the frontend might request cancellation of the query. The cancel request is not sent directly on the open connection to the backend for reasons of implementation efficiency: we don't want to have the backend constantly checking for new input from the frontend during query processing.
>>

Thus, the server doesn't seem to even look at the socket while processing queries, which means it can't even notice whether or not you waited for ReadyForQuery before sending further queries.
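For illustration, here is a minimal sketch (hypothetical code, not from this library) of the simple-query ('Q') message visible in the strace output above; nothing in the protocol stops the frontend from writing two of them back to back:

// Frame an SQL string as a simple-query message: a 1-byte 'Q' tag, an
// int32 length that covers itself and the body, then the SQL text
// terminated by a null byte.
function queryMessage(sql) {
  var body = Buffer.from(sql + '\0', 'utf8');
  var msg = Buffer.alloc(1 + 4 + body.length);
  msg.write('Q', 0);
  msg.writeInt32BE(4 + body.length, 1);
  body.copy(msg, 5);
  return msg;
}

// After the startup handshake, the frontend could simply write:
//   socket.write(queryMessage('SELECT pg_sleep(10)'));
//   socket.write(queryMessage('SELECT 31337'));
// and match the incoming result sets to the queries in order.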

You should probably reopen the issue unless this is somehow incorrect.

@brianc
Owner

brianc commented Oct 20, 2014

I know for a fact that libpq prevents you from sending more than 1 command at a time.

Run this code:

var Client = require('pg-native')

var client = new Client()
client.connectSync()
client.query('SELECT pg_sleep(30)', function(err) {
  console.log('done 1')
})
client.query('SELECT pg_sleep(30)', function(err) {
  if(err) throw err;
  console.log('done 2')
  client.end()
})

It seems like there is a way to dump multiple command statements into the socket without waiting for results, but I don't know if it works with parameterized queries, and more importantly: what does this actually help with? You're still going to send the same amount of data across the socket, and you have no way to properly deal with errors in earlier queries. In your example in particular, you're still going to wait for the first query to finish before you get the second query's results.

I think there is room to do something special with an explicit API for when you'd want to blast queries at the server and ignore any errors encountered earlier, but it seems like a very edge case, and since libpq doesn't support it I'm not going to add it to node-postgres directly. I think it could be done as a separate module in the way https://github.com/brianc/node-pg-query-stream is done, but it ain't gonna happen in node-postgres itself.

I stand by what I said earlier: sending a query & waiting for the response (in a non-blocking way) in no way nullifies any advantage of node.js. As a matter of fact, the PostgreSQL team made sure you can only send 1 query at a time, in a fully non-blocking way, even in their official C client.

I'd encourage you to build a separate module you can pass into client.query that will take over and pipeline a bunch of queries through the connection, if you want this functionality.

@brianc
Owner

brianc commented Oct 20, 2014

Also, you can actually chain multiple statements together in a single command, send it, and get the results:

var statements = 'INSERT INTO blah(id) VALUES(1); INSERT INTO something(age) VALUES(100);'
client.query(statements)

Still not generally a good idea due to error handling.

@jusou
Author

jusou commented Oct 20, 2014

I know for a fact that libpq prevents you from sending more than 1 command at a time.

Maybe libpq does (just like your implementation right now), but it's highly likely that you can just pass multiple commands on the socket.

And this library does not use libpq; indeed, the main reason not to use libpq is to implement proper asynchronous semantics...

The docs say it's possible, and it makes no sense for it to be impossible.

I think there is room to do something special with an explicit API for when you'd want to blast queries at the server and ignore any errors encountered earlier

There's no need to ignore errors.

Just queue every query that has been sent but not yet answered, and each time the server returns success or an error, pop the oldest one and call its callback.
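Something like this minimal sketch (hypothetical names; it assumes a connection object that emits one 'result' event per completed query, in order):

function PipelinedClient(connection) {
  this.connection = connection;
  this.pending = []; // callbacks for queries already on the wire
  var self = this;
  // The server answers queries in the order they were sent, so each
  // result (success or error) belongs to the oldest pending callback.
  connection.on('result', function(err, rows) {
    var cb = self.pending.shift();
    cb(err, rows);
  });
}

PipelinedClient.prototype.query = function(sql, cb) {
  this.connection.write(sql); // send immediately, without waiting
  this.pending.push(cb);
};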

what does this actually help with? You're still going to be sending the same amount of data across the socket

It helps with latency, obviously.

Let's say a roundtrip to the database and back takes 1 ms, and you issue 10000 inserts or updates in parallel using async.parallel, Promise.all or similar: it will then take 10 seconds just for the network roundtrips, and that 10-second wait is completely unnecessary.

Without the unnecessary roundtrips, the whole 10000 inserts would take much less than a second; your implementation just made such an application unusably slow.

This is ESSENTIAL for applications that do things like that, and useful for everything else (you might shave 10-100 ms or more per request on a typical web application if your database is on another machine and the application is properly written).

@brianc
Owner

brianc commented Oct 20, 2014

I'm happy to review any pull requests you'd like to submit. 😄

@jusou
Author

jusou commented Oct 20, 2014

Actually, it looks like a 2-line patch that removes the call to Query.submit from Client._pulseQueryQueue and adds a call to it in Client.query might be all that is needed.

EDIT: actually, that probably breaks copyFrom; more sophistication is needed for that
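Roughly this shape (a hypothetical simplification, not the actual node-postgres internals):

// Before: Client.query only enqueues, and Client._pulseQueryQueue calls
// query.submit() once the previous query finishes. After: submit as soon
// as the query is enqueued, keeping the queue so incoming results are
// still dispatched to queries in order.
Client.prototype.query = function(query) {
  this.queryQueue.push(query);
  query.submit(this.connection);
  return query;
};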

@brianc
Owner

brianc commented Oct 20, 2014

How would you support this:

pg.connect(function(err, client, done) {
  client.query('BEGIN', function(err) {
    if(err) return done(err)
    client.query('INSERT INTO account(user_id, amount) VALUES(1, 100)', function(err) {
      if(err) return done(err)
      client.query('INSERT INTO account(user_id, amount) VALUES(2, -100)', function(err) {
        if(err) return done(err)
        client.query('COMMIT', done)
      })
    })
  })
})

@jusou
Author

jusou commented Oct 20, 2014

In your example, queries are sent sequentially (i.e. you wait for each callback before issuing the next query), so there is no difference.

Anyway, see the pull request at #662

@StarpTech

Can we reopen this, since libpq in PostgreSQL 14 implemented pipeline mode? https://www.postgresql.org/docs/14/libpq-pipeline-mode.html

@brianc
Owner

brianc commented Oct 28, 2021

Yeah, maybe we can open a new issue instead? This one is older than the hills. It's a good feature though; I'd like to add it soon.

@abenhamdine
Contributor

Hi Brianc, I opened #2646
