implement pipelining mode allowed by libpq 14 #2646
Comments
Any update on this?
I did some tests locally with a fork built upon #662 and noticed some interesting perf gains in my use cases (between 5% and 20%).
Explicit pipelining would be interesting.
Or const [{ err: errA, rows: rowsA }, { err: errB, rows: rowsB }] = await pool.pipeline([{ text, values }, { text, values }])
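The proposed shape could be sketched on top of the existing API. Note that `pool.pipeline` is purely hypothetical (it is not part of pg), and this sketch only batches dispatch over the current query method; it does not do protocol-level pipelining:

```javascript
// Hypothetical pipeline() helper (NOT part of pg's API) returning
// { err, rows } pairs in submission order. `pool` is assumed to expose
// query(text, values) returning a Promise, like pg.Pool does.
async function pipeline(pool, queries) {
  const settled = await Promise.allSettled(
    queries.map(({ text, values }) => pool.query(text, values))
  )
  return settled.map((outcome) =>
    outcome.status === 'fulfilled'
      ? { err: null, rows: outcome.value.rows }
      : { err: outcome.reason, rows: null }
  )
}
```

One design point of this shape: a failed query yields an `err` entry instead of rejecting the whole batch, so later results are still usable.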
I don't get your point. If pipelining is possible, why would you want to request it explicitly rather than always use it? I can understand putting it behind a flag (e.g.
With transactions it will get executed on a single client, so that will work as desired. But when using a pool, a pipeline would get distributed among all the clients. In some cases you might want to execute it on the same client instance so other queries won't be scheduled in between. But maybe that's not worth implementing.
EDIT: Yes, it's part of my thinking (also with mixing DML/DDL queries)
I'm not sure what the implementation in node would look like, but this should not be possible using libpq. I'm investigating this feature, and (I might be wrong) what I found is that it just allows batch dispatch transparently from the client side.

The pipeline essentially makes the network file descriptor non-blocking and allows the client to submit multiple requests in a non-blocking way. All the requests go through the same SOCK_FD, and the client now needs to monitor the associated SOCK_FD (select, poll, epoll, io_uring, kqueue, etc.). I'm not sure the complexity it adds to the client side pays off.

There are cases where it can improve performance by reducing/saving network trip time, because the lib allows the client to push the requests and then call sync (a sort of flush), which sends them all to the server at once, paying only 1 RTT instead of several. But the SOCK_FD used will be the same. Another alternative is to call sync after every request (query). The calls will be asynchronous and won't block, but they will again go through the same SOCK_FD, and the responses, even though asynchronous, will be sequential.

Let's say we have 5 queries: A, B, C, D and E. The 1st takes 1 second, the 2nd 30 seconds, and all the subsequent ones take 1 second each. We might expect to receive all the responses except query B within 1~2 seconds, but this won't happen, because they will be returned sequentially. It will be A=1s, B=31s, C=32s, etc.

I just started investigating this feature in libpq and I might have missed something or not gotten to the nice parts yet, but as I said, ATM the client implementation cost is quite high for the benefit it gets. It needs to handle the SOCK_FD, but apart from the RTT savings, I can't see much more.
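The A–E example above can be checked with a simple timing model (plain arithmetic, no real sockets involved): since responses come back in submission order, each query's completion time is the running sum of execution times, not its individual cost.

```javascript
// Timing model for pipelined queries on one socket: the server processes
// them sequentially and responses arrive in order, so the completion time
// of query i is the cumulative execution time of queries 0..i.
// Execution times (seconds) for queries A..E from the example above.
const execSeconds = { A: 1, B: 30, C: 1, D: 1, E: 1 }

function completionTimes(exec) {
  let elapsed = 0
  const out = {}
  for (const [name, cost] of Object.entries(exec)) {
    elapsed += cost
    out[name] = elapsed
  }
  return out
}

console.log(completionTimes(execSeconds))
// A finishes at 1s, B at 31s, C at 32s, D at 33s, E at 34s:
// even though C, D and E each take only 1s, they wait behind B.
```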
Yes, pipelining essentially consists of reusing the same socket without waiting for the previous requests' responses.
Saving network trips can provide significant performance gains, see comments and benchmark in #2706 (a nodejs implementation, but the saving of network trips would be the same, I assume)
I have no doubt that on a high-latency network the gains could be significant, but I was skeptical about how much we could gain otherwise. I'm still concerned about the complexity it may add to some client implementations, but after reading the points (and benchmark) in the cases you referred to, I'm starting to rethink it. Thanks for sharing!
Is the postgresql server executing the queries in parallel or one at a time when using a pipeline? When not using a connection pool, but a single connection, pipelining would be really useful. The best way would be to use it like in redis, but the response should be similar to:
With pipelining you can reduce the idle time between queries on a single connection, by buffering the next query ahead of time. The server will not execute multiple queries in parallel on the same connection, but whenever it finishes with one query, it won't have to wait a full network roundtrip for the next. This becomes significant if the query execution times are comparable to the network latency.
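That tradeoff can be made concrete with a back-of-the-envelope model (a sketch with made-up numbers, not measured data): without pipelining each query pays a full round trip; with pipelining the round trip is paid roughly once for the whole batch.

```javascript
// Rough latency model: n queries, each taking `exec` ms on the server,
// over a link with `rtt` ms round-trip time.
// Without pipelining the client waits rtt + exec per query;
// with pipelining requests are buffered ahead, so the rtt is paid ~once.
function withoutPipelining(n, rtt, exec) {
  return n * (rtt + exec)
}
function withPipelining(n, rtt, exec) {
  return rtt + n * exec
}

// 10 queries of 1 ms each over a 5 ms round trip:
console.log(withoutPipelining(10, 5, 1)) // 60
console.log(withPipelining(10, 5, 1))   // 15
```

As the comment above says, the gap grows when the round trip dominates query execution time, and shrinks toward zero when queries are slow relative to the network.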
Would be nice to run multiple queries in parallel with just one connection. Forking in the backend would probably be faster than the client opening multiple connections.
Pipelining has nothing to do with query execution concurrency/parallelism. With or without pipeline mode, the PostgreSQL server executes the queries sequentially (while using parallelism capabilities if enabled); pipelining just allows both sides of the connection to work concurrently when possible, and minimizes round-trip time.
@abenhamdine I said it would be nice. And it is only possible with pipelining.
Any news? |
No, unfortunately I don't have time to finish the PR (and I haven't even looked at the native driver)
For everyone who wants to test @abenhamdine's PR, here is a simple snippet that overrides pg to apply the #2706 changes. Side note: I noticed a considerable improvement in performance.

import * as pg from 'pg';
pg.Connection.prototype.submittedNamedStatements = {};
pg.Client.prototype.sentQueryQueue = [];
pg.Client.prototype.pipelining = true;
pg.Client.prototype.handshakeDone = false;
const original_handleReadyForQuery = pg.Client.prototype._handleReadyForQuery;
pg.Client.prototype._handleReadyForQuery = function () {
this.handshakeDone = true;
original_handleReadyForQuery.call(this);
};
pg.Client.prototype._pulseQueryQueue = function () {
if (this.pipelining) {
if (!this.handshakeDone) {
return
}
while (!this.blocked || (this.activeQuery === null && this.sentQueryQueue.length === 0)) {
const query = this.queryQueue.shift()
if (!query) break
const queryError = query.submit(this.connection)
if (queryError) {
process.nextTick(() => {
query.handleError(queryError, this.connection) // report against the failed query, not activeQuery
this.readyForQuery = true
this._pulseQueryQueue()
})
}
this.blocked = query.blocking
this.sentQueryQueue.push(query)
if (query.name) {
this.connection.submittedNamedStatements[query.name] = query.text
}
}
}
if (this.readyForQuery === true) {
this.activeQuery = this.pipelining ? this.sentQueryQueue.shift() : this.queryQueue.shift()
if (this.activeQuery) {
this.readyForQuery = false
this.hasExecuted = true
if (!this.pipelining) {
const queryError = this.activeQuery.submit(this.connection)
if (queryError) {
process.nextTick(() => {
this.activeQuery.handleError(queryError, this.connection)
this.readyForQuery = true
this._pulseQueryQueue()
})
}
}
} else if (this.hasExecuted) {
this.activeQuery = null
this.emit('drain')
}
}
};
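The behavioral difference of the snippet above can be illustrated with stub objects (no real server involved): in pipelining mode, every queued query is submitted to the connection immediately, moving from queryQueue to sentQueryQueue without waiting for the previous response. This is a simplified re-statement of the pipelining branch, not the actual pg internals.

```javascript
// Simplified version of the pipelining branch of _pulseQueryQueue above:
// queries are written to the wire back-to-back until a blocking query
// (or an empty queue) stops the loop.
function drainPipelined(client) {
  while (!client.blocked) {
    const query = client.queryQueue.shift()
    if (!query) break
    query.submit(client.connection) // write to the socket immediately
    client.blocked = query.blocking
    client.sentQueryQueue.push(query)
  }
}

// Stub queries that just record when they are submitted.
const submitted = []
const makeQuery = (name, blocking = false) => ({
  name,
  blocking,
  submit: () => { submitted.push(name) },
})

const client = {
  blocked: false,
  connection: {},
  queryQueue: [makeQuery('q1'), makeQuery('q2'), makeQuery('q3')],
  sentQueryQueue: [],
}
drainPipelined(client)
console.log(submitted) // all three queries are on the wire before any response arrives
```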
@brianc I’d like to resume this work. Would you accept a PR? |
@brianc seems not very active; it's perhaps better to draw the attention of @charmander (see #2646 (comment))
I'm here! Yeah I'm very much open to PRs all the time. Sometimes a bit delayed because life is...life...but I'm still here. :) |
libpq shipped with Postgres 14 now allows using pipeline mode, as stated by these docs:
https://www.postgresql.org/docs/14/libpq-pipeline-mode.html
I don't know exactly when the v3 extended query protocol appeared (it was introduced with PostgreSQL 7.4, back in 2003), but it has been there for a long time, so probably all supported versions of the pg server (9.6 and above) are able to support pipeline mode.
The related commit in the postgres repository is here: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=acb7e4eb6b
For the record, a previous PR was proposed at #662, but that was before libpq supported pipelining, so the PR was closed because it would have created a discrepancy between the native bindings (which use libpq) and the js client modified by the PR.
IMHO this mode should be opt-in, at least for the current major version, since as the docs note:
"Pipeline mode also generally consumes more memory on both the client and server"
edit: I made an (incomplete) attempt in this PR: #2706