
implement pipelining mode allowed by libpq 14 #2646

Open
abenhamdine opened this issue Oct 31, 2021 · 20 comments

Comments

@abenhamdine
Contributor

abenhamdine commented Oct 31, 2021

libpq shipped with Postgres 14 now allows using pipelining mode
https://www.postgresql.org/docs/14/libpq-pipeline-mode.html

As stated by these docs:

While the pipeline API was introduced in PostgreSQL 14, it is a client-side feature which doesn't require special server support and works on any server that supports the v3 extended query protocol.

I don't know exactly when the v3 extended query protocol appeared (it dates back to PostgreSQL 7.4, released in 2003), but it has been around for a long time, so all supported versions of the pg server (9.6 and above) should be able to support pipeline mode.

The related commit in the postgres repository is here: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=acb7e4eb6b

For the record, a previous PR was proposed at #662, but it predated libpq support, so the PR was closed because it would have created a discrepancy between the native bindings (which use libpq) and the js client modified by the PR.

IMHO this mode should be opt-in, at least for the current major version:

  • because it's not so clear (to me at least) how some workflows are handled (what if you send a DDL query adding a column and then immediately a DML query using the new column?)
  • to avoid subtle breaking changes for users (especially on old versions of pg)
  • for memory consumption reasons (from the pg docs: Pipeline mode also generally consumes more memory on both the client and server)

edit : I made an (incomplete) attempt in this PR : #2706

@jadbox
Contributor

jadbox commented Feb 1, 2022

Any update on this?

@abenhamdine
Contributor Author

abenhamdine commented Feb 1, 2022

Any update on this?

I did some tests locally with a fork built upon #662 and noticed some interesting perf gains in my use cases (between 5% and 20%).
I will send a PR as a POC.
However it addresses only the js driver, not the native one.

@marcbachmann

marcbachmann commented Feb 8, 2022

Explicit pipelining would be interesting.
E.g. ioredis supports client.pipeline([['get', 'foo'], ['get', 'bar']]).exec().

pg could do the same to ensure the queries are sent on the same connection without any other slow query interfering.
e.g. const [[errA, resA], [errB, resB]] = await pool.pipeline([{text, values}, {text, values}]).

Or

const [{err: errA, rows: rowsA}, {err: errB, rows: rowsB}] = await pool.pipeline([{text, values}, {text, values}])
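A sketch of how such a pipeline helper could behave (the `pipeline` function here is hypothetical, not part of pg's API): submit every query up front on one client so the driver can queue them back-to-back on a single connection, then settle each result into an `[error, result]` pair:

```javascript
// Hypothetical pipeline helper: fire all queries before awaiting any,
// then map each settled promise to an [error, result] pair.
async function pipeline(client, queries) {
  const pending = queries.map((q) => client.query(q)); // no await between submits
  const settled = await Promise.allSettled(pending);
  return settled.map((r) =>
    r.status === 'fulfilled' ? [null, r.value] : [r.reason, null]
  );
}
```

Any object with a promise-returning `query()` works here, so the shape can be tried against a stub before wiring it into a real client.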

@abenhamdine
Contributor Author

Explicit pipelining would be interesting.

I don't get your point. If pipelining is possible, why would you want to enable it explicitly rather than always use it?

I can understand putting it behind a flag (e.g. --pipelining at the connection level) until we are sure it's safe for every use case, but I fail to see why it should be explicit in the API.

@marcbachmann

With transactions everything is executed on a single client, so that will work as desired.

But when using a pool, a pipeline's queries would get distributed among all the clients. In some cases you might want to execute them on the same client instance so other queries won't be scheduled in between. But maybe that's not worth implementing.

@abenhamdine
Contributor Author

abenhamdine commented Feb 16, 2022

But when using a pool, pipeline would get distributed among all the clients

Indeed, but that is the case with or without pipelining, so why would it be an issue? Or perhaps I'm missing something in your explanation?

EDIT:
oh sorry, it looks like I missed the important part of your post:

so other queries won't be scheduled in between

Yes, that's part of my concerns (also about mixing DML/DDL queries).
Perhaps indeed it should be possible to opt in (or opt out?) at the pool and/or client level.

@elchinoo

But when using a pool, pipeline would get distributed among all the clients. In some cases you might want to execute it on the same client instance so other queries won't be scheduled in between. But maybe that's not worth it to implement.

I'm not sure what the implementation in node would look like, but this should not be possible using libpq. I'm investigating this feature, and (I might be wrong) what I found is that this just allows batch dispatch transparently from the client side. The pipeline essentially makes the network file descriptor non-blocking and allows the client to submit multiple requests in a non-blocking way. All the requests go through the same SOCK_FD, and the client now needs to monitor the associated SOCK_FD (select, poll, epoll, io_uring, kqueue, etc.).

Not sure if the complexity it adds to the client side pays off. There are cases where it can improve performance by reducing network trip time, because the lib allows the client to push the requests to the lib and then call sync (a sort of flush), which sends everything to the server at the same time, incurring only 1 RTT instead of multiple. But the SOCK_FD used will be the same.

Another alternative is to call sync after every request (query). The calls will be asynchronous and won't block, but they will again go through the same SOCK_FD, and the responses, even though asynchronous, will be sequential. Let's say we have 5 queries A, B, C, D and E. The 1st takes 1 second, the 2nd 30 seconds, and all the subsequent ones take 1 second. We might expect to receive all the responses except query B within 1~2 seconds, but this won't happen because they are returned sequentially. It will be A=1s, B=31s, C=32s, etc...
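The head-of-line blocking described above can be sketched as a toy model, assuming responses are delivered strictly in submission order: each arrival time is the running sum of execution times, so one slow query delays every later result.

```javascript
// Toy model: on a pipelined connection, responses come back in submission
// order, so each query's result arrives at the cumulative sum of all
// execution times up to and including it.
function arrivalTimes(execSeconds) {
  let elapsed = 0;
  return execSeconds.map((s) => (elapsed += s));
}

// Queries A..E taking 1, 30, 1, 1, 1 seconds arrive at:
// arrivalTimes([1, 30, 1, 1, 1]) → [1, 31, 32, 33, 34]
```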

I just started investigating this feature in libpq and I might have missed something or not gotten to the nice features yet, but as I said, ATM the client implementation cost is quite high for the benefits it brings. It needs to handle the SOCK_FD, but apart from the RTT savings, I can't see much more.

@abenhamdine
Contributor Author

abenhamdine commented Feb 28, 2022

I'm not sure how the implementation in node would look like but this should not be possible using libpq. I'm investigating this feature, and (I might be wrong) but what I found is this is just allowing batch dispatch transparently from client side. The pipeline essentially makes the network file descriptor non-blocking and allow the client submit multiple requests in a non-blocking way. All the requests are going through the same SOCK_FD and the client needs now to monitor the associated SOCK_FD (select, poll, epoll, io_uring, kqueue, etc...).

Yes, pipelining essentially consists of reusing the same socket without waiting for the previous request's response.

Not sure if the complexity it adds to the client side pays off. There are cases it can improve performance reducing/saving network trip time because the lib allows the client to push the requests to the lib and then call sync (sort of flush) and it will send all to the server at the same time, having only 1 RTT instead of multiples. But the SOCK_FD used will be the same.

Saving network trips can provide significant perf gains, see the comments and benchmark in #2706 (nodejs implementation, but the saving of network trips would be the same I assume).

@elchinoo

elchinoo commented Feb 28, 2022

Saving network trips can provide significant perf gains, see the comments and benchmark in #2706 (nodejs implementation, but the saving of network trips would be the same I assume)

I have no doubt that on a high-latency network the gains could be significant, but I was skeptical about how much gain we could get otherwise. I'm still concerned about the complexity it may add to some client implementations, but after reading the points (and benchmark) in the cases you referred to, I'm starting to rethink it. Thanks for sharing!

@tapz

tapz commented Oct 16, 2023

Is the postgresql server executing the queries in parallel or one at a time when using a pipeline? When not using a connection pool, but a single connection, pipelining would be really useful. The best way would be to use it like in redis, but the response should be similar to Promise.allSettled.

const [{status, value}, {status, value}] = await client.pipeline().query(sql1).query(sql2).exec();
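A sketch of that chainable shape (the `pipeline()` builder is hypothetical, not part of pg's API): collect queries, run them all against one client, and settle them like Promise.allSettled does.

```javascript
// Hypothetical chainable pipeline builder: query() accumulates statements
// and returns the builder; exec() runs them all and settles each result
// into the { status, value } / { status, reason } shape of allSettled.
function pipelineBuilder(client) {
  const queries = [];
  return {
    query(sql) {
      queries.push(sql);
      return this; // enable chaining: .query(a).query(b).exec()
    },
    exec() {
      return Promise.allSettled(queries.map((q) => client.query(q)));
    },
  };
}
```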

@boromisp
Contributor

With pipelining you can reduce the idle time between queries on a single connection, by buffering the next query ahead of time.

The server will not execute multiple queries in parallel on the same connection, but whenever it finishes with one query, it won't have to wait a full network roundtrip for the next. This becomes significant if the query execution times are comparable to the network latency.
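A back-of-the-envelope model of that point (a simplification that assumes the next query is always buffered in time and ignores bandwidth): without pipelining each query pays a full round trip; with it, only the first one does.

```javascript
// Simplified wall-clock model for n queries of execMs each over a link
// with rttMs round-trip latency. Without pipelining, every query waits a
// full round trip; pipelined, the server runs back-to-back after one RTT.
function totalTimeMs(n, execMs, rttMs, pipelined) {
  return pipelined ? rttMs + n * execMs : n * (rttMs + execMs);
}

// 100 queries, 2 ms execution each, 10 ms round trip:
// totalTimeMs(100, 2, 10, false) → 1200
// totalTimeMs(100, 2, 10, true)  → 210
```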

@tapz

tapz commented Oct 16, 2023

It would be nice to run multiple queries in parallel over just one connection. Forking in the backend would probably be faster than the client opening multiple connections.

@abenhamdine
Contributor Author

Would be nice to run multiple queries in parallel with just one connection. Forking in the backend probably would be faster than client opening multiple connections.

Pipelining has nothing to do with query execution concurrency/parallelism. With or without pipeline mode, the PostgreSQL server executes the queries sequentially (while using its parallelism capabilities if enabled); pipelining just allows both sides of the connection to work concurrently when possible, and minimizes round-trip time.

@tapz

tapz commented Oct 16, 2023

@abenhamdine I said it would be nice. And it is only possible with pipelining.

@cesco69
Contributor

cesco69 commented Jan 23, 2024

Any news?

@abenhamdine
Contributor Author

No, unfortunately: I don't have time to finish the PR (and I didn't even look at the native driver)

@cesco69 cesco69 mentioned this issue Apr 10, 2024
@cesco69
Contributor

cesco69 commented Jun 10, 2024

For everyone who wants to test @abenhamdine's PR, here is a simple snippet that overrides pg to implement the #2706 changes.

Side note: I noticed a considerable improvement in performance.

import * as pg from 'pg';

// Monkey-patch adapted from #2706: submit queued queries to the connection
// without waiting for the previous response. Note that prototype-level
// defaults are shared across instances, so per-instance state is created
// lazily below before first use.
pg.Connection.prototype.submittedNamedStatements = {};

pg.Client.prototype.pipelining = true;
pg.Client.prototype.handshakeDone = false;

const original_handleReadyForQuery = pg.Client.prototype._handleReadyForQuery;
pg.Client.prototype._handleReadyForQuery = function () {
    // Don't start pipelining until the startup handshake has finished.
    this.handshakeDone = true;
    original_handleReadyForQuery.call(this);
};

pg.Client.prototype._pulseQueryQueue = function () {
    // Per-instance queue of queries already sent but not yet answered
    // (a prototype-level array would be shared by every client).
    if (!Object.prototype.hasOwnProperty.call(this, 'sentQueryQueue')) {
        this.sentQueryQueue = [];
    }

    if (this.pipelining) {
        if (!this.handshakeDone) {
            return
        }

        // Submit queued queries immediately instead of waiting for each
        // result; a "blocking" query pauses submission until the
        // connection has drained.
        while (!this.blocked || (this.activeQuery === null && this.sentQueryQueue.length === 0)) {
            const query = this.queryQueue.shift()
            if (!query) break
            const queryError = query.submit(this.connection)
            if (queryError) {
                process.nextTick(() => {
                    // Report the error on the failed query itself
                    // (this.activeQuery may still be null at this point).
                    query.handleError(queryError, this.connection)
                    this.readyForQuery = true
                    this._pulseQueryQueue()
                })
            }
            this.blocked = query.blocking
            this.sentQueryQueue.push(query)
            if (query.name) {
                this.connection.submittedNamedStatements[query.name] = query.text
            }
        }
    }

    if (this.readyForQuery === true) {
        // In pipelining mode the next active query was already sent above;
        // otherwise submit it now, one at a time.
        this.activeQuery = this.pipelining ? this.sentQueryQueue.shift() : this.queryQueue.shift()

        if (this.activeQuery) {
            this.readyForQuery = false
            this.hasExecuted = true

            if (!this.pipelining) {
                const queryError = this.activeQuery.submit(this.connection)
                if (queryError) {
                    process.nextTick(() => {
                        this.activeQuery.handleError(queryError, this.connection)
                        this.readyForQuery = true
                        this._pulseQueryQueue()
                    })
                }
            }
        } else if (this.hasExecuted) {
            this.activeQuery = null
            this.emit('drain')
        }
    }
};
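With the override applied, the intended usage pattern is to issue queries without awaiting each one, so the driver can queue them back-to-back on the single connection. The same pattern, shown against a stub client since a real pg.Client needs a live server:

```javascript
// Usage pattern a pipelining client benefits from: fire all queries first
// (no await between submits), then await the batch together.
async function runBatch(client, texts) {
  const pending = texts.map((t) => client.query(t)); // all submitted up front
  return Promise.all(pending);
}
```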

@mcollina

@brianc I’d like to resume this work. Would you accept a PR?

@cesco69
Contributor

cesco69 commented Jan 13, 2025

@mcollina

@brianc doesn't seem very active; perhaps it's better to draw the attention of @charmander (see #2646 (comment))

@brianc
Owner

brianc commented Jan 13, 2025

I'm here! Yeah I'm very much open to PRs all the time. Sometimes a bit delayed because life is...life...but I'm still here. :)
