advice: best bulk upsert method that still allows to track # of affected rows? #755
Comments
This would be the best alternative approach.
@fantix does issuing two statements back to back at the same psql prompt somewhat emulate what happens on the wire with the new executemany()?
If so, do you think it would be possible to expose these status messages?
A simpler method than a temp table is an array parameter (note that multi-argument unnest is only valid in the FROM clause, so the source rows come from unnest there):

INSERT INTO "table_name" AS __destination_row (
    id,
    other_column
)
SELECT * FROM unnest($1::int[], $2::text[]) AS source_row(id, other_column)
ON CONFLICT (id)
DO UPDATE SET
    id = excluded.id,
    other_column = excluded.other_column
WHERE
    (__destination_row.id IS DISTINCT FROM excluded.id)
    OR
    (__destination_row.other_column IS DISTINCT FROM excluded.other_column)
;

IDK how it compares to a temp table, but I use this approach frequently and find the performance to be quite good.
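As a sketch, the array-parameter statement above could be driven from asyncpg roughly as follows (the table and column names are illustrative, and conn is assumed to be an asyncpg.Connection). Rows are transposed into one array per column so a single execute() ships the whole batch, and the returned command tag ("INSERT 0 N") carries the affected-row count that executemany() does not expose:

```python
# Sketch: driving the array-parameter upsert from asyncpg.
# Names ("table_name", other_column) are illustrative; conn is assumed
# to be an asyncpg.Connection.

UPSERT_SQL = """
INSERT INTO "table_name" AS __destination_row (id, other_column)
SELECT * FROM unnest($1::int[], $2::text[]) AS source_row(id, other_column)
ON CONFLICT (id) DO UPDATE SET
    other_column = excluded.other_column
WHERE __destination_row.other_column IS DISTINCT FROM excluded.other_column
"""

def to_columns(rows):
    """Transpose [(id, other), ...] into ([id, ...], [other, ...])."""
    if not rows:
        return [], []
    ids, others = zip(*rows)
    return list(ids), list(others)

async def bulk_upsert(conn, rows):
    ids, others = to_columns(rows)
    # conn.execute() returns the command tag, e.g. "INSERT 0 3", whose
    # trailing number is the count of rows actually inserted or updated.
    return await conn.execute(UPSERT_SQL, ids, others)
```

One round trip carries the whole batch, so this keeps bulk performance while still surfacing a single aggregate count.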
I think psql parses the SQL commands (at least, tokenizes them) and sends them separately. Regardless, the simple query protocol (which asyncpg uses to send multiple statements without parsing them) does expose multiple result sets: https://www.postgresql.org/docs/13/protocol-flow.html#id-1.10.5.7.4
Why not use another temp table to store the number of affected rows?
@elprans: what exactly do you mean by "store the number of affected rows"? How can that be achieved?
Something like:

CREATE TEMPORARY TABLE merge_ops (rowcount bigint)

Then:

await conn.executemany(
    '''
    WITH
      ins AS (<your-insert-statement> RETURNING *)
    INSERT INTO merge_ops(rowcount) (SELECT count(*) FROM ins)
    ''',
    args,
)

and finally:

number_inserted = await conn.fetchval('SELECT sum(rowcount) FROM merge_ops')
Just wanted to link the other related thread.
I've been relying on the newest implementation of executemany() to perform bulk upserts, but it has the shortcoming that it does not allow one to easily determine the number of affected rows by parsing the statusmsg.
The number of effectively upserted rows can easily be less than the number of rows I attempt to upsert, since I qualify my ON CONFLICT clause with a further WHERE clause specifying that the update should only happen if the new and excluded tuples are distinct.
(Regular Postgres would allow for a much terser syntax, but this is the only syntax accepted by CockroachDB.)
Suppose that, at times, knowing the exact number of effectively upserted rows is more crucial than bulk performance, and yet I would prefer not to go to the extreme of upserting one row at a time. What would be the best compromise?
Should I rely on a temporary table and then upsert into the physical tables from it?
Should I instead use a transaction with several individual upserts of values, once again provided by the client?
Are there other approaches I should explore?
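For comparison, the per-row alternative mentioned above can be sketched like this (conn is assumed to be an asyncpg.Connection, and the SQL is whatever single-row upsert you use): each execute() returns its own command tag, "INSERT 0 1" when the row changed and "INSERT 0 0" when the WHERE clause suppressed the update, so the count is exact, at the cost of one round trip per row.

```python
# Sketch: per-row upserts inside one transaction, counting affected rows
# from each command tag. conn is assumed to be an asyncpg.Connection and
# sql a single-row upsert statement with positional parameters.

def tag_count(tag: str) -> int:
    """Extract the affected-row count from a command tag like 'INSERT 0 1'."""
    return int(tag.rsplit(" ", 1)[-1])

async def upsert_counting(conn, sql: str, rows) -> int:
    affected = 0
    async with conn.transaction():
        for row in rows:
            status = await conn.execute(sql, *row)  # e.g. "INSERT 0 1"
            affected += tag_count(status)
    return affected
```

This is the slow extreme the question wants to avoid, but it is the only variant here that yields a per-row answer rather than an aggregate.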