Update records while streaming a big table #2768

Closed
jasperkuperus opened this issue Jun 30, 2022 · 2 comments
@jasperkuperus

I'm streaming through a big table using:

const QueryStream = require('pg-query-stream');

const stream = client.query(
  new QueryStream(`
    SELECT * FROM my_table
  `),
);

In the stream.on('data', ...) handler, I execute a query that updates that same row. My goal is to stream through the big table efficiently without using too much RAM.
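Concretely, the pattern looks roughly like this (simplified; `client` is an already-connected `pg` client, and the `processed`/`id` columns are placeholders for illustration):

```javascript
const QueryStream = require('pg-query-stream');

const stream = client.query(new QueryStream('SELECT * FROM my_table'));

stream.on('data', (row) => {
  // This UPDATE is queued on the same connection that is busy
  // streaming the SELECT, so it only runs after the stream ends.
  client.query('UPDATE my_table SET processed = true WHERE id = $1', [row.id]);
});

stream.on('end', () => {
  console.log('done streaming');
});
```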

However, when I add some logging, I clearly see that the complete stream is read first, before any of the updates are executed. And when I explicitly pause the stream to allow the updates to happen, it deadlocks.

Reading issue #660, it looks like a query is only executed after its predecessor is done. If that applies to my case, it explains what's going on. Is this assumption right?

Now, I've tried to work around this by using a second connection for the updates, and that works fine. But in my particular scenario I have to do two sequential passes over the table, and the second pass should see the updates made during the first pass. All of it should happen in a single transaction: any failure in the first or second pass should roll back the complete operation.

Any suggestions on how to best approach this?

@richardabendroth

By design, PostgreSQL does not handle multiple queries simultaneously on the same connection. Transactions also cannot be shared across multiple connections. I'm not entirely sure what you need to achieve here; you'd generally look at connection pools if you need query parallelism in your application, but I understand that in your case this is not an option.

It reads like you are distributing part of your application logic across your node application (data manipulation and validation?) and the database (data integrity management?). That requires you to transmit large amounts of data back (and forth?) for a single "function" of your overall application. If I may suggest: have a look at stored procedures or INSTEAD OF triggers in PostgreSQL. You might be able to offload most of the functionality into the database and reduce the overhead significantly. Instead of

BEGIN; 
...
UPDATE ... RETURNING *;
-- stuff in node
UPDATE ... RETURNING *;
-- stuff in node
ROLLBACK/COMMIT;

you would be looking at something like

client.query('SELECT some_stored_procedure($1)', [someArrayOrJsonObject]);

that can also return a table if needed or

client.query('INSERT INTO some_view_with_insteadof_triggers VALUES ... RETURNING *', ...);

that would defer the operation to a trigger procedure updating/checking other tables.
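A minimal sketch of the INSTEAD OF variant, assuming a hypothetical `my_table(id, amount)` and a staging view (all names here are illustrative, not from your schema):

```sql
CREATE VIEW my_table_ops AS
  SELECT id, amount FROM my_table;

CREATE FUNCTION my_table_ops_insert() RETURNS trigger AS $$
BEGIN
  -- Defer the real work to the database: update the base table
  -- (and run any related checks) in one server-side step.
  UPDATE my_table
     SET amount = my_table.amount + NEW.amount
   WHERE my_table.id = NEW.id;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER my_table_ops_ins
  INSTEAD OF INSERT ON my_table_ops
  FOR EACH ROW EXECUTE FUNCTION my_table_ops_insert();
```

With that in place, a plain `INSERT` into the view from node fires the trigger, so the per-row round trips collapse into single statements.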

Having written that: you will need to look into PL/pgSQL and how triggers, trigger procedures, stored procedures (functions), and views with INSTEAD OF triggers work.

As a general rule of thumb: if the transaction overhead (the data that needs to be transmitted back and forth for a single transaction) seems too large, the decision of where you do what within your application's "function" needs some rethinking.

@jasperkuperus
Author

@richardabendroth Thanks for your reply, that confirms my suspicions. The work I had to do required some heavy business logic calculations in node. Writing a stored procedure could have worked, but it would have duplicated my business logic and potentially introduced implementation differences between the two. In my specific case, I've loosened the requirement of doing the two passes in one transaction. With that change, using two clients, one for streaming the rows and one for updating the rows in a transaction, worked out for me.
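For anyone landing here later, the shape of what worked for me, roughly (table and column names are placeholders, and this needs a reachable database to actually run):

```javascript
const { Pool } = require('pg');
const QueryStream = require('pg-query-stream');

const pool = new Pool();

async function run() {
  const reader = await pool.connect(); // streams the rows
  const writer = await pool.connect(); // runs the updates in a transaction

  try {
    await writer.query('BEGIN');

    const stream = reader.query(new QueryStream('SELECT * FROM my_table'));
    for await (const row of stream) {
      // Heavy business logic in node, then write via the second connection.
      await writer.query(
        'UPDATE my_table SET processed = true WHERE id = $1',
        [row.id],
      );
    }

    await writer.query('COMMIT');
  } catch (err) {
    await writer.query('ROLLBACK');
    throw err;
  } finally {
    reader.release();
    writer.release();
  }
}
```

Note the caveat from above: the reader's SELECT runs against its own snapshot, so it won't see the writer's uncommitted updates. That's exactly why I dropped the single-transaction requirement for the two passes.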
