Update records while streaming a big table #2768
Comments
By design, PostgreSQL does not handle multiple queries simultaneously on the same connection, and transactions cannot be shared across multiple connections. I'm not entirely sure what you need to achieve here. You'd generally look at connection pools if you need query parallelism in your application, but as I understand it, that is not an option in your case. It reads like you are trying to distribute part of your application's work between your Node application (data manipulation and validation?) and the database (data integrity management?). This requires you to transmit large amounts of data back (and forth?) for a single "function" of your overall application.

If I might suggest: have a look at stored procedures, or at a flow like

```sql
BEGIN;
...
UPDATE ... RETURNING *;
-- stuff in node
UPDATE ... RETURNING *;
-- stuff in node
ROLLBACK/COMMIT;
```

You would be looking at something like `client.query('SELECT some_stored_procedure($1)', [someArrayOrJsonObject])`, which can also return a table if needed, or `client.query('INSERT INTO some_view_with_insteadof_triggers VALUES ... RETURNING *', ...)`, which defers the operation to a trigger procedure that updates/checks other tables. Having written that: you will need to look into PL/pgSQL and how triggers, trigger procedures, stored procedures (functions), and views with INSTEAD OF triggers work.

As a general rule of thumb: if your transaction overhead (the data that needs to be transmitted back and forth for a single transaction) seems too large, the decision of where to do what within your application's "function" needs some more thought.
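For illustration, here is a minimal sketch of what the stored-procedure route could look like from node-postgres. The function name `process_batch` and its jsonb parameter are hypothetical, not part of any existing schema:

```js
const { Client } = require('pg');

async function runBatch(rows) {
  const client = new Client(); // connection settings taken from PG* environment variables
  await client.connect();
  try {
    await client.query('BEGIN');
    // Hypothetical PL/pgSQL function that does the per-row work inside the database
    const res = await client.query(
      'SELECT * FROM process_batch($1::jsonb)',
      [JSON.stringify(rows)]
    );
    await client.query('COMMIT');
    return res.rows;
  } catch (err) {
    await client.query('ROLLBACK');
    throw err;
  } finally {
    await client.end();
  }
}
```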
@richardabendroth Thanks for your reply. That confirms my suspicions. The work I had to do required some heavy business-logic calculations in Node, so a stored procedure could have worked, but it would have meant duplicating my business logic and potentially introducing implementation differences between the two. In my specific case, I've loosened the requirement of doing the two passes in one transaction. With that relaxed, using two clients, one for streaming the rows and one for updating them in a transaction, worked out for me.
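For reference, a minimal sketch of that two-client setup, assuming pg-query-stream on the read side; the table, columns, and the `computeInNode` helper are placeholders:

```js
const { Client } = require('pg');
const QueryStream = require('pg-query-stream');

// Placeholder for the heavy business logic that has to stay in Node
function computeInNode(row) {
  return { ...row.payload, processed: true };
}

async function twoPassUpdate() {
  // One connection streams the rows, a second one runs the updates in a transaction
  const reader = new Client();
  const writer = new Client();
  await Promise.all([reader.connect(), writer.connect()]);

  try {
    await writer.query('BEGIN');

    const stream = reader.query(new QueryStream('SELECT id, payload FROM big_table'));
    for await (const row of stream) {
      // Async iteration applies backpressure: rows are only pulled as fast
      // as the updates on the second connection complete
      const result = computeInNode(row);
      await writer.query(
        'UPDATE big_table SET payload = $1 WHERE id = $2',
        [result, row.id]
      );
    }

    await writer.query('COMMIT');
  } catch (err) {
    await writer.query('ROLLBACK');
    throw err;
  } finally {
    await Promise.all([reader.end(), writer.end()]);
  }
}
```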
I'm streaming through a big table using a query stream. In the `stream.on('data', ...)` handler, I execute a query to update that same row, the idea being to stream through the big table efficiently without using too much RAM. However, when I add some logging, I clearly see that the complete stream is read before any of the updates are executed. And when I explicitly pause the stream to allow the updates to happen, it deadlocks.
Reading this issue (#660), it looks like a query is only executed after its predecessor is done. If that applies to my case, it explains what's going on. Is this assumption right?

Now, I've tried to work around this by using a second connection to execute the updates, and that works fine. But in my particular scenario, I have to do two sequential passes over the table, and the second pass should see the updates that happened during the first pass. All of it should happen in one transaction: any failure in the first or second pass should roll back the complete operation.
Any suggestions on how to best approach this?