Problems with bytea performance #1286
Comments
Please do, many people would want to know 😉 Your own test is the best direction so far; add logs within the connection and around the UPDATE.
This caught my interest, so I did some quick local testing, and I can confirm the following results... Reading a single large `bytea` column through this library is dramatically slower than through the Native Bindings. That's a 20x difference - huge!!!!
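For anyone who wants to repeat that comparison: the native bindings expose the same interface, so both paths can be timed with the same code. A minimal sketch, assuming `pg-native` is installed and connection settings come from the standard `PG*` environment variables (the table and column below are placeholders, not anything from this issue):

```js
// Sketch: time the same large-bytea query through the JavaScript driver and
// the native bindings. The table/column names below are placeholders.
const pg = require('pg');

async function timeQuery(Client, label) {
  const client = new Client();          // uses PG* environment variables
  await client.connect();
  const start = Date.now();
  await client.query('SELECT big_blob FROM blobs LIMIT 1'); // hypothetical query
  console.log(label + ':', Date.now() - start, 'ms');
  await client.end();
}

timeQuery(pg.Client, 'javascript driver')
  .then(() => timeQuery(pg.native.Client, 'native bindings'))
  .catch(console.error);
```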
Thanks, I'll start there and see what I can find.
Just saw your edit. Glad you can reproduce. I'll keep digging but let me know if you find something on your end.
I did some profiling and a huge amount of time (~10 seconds of the 13 for my test case) is spent handling the incoming data on the connection. More specifically, it looks like the per-chunk message handling is where almost all of it goes. Thoughts?
I'm not sure yet. All I can see is that all the time is being spent inside the on('data') handler here: https://github.com/brianc/node-postgres/blob/master/lib/connection.js#L128 Which line in there results in the longest delay is harder to tell.
Yeah, most of the time is being spent on this line: https://github.com/brianc/node-postgres/blob/master/lib/connection.js#L129 From my total of 4s it eats about 3.2s, that's definitely bad!!!
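To make the cost concrete, here is a stripped-down illustration of the pattern under discussion (not node-postgres' actual code): rebuilding the accumulated buffer with `Buffer.concat` on every incoming chunk re-copies everything received so far, so the total copying grows quadratically with the number of chunks.

```js
// Illustration only, not the library's code: naive accumulation via Buffer.concat.
let buffered = Buffer.alloc(0);

function addChunkNaive(chunk) {
  // copies every byte received so far, plus the new chunk
  buffered = Buffer.concat([buffered, chunk]);
}

// A ~27MB value arriving in 64KB chunks means ~430 concats, each re-copying
// the entire accumulated buffer.
for (let i = 0; i < 430; i++) {
  addChunkNaive(Buffer.alloc(64 * 1024));
}
console.log('total bytes buffered:', buffered.length);
```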
Yep, that's exactly what I'm seeing. I think I know what the problem is and I'm trying to rewrite the packet reader's `addChunk` to avoid it.
@brianc It seems that this line kills the performance when dealing with large `bytea` values. It slows down the library 20 times (from my tests), compared to the Native Bindings. And it may be worthwhile revisiting the entire on('data') processing. Unfortunately, I cannot be more specific at present; those things require a bit of a closer look.
@vitaly-t, I have a fix for the packet reader. I've gone from ~13 seconds to ~450ms, matching the native time!
@mramato Wow, magical! I want to see that! Will you do a PR? ;) I was, in the meantime, trying to ask a question here: http://stackoverflow.com/questions/43900530/slow-buffer-concat @mramato If you are doing a PR, keep an eye on that question on StackOverflow; it might offer an even better idea, as there is already one answer there ;)
`Reader.prototype.addChunk` was calling `Buffer.concat` constantly, which increased garbage collection and just all-around killed performance. The exact implications of this are documented in brianc/node-postgres#1286, which has a test case showing how performance is affected. Rather than constantly concatenating buffers to the exact new size, this change uses a growth strategy that doubles the size of the buffer each time it runs out of room and tracks the functional length in a separate `chunkLength` variable. This significantly reduces the amount of allocation and provides a 25x performance improvement in my test cases; the more data the query returns, the greater the improvement. Since this uses a doubling buffer, it was important to avoid growing forever, so I also added a reclamation strategy which reduces the size of the buffer whenever more than half of the data has been read.
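A simplified sketch of that strategy (not the actual node-packet-reader code, just the shape of the change described above): one backing buffer that doubles when it runs out of room, a `chunkLength` counter for how much of it holds real data, and a compaction step once more than half of the buffered data has been consumed.

```js
// Simplified sketch of the doubling/compaction strategy described above.
// Not the real node-packet-reader implementation.
function Reader() {
  this.buffer = Buffer.alloc(0);
  this.chunkLength = 0; // how many bytes of `buffer` hold real data
  this.offset = 0;      // how many bytes the parser has already consumed
}

Reader.prototype.addChunk = function (chunk) {
  const needed = this.chunkLength + chunk.length;
  if (needed > this.buffer.length) {
    // grow by doubling instead of reallocating to the exact size on every chunk
    let newSize = Math.max(this.buffer.length * 2, 1024);
    while (newSize < needed) {
      newSize *= 2;
    }
    const bigger = Buffer.alloc(newSize);
    this.buffer.copy(bigger, 0, 0, this.chunkLength);
    this.buffer = bigger;
  }
  chunk.copy(this.buffer, this.chunkLength);
  this.chunkLength = needed;
};

Reader.prototype.compact = function () {
  // reclaim space once more than half of the buffered data has been read
  if (this.offset > this.chunkLength / 2) {
    this.buffer.copy(this.buffer, 0, this.offset, this.chunkLength);
    this.chunkLength -= this.offset;
    this.offset = 0;
  }
};
```

With this shape, buffering N bytes costs O(N) amortized copying instead of the O(N²) behaviour of repeated `Buffer.concat` calls.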
@vitaly-t see brianc/node-packet-reader#3, should be easy enough for you to patch your local copy and verify my claims. Not sure of the full extent of this improvement, did I just make every node-based Postgres app in the world significantly faster? 😄
@mramato Well done! I can confirm that the change in brianc/node-packet-reader#3 does improve the performance really well, close to what we get with the Native Bindings. I cannot, however, confirm that the change guarantees data integrity. New tests toward that are a must-have for such a change.
Thanks. Can you be more specific regarding exactly what you would like to see that isn't covered by the existing tests? Are you talking about adding tests in node-packet-reader itself?
I'm saying that we need to make sure that the rewritten reader still returns exactly the same data as before.
OK, thanks. I'll do a pass myself and see if there is anything obvious I can add. Thanks for the help on this by the way.
@vitaly-t I used istanbul to verify coverage, and the new buffer compaction was not tested at all (but the rest of the changes were adequately covered). I added a new test to verify the behavior is as expected and the file is back to 100% coverage. Hopefully that PR is now good to go, but please let me know if there's anything else you need me to do.
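For illustration, a data-integrity check of the kind being discussed could look like this. It is written against the simplified Reader sketched above, not against node-packet-reader's real test suite:

```js
// Integrity check sketch: whatever growth strategy addChunk uses, the buffered
// bytes must equal a plain concatenation of the input chunks.
const assert = require('assert');
const crypto = require('crypto');

const reader = new Reader(); // the simplified Reader sketched above
const chunks = [];
for (let i = 0; i < 200; i++) {
  const chunk = crypto.randomBytes(1 + Math.floor(Math.random() * 4096));
  chunks.push(chunk);
  reader.addChunk(chunk);
}

const buffered = reader.buffer.slice(0, reader.chunkLength);
assert.ok(buffered.equals(Buffer.concat(chunks)), 'buffered data must match input');
console.log('buffered data matches the input exactly');
```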
@mramato Good by me! But I'm not the one to approve the PR 😉
Thanks for your help y'all! Much appreciated!
Isn't a similar bug potentially affecting the parsing of the individual fields as well?
@pdkovacs The packet-reader package that was patched reads entire messages; parsing individual fields wasn’t the problem. The fix applies to all types.
Ah, I see, thanks!
I originally opened this as part of a knex issue (knex/knex#2052), but was directed here instead.

I have a simple query that selects the `glb` column from the `models` table. `glb` is a `bytea` column and the value I'm retrieving is `28630077` bytes (~27MB); `models` contains a single row in this example. The query takes `13305 ms` to run, and the Node process (not the DB process) maxes out the CPU while the query is running. If I query for the `assets_id` column instead of the `glb` column, it only takes `2 ms`.
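The report boils down to a reproduction along these lines; a minimal sketch, where the `models` table and `glb` column names come from the description above and the connection details and everything else are assumed:

```js
// Reproduction sketch for the report above: select one large bytea value and
// time it. Table/column names come from the issue text; the rest is assumed.
const { Client } = require('pg');

async function main() {
  const client = new Client(); // connection settings from PG* environment variables
  await client.connect();

  console.time('select glb');
  const { rows } = await client.query('SELECT glb FROM models');
  console.timeEnd('select glb');

  console.log('bytes returned:', rows[0].glb.length); // bytea arrives as a Node Buffer

  await client.end();
}

main().catch(console.error);
```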
Running the same query with the same data from the psql command line completes almost immediately.

I also tested the same query with `pg-native` and it completed in `~450ms`, but using `pg-native` isn't an option for me at this time (though I might have to re-evaluate that depending on where this issue goes).

Finally, I thought maybe it was a performance issue in the type parser, but all of the time is taken up by the query itself, and the typeParser then completes almost instantly at the end.

Am I doing something wrong? Or is there a performance issue with `bytea` values? I'd be happy to debug this further myself if someone can point me in the correct direction. Thanks in advance.