-
Notifications
You must be signed in to change notification settings - Fork 910
S3 getObject combined with AsyncResponseTransformer.toBytes() copies too much data #3193
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Labels
Comments
chibenwa
added a commit
to chibenwa/james-project
that referenced
this issue
May 16, 2022
CF aws/aws-sdk-java-v2#3193 `AsyncResponseTransformer.toBytes()` relies internally on an unsized ByteArrayOutputStream (that thus will expend many times) and the result will be copied when calling `toByteArray()` and another defensive copy is carried other when transforming the future. We thus implement a response transformer variation: - Remove needless defensive copies inside `ByteArrayAsyncResponseTransformer`. The byte array is passed to the caller, who then becomes responsible of it, and nobody else references the old byte array once the publisher completes. This can be an instant win coming at a very low price. - Rely on `GetResponse::contentLength` to size a byte array and copy incoming buffers to it in place. This requires knowledge about response type... Thus this might be hardly doable in a generic fashion. On a typical IMAP benchmark S3 getObject toBytes transformation takes 1.66% of overall memory allocation.
chibenwa
added a commit
to chibenwa/james-project
that referenced
this issue
May 16, 2022
CF aws/aws-sdk-java-v2#3193 `AsyncResponseTransformer.toBytes()` relies internally on an unsized ByteArrayOutputStream (that thus will expend many times) and the result will be copied when calling `toByteArray()` and another defensive copy is carried other when transforming the future. We thus implement a response transformer variation: - Remove needless defensive copies inside `ByteArrayAsyncResponseTransformer`. The byte array is passed to the caller, who then becomes responsible of it, and nobody else references the old byte array once the publisher completes. This can be an instant win coming at a very low price. - Rely on `GetResponse::contentLength` to size a byte array and copy incoming buffers to it in place. This requires knowledge about response type... Thus this might be hardly doable in a generic fashion. On a typical IMAP benchmark S3 getObject toBytes transformation takes 1.66% of overall memory allocation.
@chibenwa thank you for reaching out, added to our backlog. Since it's an optimization I'm changing to feature request. |
chibenwa
added a commit
to chibenwa/james-project
that referenced
this issue
May 18, 2022
CF aws/aws-sdk-java-v2#3193 `AsyncResponseTransformer.toBytes()` relies internally on an unsized ByteArrayOutputStream (that thus will expend many times) and the result will be copied when calling `toByteArray()` and another defensive copy is carried other when transforming the future. We thus implement a response transformer variation: - Remove needless defensive copies inside `ByteArrayAsyncResponseTransformer`. The byte array is passed to the caller, who then becomes responsible of it, and nobody else references the old byte array once the publisher completes. This can be an instant win coming at a very low price. - Rely on `GetResponse::contentLength` to size a byte array and copy incoming buffers to it in place. This requires knowledge about response type... Thus this might be hardly doable in a generic fashion. On a typical IMAP benchmark S3 getObject toBytes transformation takes 1.66% of overall memory allocation.
chibenwa
added a commit
to chibenwa/james-project
that referenced
this issue
May 18, 2022
CF aws/aws-sdk-java-v2#3193 `AsyncResponseTransformer.toBytes()` relies internally on an unsized ByteArrayOutputStream (that thus will expend many times) and the result will be copied when calling `toByteArray()` and another defensive copy is carried other when transforming the future. We thus implement a response transformer variation: - Remove needless defensive copies inside `ByteArrayAsyncResponseTransformer`. The byte array is passed to the caller, who then becomes responsible of it, and nobody else references the old byte array once the publisher completes. This can be an instant win coming at a very low price. - Rely on `GetResponse::contentLength` to size a byte array and copy incoming buffers to it in place. This requires knowledge about response type... Thus this might be hardly doable in a generic fashion. On a typical IMAP benchmark S3 getObject toBytes transformation takes 1.66% of overall memory allocation.
chibenwa
added a commit
to apache/james-project
that referenced
this issue
May 19, 2022
CF aws/aws-sdk-java-v2#3193 `AsyncResponseTransformer.toBytes()` relies internally on an unsized ByteArrayOutputStream (that thus will expend many times) and the result will be copied when calling `toByteArray()` and another defensive copy is carried other when transforming the future. We thus implement a response transformer variation: - Remove needless defensive copies inside `ByteArrayAsyncResponseTransformer`. The byte array is passed to the caller, who then becomes responsible of it, and nobody else references the old byte array once the publisher completes. This can be an instant win coming at a very low price. - Rely on `GetResponse::contentLength` to size a byte array and copy incoming buffers to it in place. This requires knowledge about response type... Thus this might be hardly doable in a generic fashion. On a typical IMAP benchmark S3 getObject toBytes transformation takes 1.66% of overall memory allocation.
The same problem is present for putObject |
This was referenced Apr 20, 2023
12 tasks
1 task
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
Uh oh!
There was an error while loading. Please reload this page.
Describe the bug
Given the following piece of code:
Several data copies do take place:
ByteBuf::nioBuffers
could be relied on to minimize copies. I did not expore this option myself.AsyncResponseTransformer.toBytes()
relies internally on an unsized ByteArrayOutputStream (that thus will expend many times) and the result will be copied when callingtoByteArray()
and another defensive copy is carried other when transforming the future.Expected Behavior
Minimize memory copies in such a scenario to minimize heap pressure.
Current Behavior
Reproduction Steps
I used https://github.com/jvm-profiling-tools/async-profiler for the flame graphs.
Possible Solution
ByteArrayAsyncResponseTransformer
. The byte array is passed to the caller, who then becomes responsible of it, and nobody else references the old byte array once the publisher completes. This can be an instant win coming at a very low price.GetResponse::contentLength
to size a byte array and copy incoming buffers to it in place. This requires knowledge about response type... Thus this might be hardly doable in a generic fashion.I tested this successfully:
Additional Information/Context
Apache James relies on the S3 driver for email content, thus speed and heap impact is important for us as S3 getObject will be called on each email content read.
AWS Java SDK version used
2.17.189
JDK version used
openjdk version "11.0.15" 2022-04-19
Operating System and version
Ubuntu 20.04.4 LTS
The text was updated successfully, but these errors were encountered: