Avoid copying byte array for ResponseBytes #9

Open: wants to merge 1 commit into `main`
rtyley commented on Sep 2, 2023


Once the `ByteArrayAsyncResponseTransformer` has gathered all the response
bytes, we still need to wrap those bytes and the response object in a
`ResponseBytes` instance. If we use `ResponseBytes.fromByteArray()`, however,
a whole new byte array is allocated and the bytes are copied into it, which is
bad for two reasons:

* While copying, the JVM heap must briefly hold both the old and new byte
  arrays, which roughly doubles the memory requirement.
* Copying the bytes from one array to another takes some CPU time (this
  obviously varies: `System.arraycopy()` for 40 MB of bytes takes ~2 ms on my
  M1 machine).
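To get a feel for the copying cost on other hardware, a rough single-shot timing like the one below can be used. It is a sketch, not a rigorous benchmark: there is no JIT warm-up and no repeated measurement, the timed section includes allocating (and zeroing) the destination array, and the class and method names are illustrative only. The numbers it prints will vary from machine to machine.

```java
import java.util.Arrays;

// Rough, single-shot timing of a 40 MiB copy; NOT a rigorous benchmark
// (no JIT warm-up, no repeated runs), so results will vary by machine.
class ArrayCopyCost {
    static final int SIZE = 40 * 1024 * 1024; // 40 MiB

    // Allocates a fresh array and copies src into it, as a copying wrapper would.
    static byte[] copy(byte[] src) {
        byte[] dest = new byte[src.length]; // second allocation: src and dest are both live here
        System.arraycopy(src, 0, dest, 0, src.length);
        return dest;
    }

    public static void main(String[] args) {
        byte[] original = new byte[SIZE];
        Arrays.fill(original, (byte) 7);

        long start = System.nanoTime();
        byte[] copied = copy(original); // timing includes allocation and the copy itself
        long elapsedMicros = (System.nanoTime() - start) / 1_000;

        // While both references are reachable, the heap holds 2 * SIZE bytes of array data.
        System.out.printf("Copied %d bytes in %d us (live array data: %d bytes)%n",
                copied.length, elapsedMicros, (long) original.length + copied.length);
    }
}
```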

A faster, more memory-efficient alternative to `ResponseBytes.fromByteArray()`
is `ResponseBytes.fromByteArrayUnsafe()`, added to the AWS SDK in August 2020
with aws/aws-sdk-java-v2#1977 in response to aws/aws-sdk-java-v2#1959. The
'Unsafe' in the name warns users of this method that the underlying byte array
is _not_ copied, so badly behaved code that still holds a reference to the
array could change its contents after the `ResponseBytes` is handed to the
calling code.
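The difference between copying and sharing can be shown with the JDK alone. `BytesHolder` below is a hypothetical stand-in, not the SDK's `ResponseBytes`: it omits the response object that the real factory methods also take, and only mirrors the copy-versus-share behaviour described above.

```java
import java.util.Arrays;

// Hypothetical stand-in for a ResponseBytes-like wrapper (NOT the SDK class).
// It models copying on construction (like fromByteArray()) versus wrapping
// without a copy (like fromByteArrayUnsafe()).
final class BytesHolder {
    private final byte[] bytes;

    private BytesHolder(byte[] bytes) {
        this.bytes = bytes;
    }

    // Defensive copy: later writes to the caller's array are not visible here.
    static BytesHolder fromByteArray(byte[] source) {
        return new BytesHolder(Arrays.copyOf(source, source.length));
    }

    // No copy: the holder shares the caller's array, so anyone still holding
    // a reference to `source` can change what this holder appears to contain.
    static BytesHolder fromByteArrayUnsafe(byte[] source) {
        return new BytesHolder(source);
    }

    byte byteAt(int index) {
        return bytes[index];
    }
}

class UnsafeWrapDemo {
    public static void main(String[] args) {
        byte[] data = {1, 2, 3};
        BytesHolder copied = BytesHolder.fromByteArray(data);
        BytesHolder shared = BytesHolder.fromByteArrayUnsafe(data);

        data[0] = 42; // badly behaved code mutates the original array afterwards

        System.out.println(copied.byteAt(0)); // prints 1: the copy is unaffected
        System.out.println(shared.byteAt(0)); // prints 42: the shared array shows the change
    }
}
```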

It is safe to use `ResponseBytes.fromByteArrayUnsafe()` in trusted SDK code,
and to return the resulting `ResponseBytes` to the user, provided that:

* only the trusted SDK code has access to the original byte array before the
  `ResponseBytes` is handed over to the caller, and
* once the `ResponseBytes` has been handed over, the SDK code makes no further
  use of the original byte array.

Doing so saves both the double allocation of memory and the CPU time spent
copying the bytes.
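A minimal sketch of code that meets both conditions, assuming a hypothetical `ChunkCollector` that receives the body as `ByteBuffer` chunks (roughly the way an async subscriber would). The final array is created inside `finish()`, no other code sees it before it is returned, and the collector keeps no reference to it afterwards, so a caller could pass it straight to `ResponseBytes.fromByteArrayUnsafe(response, bytes)` without a defensive copy. This is an illustration of the conditions only, not the SDK's `ByteArrayAsyncResponseTransformer` implementation.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical collector (NOT SDK code) illustrating a safe no-copy handover.
final class ChunkCollector {
    private final List<byte[]> chunks = new ArrayList<>();
    private int totalLength = 0;
    private boolean finished = false;

    // Called once per received chunk. The bytes are copied out (consuming the
    // buffer) because the producer may reuse the ByteBuffer after this returns.
    void onChunk(ByteBuffer chunk) {
        if (finished) throw new IllegalStateException("already finished");
        byte[] part = new byte[chunk.remaining()];
        chunk.get(part);
        chunks.add(part);
        totalLength += part.length;
    }

    // Assembles the final array and hands it over without any further copy.
    // Safe only because `result` is created here, no other code has seen it,
    // and the collector keeps no reference to it after returning.
    byte[] finish() {
        if (finished) throw new IllegalStateException("already finished");
        finished = true;
        byte[] result = new byte[totalLength];
        int offset = 0;
        for (byte[] part : chunks) {
            System.arraycopy(part, 0, result, offset, part.length);
            offset += part.length;
        }
        chunks.clear(); // drop intermediate chunks; nothing here retains `result`
        return result;
    }
}

class ChunkCollectorDemo {
    public static void main(String[] args) {
        ChunkCollector collector = new ChunkCollector();
        collector.onChunk(ByteBuffer.wrap("Hello, ".getBytes(StandardCharsets.UTF_8)));
        collector.onChunk(ByteBuffer.wrap("world".getBytes(StandardCharsets.UTF_8)));
        byte[] body = collector.finish(); // sole reference now belongs to the caller
        System.out.println(new String(body, StandardCharsets.UTF_8)); // prints "Hello, world"
    }
}
```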

See also:

* aws/aws-sdk-java-v2#1959
* aws/aws-sdk-java-v2#1977

rtyley force-pushed the main branch 3 times, most recently from 3d8e868 to 4e1d66c, on September 4, 2023 at 17:04.