
Allow to get array from BytesWrapper without copying #1959


Closed
1 task done
mar-kolya opened this issue Jul 23, 2020 · 1 comment · Fixed by #1977
Labels
feature-request A feature should be added or improved.

Comments


mar-kolya commented Jul 23, 2020

In performance-sensitive applications it is important to reduce the number of data copies made during processing.
Unfortunately, when we currently get data from S3 and want to use the response byte array, we have to either:

  • Do an array copy, or
  • Get a read-only ByteBuffer, which doesn't provide access to the array and so forces us to do an array copy later.
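The second option can be reproduced with plain JDK buffers. This sketch does not touch the SDK. It assumes, as the issue describes, that the buffer handed back behaves like one returned by `ByteBuffer.asReadOnlyBuffer()`. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;

// Shows why a read-only ByteBuffer forces a later copy: a backing array
// exists, but the read-only view refuses to expose it.
class ReadOnlyBufferDemo {

    // True only if the backing array is accessible (heap buffer, not read-only).
    static boolean exposesArray(ByteBuffer buffer) {
        return buffer.hasArray();
    }

    // Tries to reach the backing array; returns null if the buffer forbids it.
    // array() throws ReadOnlyBufferException (a subclass of
    // UnsupportedOperationException) for read-only buffers.
    static byte[] tryGetArray(ByteBuffer buffer) {
        try {
            return buffer.array();
        } catch (UnsupportedOperationException e) {
            return null;
        }
    }

    // The only portable way to get a byte[] out of a read-only buffer: copy it.
    static byte[] copyOut(ByteBuffer buffer) {
        ByteBuffer view = buffer.duplicate(); // leave the caller's position untouched
        byte[] copy = new byte[view.remaining()];
        view.get(copy);
        return copy;
    }

    public static void main(String[] args) {
        byte[] data = {1, 2, 3, 4};
        ByteBuffer writable = ByteBuffer.wrap(data);
        ByteBuffer readOnly = writable.asReadOnlyBuffer();

        System.out.println(exposesArray(writable));        // true
        System.out.println(exposesArray(readOnly));        // false
        System.out.println(tryGetArray(readOnly) == null); // true
        System.out.println(copyOut(readOnly) != data);     // true: a fresh array
    }
}
```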

Describe the Feature

Provide a method in BytesWrapper that lets users get either the underlying array directly or a non-read-only ByteBuffer.
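A rough shape for the two requested accessors is sketched below, using a hypothetical stand-in class rather than the SDK's real `BytesWrapper`. The method names `asByteArrayUnsafe` and `asWritableByteBufferUnsafe` are illustrative assumptions. Check the SDK's own Javadoc for what was actually added.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical stand-in for a BytesWrapper-like class; NOT the SDK's actual code.
class ToyBytesWrapper {
    private final byte[] bytes;

    ToyBytesWrapper(byte[] bytes) {
        this.bytes = bytes;
    }

    // Behaviour described in the issue: a defensive copy on every call.
    byte[] asByteArray() {
        return Arrays.copyOf(bytes, bytes.length);
    }

    // Behaviour described in the issue: a read-only view with no array access.
    ByteBuffer asByteBuffer() {
        return ByteBuffer.wrap(bytes).asReadOnlyBuffer();
    }

    // Requested option 1 (hypothetical name): hand out the underlying array itself.
    // Callers must not modify it, because every other reader would see the change.
    byte[] asByteArrayUnsafe() {
        return bytes;
    }

    // Requested option 2 (hypothetical name): a writable buffer whose array() works.
    ByteBuffer asWritableByteBufferUnsafe() {
        return ByteBuffer.wrap(bytes);
    }
}
```

The trade-off is visible in the types. Once either "unsafe" accessor exists, the wrapper can no longer guarantee that its contents are immutable. The price is paid in trust rather than in copies.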

Is your Feature Request related to a problem?

The current S3 client implementation makes multiple data copies; this is one more of them, and it reduces performance.

Proposed Solution

Describe alternatives you've considered

Unfortunately there are not many alternatives: in some cases one needs access to the arrays themselves, for example when using JNI compression implementations.
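To illustrate the point about array-based native APIs, the sketch below feeds a `ByteBuffer` into an API that takes `(byte[], offset, length)`. It avoids a copy only when the buffer exposes its backing array. `java.util.zip.Deflater` (backed by native zlib in OpenJDK) is an assumed stand-in, because the issue does not name a specific JNI library. Since JDK 11, `Deflater` also has a `setInput(ByteBuffer)` overload, so treat it purely as an example of an array-based signature.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.zip.Deflater;

// Sketch: pass a ByteBuffer to an array-only API, copying only when forced to.
class ArrayOnlyApiBridge {

    // True when the bytes can be handed over in place, with no intermediate copy.
    static boolean canAvoidCopy(ByteBuffer input) {
        return input.hasArray();
    }

    static byte[] compress(ByteBuffer input) {
        ByteBuffer view = input.duplicate(); // don't disturb the caller's position
        int length = view.remaining();
        byte[] array;
        int offset;
        if (canAvoidCopy(view)) {
            // Zero-copy path: point the compressor straight at the backing array.
            array = view.array();
            offset = view.arrayOffset() + view.position();
        } else {
            // Read-only (or direct) buffer: the only way in is an extra copy.
            array = new byte[length];
            view.get(array);
            offset = 0;
        }

        Deflater deflater = new Deflater();
        try {
            deflater.setInput(array, offset, length);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(chunk);
                out.write(chunk, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release the native zlib state
        }
    }
}
```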

Additional Context

  • I may be able to implement this feature request

Your Environment

  • AWS Java SDK version used:
  • JDK version used:
  • Operating System and version:
@mar-kolya mar-kolya added feature-request A feature should be added or improved. needs-triage This issue or PR still needs to be triaged. labels Jul 23, 2020
@debora-ito
Member

This is a reasonable feature request @mar-kolya, thank you for reporting it. If you submit a PR we will take a look.

@debora-ito debora-ito removed the needs-triage This issue or PR still needs to be triaged. label Jul 24, 2020
aws-sdk-java-automation added a commit that referenced this issue Mar 11, 2022
…688b76afb

Pull request: release <- staging/c5a9416c-9296-4379-ae26-248688b76afb
rtyley added a commit to rtyley/aws-sdk-async-response-bytes that referenced this issue Aug 26, 2023
Once the `ByteArrayAsyncResponseTransformer` has gathered all the response
bytes, we still need to wrap those bytes and the response object in a
`ResponseBytes` instance, but if we use `ResponseBytes.fromByteArray()`, a
whole new byte array will be allocated and filled, which is bad for two reasons:

* While copying, the JVM heap must briefly hold both the old and new byte arrays,
  roughly doubling the memory requirements.
* Copying the bytes from one array to another takes a little CPU time
  (this varies, but `System.arraycopy()` for 40MB of bytes takes ~2ms
  on my M1 machine).
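For anyone wanting to reproduce the kind of figure quoted above, here is a single-shot timing sketch. It is not a proper benchmark: there is no warm-up and no JMH, so the numbers will vary between runs and machines. The 40 MiB size only mirrors the figure in the text, and the class name is hypothetical.

```java
import java.util.Arrays;

// Rough, single-shot measurement of copying a large array.
class CopyCost {

    // Allocates a second array and copies into it: both are live at the same time.
    static byte[] copy(byte[] source) {
        byte[] target = new byte[source.length];
        System.arraycopy(source, 0, target, 0, source.length);
        return target;
    }

    public static void main(String[] args) {
        int size = 40 * 1024 * 1024; // 40 MiB, mirroring the size quoted above
        byte[] source = new byte[size];
        Arrays.fill(source, (byte) 7);

        long start = System.nanoTime();
        byte[] target = copy(source);
        long elapsedMicros = (System.nanoTime() - start) / 1_000;

        // While the copy runs, roughly 2 * size bytes of array data are live.
        System.out.printf("copied %d bytes in %d us; peak array data ~%d MiB%n",
                target.length, elapsedMicros, (2L * size) / (1024 * 1024));
    }
}
```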

A faster, more memory-efficient alternative to `ResponseBytes.fromByteArray()`
is `ResponseBytes.fromByteArrayUnsafe()`, added to the AWS SDK in August 2020
with aws/aws-sdk-java-v2#1977 in response to
aws/aws-sdk-java-v2#1959. The 'Unsafe' in the name
warns users of this method that the underlying byte array is _not_
copied, so badly behaved code that still holds a reference to the array could
change its contents after the `ResponseBytes` is handed to the calling
code.
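The aliasing hazard behind the 'Unsafe' name can be shown with a toy class. It is modelled on the copy vs. no-copy factory split but is not the SDK's `ResponseBytes`. The real factories also take the response object, which is omitted here for brevity.

```java
import java.util.Arrays;

// Hypothetical toy contrasting a copying factory with an "unsafe" one; not SDK code.
final class ToyResponseBytes {
    private final byte[] bytes;

    private ToyResponseBytes(byte[] bytes) {
        this.bytes = bytes;
    }

    // Safe factory: takes a private copy, so later writes to the argument are invisible.
    static ToyResponseBytes fromByteArray(byte[] bytes) {
        return new ToyResponseBytes(Arrays.copyOf(bytes, bytes.length));
    }

    // "Unsafe" factory: keeps a reference to the caller's array, no copy made.
    static ToyResponseBytes fromByteArrayUnsafe(byte[] bytes) {
        return new ToyResponseBytes(bytes);
    }

    byte firstByte() {
        return bytes[0];
    }

    public static void main(String[] args) {
        byte[] original = {1, 2, 3};
        ToyResponseBytes copied = fromByteArray(original);
        ToyResponseBytes shared = fromByteArrayUnsafe(original);

        original[0] = 99; // badly behaved code mutates the array after wrapping

        System.out.println(copied.firstByte()); // 1
        System.out.println(shared.firstByte()); // 99
    }
}
```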

It is safe to use `ResponseBytes.fromByteArrayUnsafe()` in trusted SDK code,
and to return the resulting `ResponseBytes` to the user, provided that:

* only the trusted SDK code has access to the original byte array before the
  `ResponseBytes` is handed over to the caller, and
* the SDK code makes no further use of the original byte array once the
  `ResponseBytes` instance has been handed over.

This saves both the double allocation of memory and the CPU time spent copying
bytes.
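The two conditions above can be expressed as an ownership handoff. The hypothetical collector below is the sole holder of its array while filling it, and it drops its reference once it hands the array over. It assumes the total length is known up front (for example from a Content-Length header). It preallocates because `ByteArrayOutputStream.toByteArray()` itself returns a copy. In the scenario described here, `finish()`'s result would then be passed, together with the response object, to `ResponseBytes.fromByteArrayUnsafe()`.

```java
import java.nio.ByteBuffer;

// Hypothetical collector illustrating the "safe to skip the copy" conditions.
final class ToyByteCollector {
    private byte[] buffer; // only this object references the array until finish()
    private int written;

    ToyByteCollector(int expectedLength) {
        // Assumption: total length is known in advance, so no resizing copies are needed.
        this.buffer = new byte[expectedLength];
    }

    // Appends one chunk; throws if more bytes arrive than expected.
    void onChunk(ByteBuffer chunk) {
        if (buffer == null) {
            throw new IllegalStateException("already handed over");
        }
        int n = chunk.remaining();
        chunk.get(buffer, written, n); // IndexOutOfBoundsException on overflow
        written += n;
    }

    // Hands the array over without copying, then forgets it.
    byte[] finish() {
        if (buffer == null || written != buffer.length) {
            throw new IllegalStateException("incomplete or already handed over");
        }
        byte[] result = buffer;
        buffer = null; // no further use of the array on this side
        return result;
    }
}
```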

See also:

* aws/aws-sdk-java-v2#1959
* aws/aws-sdk-java-v2#1977
rtyley added further commits with the same message to rtyley/aws-sdk-async-response-bytes that referenced this issue on Aug 26, Aug 27 and Sep 2, 2023