S3 PutObject with ChecksumAlgorithm Sets Empty Content-Encoding Header #3569

Closed
iheffernan opened this issue Nov 28, 2022 · 10 comments

Describe the bug

When performing an S3 PutObject operation with the ChecksumAlgorithm property set, an empty Content-Encoding header is set on the object in S3.

[Image: Screenshot 2022-11-28 at 7.51.52 AM]

NOTE: If I manually compute and specify the checksum, the Content-Encoding header does NOT get set.

Expected Behavior

Content-Encoding header should remain unset when using ChecksumAlgorithm.

Current Behavior

No errors are raised. I only noticed because it was causing issues downstream when retrieving objects through CloudFront.

Reproduction Steps

PutObjectRequest.Builder builder = PutObjectRequest.builder().bucket(bucket).key(key);
builder.checksumAlgorithm(ChecksumAlgorithm.SHA256);
s3Client.putObject(builder.build(), RequestBody.fromByteBuffer(payload));

This results in an empty Content-Encoding entry in the system-defined metadata when viewed in the S3 console.

PutObjectRequest.Builder builder = PutObjectRequest.builder().bucket(bucket).key(key);
builder.checksumSHA256(createSHA256Checksum(payload).get());
s3Client.putObject(builder.build(), RequestBody.fromByteBuffer(payload));

This results in no Content-Encoding entry in the system-defined metadata in the S3 console.
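
The `createSHA256Checksum` helper used in the second snippet is not shown in the report. Below is a minimal sketch of what such a helper might look like, assuming it returns the Base64-encoded SHA-256 digest of the payload wrapped in an `Optional` (which would explain the `.get()` call). The class name `Sha256Checksum` and the method body are hypothetical, not the reporter's code.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.Optional;

// Hypothetical helper; the original report does not include its implementation.
final class Sha256Checksum {

    private Sha256Checksum() {
    }

    // Returns the Base64-encoded SHA-256 digest of the buffer's remaining bytes,
    // or an empty Optional if the JVM has no SHA-256 provider.
    static Optional<String> createSHA256Checksum(ByteBuffer payload) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            // Hash a duplicate so the caller's buffer position is left untouched
            // and the same buffer can still be used as the request body.
            digest.update(payload.duplicate());
            return Optional.of(Base64.getEncoder().encodeToString(digest.digest()));
        } catch (NoSuchAlgorithmException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        ByteBuffer payload = ByteBuffer.wrap("hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(createSHA256Checksum(payload).orElse("SHA-256 unavailable"));
    }
}
```

Hashing a duplicate of the buffer matters here because the same `payload` is passed to `RequestBody.fromByteBuffer` afterwards; consuming the original buffer while hashing could leave nothing to upload.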

Possible Solution

No response

Additional Information/Context

No response

AWS Java SDK version used

2.18.20

JDK version used

17.0.3

Operating System and version

MacOS Ventura 13.0.1

iheffernan added the bug and needs-triage labels Nov 28, 2022

awanrky commented Nov 30, 2022

I see the same behavior, version 2.18.22, jdk 11.0.17

In addition, when trying to set the Content-Encoding while using checksumAlgorithm(), the system metadata will have a blank value instead of the value I try to set.

PutObjectRequest.Builder builder = PutObjectRequest.builder().bucket(bucket).key(key);
builder.checksumAlgorithm(ChecksumAlgorithm.SHA256);
builder.contentEncoding("gzip");
s3Client.putObject(builder.build(), RequestBody.fromByteBuffer(payload));

This results in a blank Content-Encoding in the system metadata instead of the value "gzip". Leaving out the builder.checksumAlgorithm() line results in the value being set correctly.

debora-ito self-assigned this Nov 30, 2022
debora-ito (Member) commented

@iheffernan @awanrky thank you for reaching out.

I'm able to repro. We'll investigate.

debora-ito removed the needs-triage label Dec 1, 2022
joviegas (Contributor) commented Dec 8, 2022

When a trailing checksum is sent, we need to set the Content-encoding to aws-chunked.
The wire logs show that:

org.apache.http.wire - http-outgoing-1 >> "Content-encoding: aws-chunked[\r][\n]"

However, this is not shown in the AWS console UI.
Debora and I will work with the S3 team to check why the UI is not showing aws-chunked.

I observed the same behaviour with the CLI command too:

aws s3api put-object --bucket bucket-7986 --key objectscliChecksum --body objectfile --checksum-algorithm CRC32

michaeljohnalbers commented Dec 28, 2022

I'm seeing something somewhat similar, but using the S3TransferManager (with S3AsyncClient using CRT). When I upload some data the content encoding isn't getting set.

The code

byte[] compressedResultBytes = byteArrayOutputStream.toByteArray();

String md5Base64 = BASE64_ENCODER.encodeToString(DigestUtils.md5(compressedResultBytes));

PutObjectRequest putObjectRequest = PutObjectRequest.builder()
        .bucket(cacheS3Bucket)
        .key(cacheObjectKey)
        .contentType(resultContentType)
        .contentEncoding(CONTENT_ENCODING)
        .contentLength((long) compressedResultBytes.length)
        .contentMD5(md5Base64)
        .serverSideEncryption(ServerSideEncryption.AES256)
        .build();
AsyncRequestBody asyncRequestBody = AsyncRequestBody.fromBytes(compressedResultBytes);
UploadRequest uploadRequest = UploadRequest.builder()
        .putObjectRequest(putObjectRequest)
        .requestBody(asyncRequestBody)
        .build();
Upload upload = s3TransferManager.upload(uploadRequest);

In this case the CONTENT_ENCODING value is "gzip". The object in S3 has the same missing content encoding as @iheffernan's screenshot.

As a workaround, disabling checksum validation in the S3AsyncClient (https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/services/s3/S3CrtAsyncClientBuilder.html#checksumValidationEnabled(java.lang.Boolean)) will cause the content encoding to get set.

Then I tried adding a SHA256 checksum (with or without setting the contentMD5 value). In this case the upload always failed with the following exception:

! software.amazon.awssdk.core.exception.SdkClientException: Failed to send the request: The connection has closed or is closing.
! at software.amazon.awssdk.core.exception.SdkClientException$BuilderImpl.build(SdkClientException.java:111)
! at software.amazon.awssdk.core.exception.SdkClientException.create(SdkClientException.java:43)
! at software.amazon.awssdk.services.s3.internal.crt.S3CrtResponseHandlerAdapter.handleError(S3CrtResponseHandlerAdapter.java:127)
! at software.amazon.awssdk.services.s3.internal.crt.S3CrtResponseHandlerAdapter.onFinished(S3CrtResponseHandlerAdapter.java:93)
! at software.amazon.awssdk.crt.s3.S3MetaRequestResponseHandlerNativeAdapter.onFinished(S3MetaRequestResponseHandlerNativeAdapter.java:24)

I'm using version 2.19.4 of the SDK on Java 11, GraalVM version 22.3.0, ARM64 running in an Ubuntu 22.10 container on Mac OS 12.6.

debora-ito (Member) commented

A quick update: we are still checking with the S3 team on the expected behavior when chunked encoding is combined with additional Content-encoding values. We will update here when we have new information.

giacomorebecchi commented Jan 20, 2023

I'm experiencing the same issue using boto3.

"When Trailing checksum is sent, we need to update the Content-encoding as aws-chunked."

@joviegas is this an expected and compulsory behavior?

My use case is that I'm passing the ChecksumAlgorithm argument when uploading a gzipped file so that S3 calculates and stores the checksum with the object (as returned by a head object call), as explained in the docs. Later in my workflow, I sometimes need to retrieve that checksum to compare it with the checksum of some local files. Still, I would want to be able to set and maintain the ContentEncoding as gzip.

joviegas (Contributor) commented

Thanks for mentioning the use case. I need to check this with S3; I am following up with the team again.

joviegas (Contributor) commented Jan 26, 2023

@giacomorebecchi Thanks for following up.
I checked with S3 and learned that they do accept multiple values for the "Content-encoding" header.
When sending checksums for putObject, the checksum feature requires "aws-chunked" to be sent in the "Content-encoding" header, and the SDK used to do a put instead of an append for this header, overwriting any value set by the caller. We are fixing this on the SDK client side to do an append instead of a put.
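
The put-versus-append distinction described above can be illustrated with a small standalone model of a header collection. This is only a sketch: the `HeaderMap` class and its methods are hypothetical stand-ins, not the SDK's internal request types, and the order in which the real SDK emits the values is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a request's header collection; not an SDK class.
final class HeaderMap {
    private final Map<String, List<String>> headers = new LinkedHashMap<>();

    // Replaces any existing values for the header (the pre-fix behavior described above).
    void put(String name, String value) {
        List<String> values = new ArrayList<>();
        values.add(value);
        headers.put(name, values);
    }

    // Adds a value while keeping the ones already present (the behavior of the fix).
    void append(String name, String value) {
        headers.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    // Joins multiple values into a single comma-separated header value.
    String get(String name) {
        List<String> values = headers.get(name);
        return values == null ? null : String.join(",", values);
    }
}

final class ContentEncodingDemo {
    public static void main(String[] args) {
        HeaderMap overwritten = new HeaderMap();
        overwritten.put("Content-Encoding", "gzip");        // set by the caller
        overwritten.put("Content-Encoding", "aws-chunked"); // checksum feature overwrites it
        System.out.println(overwritten.get("Content-Encoding")); // prints "aws-chunked"

        HeaderMap appended = new HeaderMap();
        appended.put("Content-Encoding", "gzip");
        appended.append("Content-Encoding", "aws-chunked"); // caller's value survives
        System.out.println(appended.get("Content-Encoding")); // prints "gzip,aws-chunked"
    }
}
```

With put, the caller's gzip value is lost and only aws-chunked goes on the wire, which is consistent with the wire log earlier in the thread. With append, the gzip value is still present for S3 to record.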

joviegas added a commit that referenced this issue Jan 26, 2023
…ten to aws-chunked with Checksum algorithm enabled (#3720)
debora-ito (Member) commented

A fix was released in version 2.19.25.

I tested both vanilla S3Client and S3TransferManager, and the content encoding seems to be reflected in the S3 object metadata now.

Let us know if you have any questions.
