
Blocking call while resolving credentials inside an EKS cluster using DynamoDB async client (Blockhound) #2360


Open
rp199 opened this issue Mar 31, 2021 · 2 comments
Labels: bug, p2


rp199 commented Mar 31, 2021

Describe the bug

While testing a Spring Boot WebFlux app inside an EKS cluster, I detected a blocking call within the AWS libraries thanks to BlockHound.

It happens while resolving the credentials right before the DynamoDB call made with the async client. This happens even when I provide a custom AWS credentials provider:

    @Bean
    fun awsCredentialsProviderChain(): AwsCredentialsProviderChain {
        return AwsCredentialsProviderChain.of(
            ContainerCredentialsProvider.builder()
                .asyncCredentialUpdateEnabled(true)
                .build(),
            InstanceProfileCredentialsProvider.builder()
                .asyncCredentialUpdateEnabled(true)
                .build()
        )
    }

My DynamoDB client:

    @Bean
    fun dynamoDbAsyncClient(awsCredentialsProviderChain: AwsCredentialsProviderChain): DynamoDbAsyncClient =
        DynamoDbAsyncClient.builder()
            .credentialsProvider(awsCredentialsProviderChain)
            .build()

Locally everything works fine. I think the issue lies in InstanceProfileCredentialsProvider; the other providers seem to work fine.

Expected Behavior

The DynamoDB async client should be non-blocking, so I'd expect to be able to use it without any blocking calls.

Current Behavior

BlockHound detects a blocking call.
Stack trace:

    reactor.blockhound.BlockingOperationError: Blocking call! java.net.SocketOutputStream#socketWrite0
        at java.base/java.net.SocketOutputStream.socketWrite0(Unknown Source)
        Suppressed: reactor.core.publisher.FluxOnAssembly$OnAssemblyException:
    Error has been observed at the following site(s):
        |_ checkpoint ⇢ com.mypackage.MyFilter [DefaultWebFilterChain]
        |_ checkpoint ⇢ org.springframework.boot.actuate.metrics.web.reactive.server.MetricsWebFilter [DefaultWebFilterChain]
        |_ checkpoint ⇢ HTTP GET "/v1/trolleys/1001781417" [ExceptionHandlingWebHandler]
    Stack trace:
        at java.base/java.net.SocketOutputStream.socketWrite0(Unknown Source)
        at java.base/java.net.SocketOutputStream.socketWrite(Unknown Source)
        at java.base/java.net.SocketOutputStream.write(Unknown Source)
        at java.base/java.io.BufferedOutputStream.flushBuffer(Unknown Source)
        at java.base/java.io.BufferedOutputStream.flush(Unknown Source)
        at java.base/java.io.PrintStream.flush(Unknown Source)
        at java.base/sun.net.www.MessageHeader.print(Unknown Source)
        at java.base/sun.net.www.http.HttpClient.writeRequests(Unknown Source)
        at java.base/sun.net.www.http.HttpClient.writeRequests(Unknown Source)
        at java.base/sun.net.www.protocol.http.HttpURLConnection.writeRequests(Unknown Source)
        at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream0(Unknown Source)
        at java.base/sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
        at java.base/java.net.HttpURLConnection.getResponseCode(Unknown Source)
        at software.amazon.awssdk.regions.util.HttpResourcesUtils.readResource(HttpResourcesUtils.java:114)
        at software.amazon.awssdk.regions.internal.util.EC2MetadataUtils.getToken(EC2MetadataUtils.java:442)
        at software.amazon.awssdk.auth.credentials.InstanceProfileCredentialsProvider.getToken(InstanceProfileCredentialsProvider.java:83)
        at software.amazon.awssdk.auth.credentials.InstanceProfileCredentialsProvider.getCredentialsEndpointProvider(InstanceProfileCredentialsProvider.java:69)
        at software.amazon.awssdk.auth.credentials.HttpCredentialsProvider.refreshCredentials(HttpCredentialsProvider.java:74)
        at software.amazon.awssdk.utils.cache.CachedSupplier.refreshCache(CachedSupplier.java:132)
        at software.amazon.awssdk.utils.cache.CachedSupplier.get(CachedSupplier.java:89)
        at java.base/java.util.Optional.map(Unknown Source)
        at software.amazon.awssdk.auth.credentials.HttpCredentialsProvider.resolveCredentials(HttpCredentialsProvider.java:146)
        at software.amazon.awssdk.auth.credentials.AwsCredentialsProviderChain.resolveCredentials(AwsCredentialsProviderChain.java:91)
        at software.amazon.awssdk.awscore.client.handler.AwsClientHandlerUtils.createExecutionContext(AwsClientHandlerUtils.java:79)
        at software.amazon.awssdk.awscore.client.handler.AwsAsyncClientHandler.createExecutionContext(AwsAsyncClientHandler.java:65)
        at software.amazon.awssdk.core.internal.handler.BaseAsyncClientHandler.lambda$execute$1(BaseAsyncClientHandler.java:78)
        at software.amazon.awssdk.core.internal.handler.BaseAsyncClientHandler.measureApiCallSuccess(BaseAsyncClientHandler.java:276)
        at software.amazon.awssdk.core.internal.handler.BaseAsyncClientHandler.execute(BaseAsyncClientHandler.java:75)
        at software.amazon.awssdk.awscore.client.handler.AwsAsyncClientHandler.execute(AwsAsyncClientHandler.java:52)
        at software.amazon.awssdk.services.dynamodb.DefaultDynamoDbAsyncClient.getItem(DefaultDynamoDbAsyncClient.java:3256)
        at software.amazon.awssdk.enhanced.dynamodb.internal.operations.CommonOperation.executeAsync(CommonOperation.java:140)
        at software.amazon.awssdk.enhanced.dynamodb.internal.operations.TableOperation.executeOnPrimaryIndexAsync(TableOperation.java:81)
        at software.amazon.awssdk.enhanced.dynamodb.internal.client.DefaultDynamoDbAsyncTable.getItem(DefaultDynamoDbAsyncTable.java:136)
        at software.amazon.awssdk.enhanced.dynamodb.internal.client.DefaultDynamoDbAsyncTable.getItem(DefaultDynamoDbAsyncTable.java:143)

Steps to Reproduce

Run a Spring Boot WebFlux app that uses the DynamoDB async client, with BlockHound installed, inside an EKS cluster. Calling the client throws a BlockingOperationError.

Context

I'm not sure whether this has any impact on service performance. The main point is that this issue prevents the client from being fully non-blocking.

In the meantime, I can allow-list this blocking call in the BlockHound integration configuration.
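For reference, a BlockHound allow-list like this is usually expressed as a `BlockHoundIntegration`. The sketch below allows blocking calls made beneath `HttpCredentialsProvider.refreshCredentials`, the frame in the stack trace above that sits over the IMDS token request. The class name `AwsCredentialsBlockHoundIntegration` and the helper `installBlockHound` are hypothetical, and the allow-listed class and method names are taken from the SDK 2.16.25 trace in this issue, so they may differ in other SDK versions.

```kotlin
import reactor.blockhound.BlockHound
import reactor.blockhound.integration.BlockHoundIntegration

// Hypothetical integration: tolerates the known blocking credential refresh
// reported in this issue while BlockHound keeps flagging every other blocking call.
class AwsCredentialsBlockHoundIntegration : BlockHoundIntegration {
    override fun applyTo(builder: BlockHound.Builder) {
        builder.allowBlockingCallsInside(
            // Class and method names copied from the SDK 2.16.25 stack trace above
            "software.amazon.awssdk.auth.credentials.HttpCredentialsProvider",
            "refreshCredentials"
        )
    }
}

// Hypothetical helper: install BlockHound with the extra integration,
// e.g. at application or test startup.
fun installBlockHound() {
    BlockHound.install(AwsCredentialsBlockHoundIntegration())
}
```

`allowBlockingCallsInside` only whitelists calls underneath the named method, so a blocking call anywhere else in the request path would still be reported. BlockHound can also pick up integrations through `ServiceLoader`, so passing it to `install` explicitly is one of two options.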

Your Environment

  • AWS Java SDK version used: 2.16.25
  • JDK version used: openjdk-11-jre-ubi-minimal:11.0.10-hotspot
  • Kotlin version: 1.4.31
  • Spring boot version: 2.4.4
rp199 added the bug and needs-triage labels Mar 31, 2021
debora-ito (Member) commented

Hi @rp199, thank you for reaching out.

I talked to the team: currently the whole credential-resolution process makes blocking calls, even in async clients. We have a task in the backlog to change this, pending prioritization. Please let us know if you find that this impacts performance in any way.
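Because credential resolution runs synchronously inside the async client's call path (in the trace above, `DefaultDynamoDbAsyncClient.getItem` reaches `resolveCredentials` before returning its future), one possible mitigation is to make that call from a dedicated executor instead of the event loop. This is a minimal sketch using only `CompletableFuture`; `offload`, `fakeAsyncCall`, and `blockingCallPool` are hypothetical names, and `fakeAsyncCall` merely simulates an SDK call that blocks before returning a future. In a Reactor/WebFlux app the analogous shape would be `Mono.defer { Mono.fromFuture(client.getItem(request)) }.subscribeOn(Schedulers.boundedElastic())`.

```kotlin
import java.util.concurrent.CompletableFuture
import java.util.concurrent.Executor
import java.util.concurrent.ExecutorService
import java.util.concurrent.Executors

// Hypothetical helper: invoke `call` on `executor` instead of the caller's thread,
// so any synchronous work done before `call` returns its future
// (such as credential resolution) happens on `executor`.
fun <T> offload(executor: Executor, call: () -> CompletableFuture<T>): CompletableFuture<T> =
    CompletableFuture
        .supplyAsync({ call() }, executor) // runs call() on an executor thread
        .thenCompose { it }                // flattens the nested future

// Hypothetical stand-in for an async SDK call that blocks before returning a future.
// It reports the name of the thread on which the blocking part ran.
fun fakeAsyncCall(): CompletableFuture<String> {
    val setupThread = Thread.currentThread().name
    Thread.sleep(10) // simulates the blocking metadata-service round trip
    return CompletableFuture.completedFuture(setupThread)
}

// Dedicated pool for calls that may block; the thread name is illustrative.
val blockingCallPool: ExecutorService = Executors.newFixedThreadPool(2) { r ->
    Thread(r, "blocking-call-pool").apply { isDaemon = true }
}
```

This does not make the call non-blocking; it only moves the blocking part onto threads that are allowed to block, which keeps the event loop free.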

debora-ito removed the needs-triage label Apr 3, 2021
Bennett-Lynch pushed a commit to Bennett-Lynch/aws-sdk-java-v2 that referenced this issue Dec 7, 2021
There are two known occurrences where the SDK may currently block from
within a Netty EventLoop:

1. aws#2145
2. aws#2360

Allowing BlockHound to forbid these operations may fail existing
integration and stability tests. While we have outstanding issues to fix
these items, until they are resolved, we need to allow our existing
integration tests to continue to pass. We should explicitly allow-list
these methods so that they do not interfere with existing tests and so
that we maintain visibility on future regression detection.
Bennett-Lynch added a commit that referenced this issue Dec 8, 2021
* Allow-list known blocking methods from BlockHound

yasminetalby added the p2 label Nov 12, 2022
bryan-rhm commented

Hello @debora-ito, we are having the same issue. It is adding latency to our application as we move to EKS. Is there any workaround? We are using credentials with roles + service accounts.
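One mitigation for this kind of latency, offered only as a sketch, is to resolve credentials once at startup and then refresh them on a background thread, so request threads only ever read a cached value. The code below models that pattern generically: `BackgroundRefreshingCache` and its parameters are hypothetical (not an SDK class), and `fetch` stands in for the blocking credentials call. With the real SDK, the same idea would mean wrapping a provider in a custom `AwsCredentialsProvider` whose `resolveCredentials()` returns the cached value.

```kotlin
import java.util.concurrent.Executors
import java.util.concurrent.ScheduledExecutorService
import java.util.concurrent.TimeUnit
import java.util.concurrent.atomic.AtomicReference

// Hypothetical cache (not an SDK class): fetches once in the constructor
// (pre-warm), then refreshes on a background thread. get() never calls fetch.
class BackgroundRefreshingCache<T : Any>(
    private val fetch: () -> T,  // stand-in for the blocking credentials call
    refreshPeriodMillis: Long    // delay between background refreshes
) : AutoCloseable {
    // Pre-warm on the constructing thread, e.g. during application startup.
    private val current = AtomicReference(fetch())

    private val scheduler: ScheduledExecutorService =
        Executors.newSingleThreadScheduledExecutor { r ->
            Thread(r, "credentials-refresher").apply { isDaemon = true }
        }

    init {
        scheduler.scheduleWithFixedDelay(
            {
                // On failure, keep serving the previous value.
                runCatching { fetch() }.onSuccess { current.set(it) }
            },
            refreshPeriodMillis,
            refreshPeriodMillis,
            TimeUnit.MILLISECONDS
        )
    }

    // Non-blocking read of the most recently fetched value.
    fun get(): T = current.get()

    override fun close() {
        scheduler.shutdownNow()
    }
}
```

The trade-off is staleness: a cached value can be up to one refresh period old, so the period has to stay well below the credentials' lifetime.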

4 participants