Skip to content

Blocking call inside of a software.amazon.awssdk.http.nio.netty.internal.BetterSimpleChannelPool.close #2145

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
lanwen opened this issue Nov 13, 2020 · 4 comments
Labels
feature-request A feature should be added or improved. p3 This is a minor priority issue

Comments

@lanwen
Copy link

lanwen commented Nov 13, 2020

Since v2 is claimed to be non-blocking, we still facing a blocking call in software.amazon.awssdk.http.nio.netty.internal.BetterSimpleChannelPool#close which could be concidered as illegal in this case.
We use https://github.com/reactor/BlockHound in pair with Reactor in spring as well as localstack in testcontainers in our tests, where this blocking call is popping out.

Describe the issue

We see in test logs this stacktrace:

2020-11-13 11:15:00.917  WARN 63962 --- [entExecutor-2-1] i.n.util.concurrent.GlobalEventExecutor  : Unexpected exception from the global event executor: 

reactor.blockhound.BlockingOperationError: Blocking call! java.lang.Object#wait
        at java.base/java.lang.Object.wait(Object.java)
        at java.base/java.lang.Object.wait(Object.java:321)
        at io.netty.util.concurrent.DefaultPromise.awaitUninterruptibly(DefaultPromise.java:274)
        at io.netty.channel.DefaultChannelPromise.awaitUninterruptibly(DefaultChannelPromise.java:137)
        at io.netty.channel.DefaultChannelPromise.awaitUninterruptibly(DefaultChannelPromise.java:30)
        at io.netty.channel.pool.SimpleChannelPool.close(SimpleChannelPool.java:402)
        at software.amazon.awssdk.http.nio.netty.internal.BetterSimpleChannelPool.close(BetterSimpleChannelPool.java:38)
        at software.amazon.awssdk.http.nio.netty.internal.HonorCloseOnReleaseChannelPool.close(HonorCloseOnReleaseChannelPool.java:75)
        at software.amazon.awssdk.http.nio.netty.internal.IdleConnectionCountingChannelPool.close(IdleConnectionCountingChannelPool.java:106)
        at software.amazon.awssdk.http.nio.netty.internal.utils.BetterFixedChannelPool.lambda$close0$3(BetterFixedChannelPool.java:391)
        at io.netty.util.concurrent.GlobalEventExecutor$TaskRunner.run(GlobalEventExecutor.java:243)
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.base/java.lang.Thread.run(Thread.java:832)

2020-11-13 11:15:05.917 ERROR 63962 --- [extShutdownHook] s.a.a.h.n.netty.NettyNioAsyncHttpClient  : Unable to close channel pools

Steps to Reproduce

It's hard actually to reproduce the issue, as it's visible or affecting test execution not 100% of the times.

However we're doing in tests something really simple like

var client = DynamoDbEnhancedAsyncClient.builder().dynamoDbClient(DynamoDbAsyncClient.builder().build()).build();
dynamoDb.table("table-name", TableSchema.fromBean(Content.class)).createTable()/.query()

within the test. Later when the context closes (we defined clients as spring beans), the close method seems to be called.

Current Behavior

BetterSimpleChannelPool#close blocks the thread

Your Environment

  • AWS Java SDK version used: 2.15.26
  • JDK version used: 1.14
  • Operating System and version: any

Current workaround is to whitelist this in the BlockHound

.allowBlockingCallsInside(
                        "software.amazon.awssdk.http.nio.netty.internal.BetterSimpleChannelPool",
                        "close"
                )
@lanwen lanwen added guidance Question that needs advice or information. needs-triage This issue or PR still needs to be triaged. labels Nov 13, 2020
@zoewangg
Copy link
Contributor

Hi @lanwen, thank you for reaching out and sorry for the delayed response. The close method in the Netty class SimpleChannelPool is a blocking call. Although it has closeAsync method available, we think using the blocking close here is better because this way it can ensure the resources are properly cleaned up or if there is any error preventing it from being closed, customers are aware of it and can handle it to avoid potential resource leak. I agree the blocking close method could be a bit surprising for an async client, and we can update the Javadocs to make it more clear.

Please let us know if you have further questions.

@zoewangg zoewangg added response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 10 days. and removed needs-triage This issue or PR still needs to be triaged. labels Dec 22, 2020
@lanwen
Copy link
Author

lanwen commented Dec 23, 2020

@zoewangg thank you for the reply. However, I would like to know if there is an option to switch to a fully non-blocking behaviour, as right now if the error occurs during the close, I can't really handle it properly - same way as with async, as I don't fully own the execution loop.

But with a reactive application with a limited number of threads blocking operation here could lead to more serious issues with a service - like full unavailability during the blocking call.

Also I can see, that closeAsync returns Feature, which should keep an exception (in theory) if something goes wrong. May I ask why this path wasn't chosen?

@github-actions github-actions bot removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 10 days. label Dec 24, 2020
@zoewangg
Copy link
Contributor

zoewangg commented Jan 5, 2021

Thanks for the clarification. Actually, SimpleChannelPool#close had been nonblocking but got changed to blocking to fix some race condition in this commit, which introduced other issues, so we had to implement some workaround in the SDK. closeAsync was added after we implemented the workaround. The reason why the SDK awaits eventloopgroup and channel pool to be closed is because the close method in the AutoClosable doesn't return anything, which means there's no way for notify customers if the resources fail to be released even if we call closeAsync under the hood.

We could consider introducing a nonblocking close method in the NettyNioAsyncHttpClient that returns a CompletableFuture. Marking this as a feature request.

@zoewangg zoewangg added feature-request A feature should be added or improved. and removed guidance Question that needs advice or information. labels Jan 5, 2021
Bennett-Lynch pushed a commit to Bennett-Lynch/aws-sdk-java-v2 that referenced this issue Dec 7, 2021
There are two known occurrences where the SDK may currently block from
within a Netty EventLoop:

1. aws#2145
2. aws#2360

Allowing BlockHound to forbid these operations may fail existing
integration and stability tests. While we have outstanding issues to fix
these items, until they are resolved, we need to allow our existing
integration tests to continue to pass. We should explicitly allow-list
these methods so that they do not interfere with existing tests and so
that we maintain visibility on future regression detection.
Bennett-Lynch added a commit that referenced this issue Dec 8, 2021
* Allow-list known blocking methods from BlockHound

There are two known occurrences where the SDK may currently block from
within a Netty EventLoop:

1. #2145
2. #2360

Allowing BlockHound to forbid these operations may fail existing
integration and stability tests. While we have outstanding issues to fix
these items, until they are resolved, we need to allow our existing
integration tests to continue to pass. We should explicitly allow-list
these methods so that they do not interfere with existing tests and so
that we maintain visibility on future regression detection.
@ssalamov
Copy link

Hello,

+1

In our system we also experience this issue. As now we whitelisted it in block hound.

Any idea what is a plan (ETA) for fixing it? Thank you.

Regards,
Samir Salamov

aws-sdk-java-automation added a commit that referenced this issue Sep 14, 2022
…e14ffdfa9

Pull request: release <- staging/abec8754-2103-4749-adab-a43e14ffdfa9
@yasminetalby yasminetalby added the p3 This is a minor priority issue label Nov 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature-request A feature should be added or improved. p3 This is a minor priority issue
Projects
None yet
Development

No branches or pull requests

4 participants