Skip to content

Awaits EventLoopGroup Shutdown when closing Netty client #1069

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Feb 12, 2019

Conversation

zoewangg
Copy link
Contributor

@zoewangg zoewangg commented Feb 8, 2019

Description

Awaits EventLoopGroup#shutdownGracefully to complete when closing Netty client.

Motivation and Context

Fun fact: EventLoopGroup#shutdownGracefully is also asynchronous.

Testing

unit tests and manually triggered smart integ tests.

Screenshots (if appropriate)

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)

Checklist

  • I have read the CONTRIBUTING document
  • Local run of mvn install succeeds
  • My code follows the code style of this project
  • My change requires a change to the Javadoc documentation
  • I have updated the Javadoc documentation accordingly
  • I have read the README document
  • I have added tests to cover my changes
  • All new and existing tests passed
  • A short description of the change has been added to the CHANGELOG
  • My change is to implement 1.11 parity feature and I have updated LaunchChangelog

License

  • I confirm that this pull request can be released under the Apache 2 license

}
return CompletedCloseFuture.COMPLETED_FUTURE;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use SucceededFuture with a null executor instead of creating your own future implementation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I considered using that but was too afraid to pass two nulls to the constructor. I wish they have an overloaded constructor. Will update to use that instead.

*
* @param promise the promise to complete
*/
public static void syncInterruptibly(Future<?> promise) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't that much different than netty's awaitUninterruptibly(). Any reason we care about re-throwing the interrupt exception if we're just going to log it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per Javadoc of awaitUniterrupttibly,

Waits for this future to be completed within the specified time limit without interruption. This method catches an {@link InterruptedException} and discards it silently.

it doesn't re-interrupt the thread and that's why I created this method to re-interrupt the thread.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does re-interrupt the thread, if you look at the implementation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ohh yeah, it does. I will update. I was looking at the implementation of an overloaded method of awaitUninterruptibly and from first look, it looks like it doesn't re-interrupt the method.

    @Override
    public boolean awaitUninterruptibly(long timeout, TimeUnit unit) {
        try {
            return await0(unit.toNanos(timeout), false);
        } catch (InterruptedException e) {
            // Should not be raised at all.
            throw new InternalError();
        }
    }

@@ -212,7 +214,8 @@ private SdkEventLoopGroup nonManagedEventLoopGroup(SdkEventLoopGroup eventLoopGr
@Override
public void close() {
runAndLogError(log, "Unable to close channel pools", pools::close);
runAndLogError(log, "Unable to shutdown event loop", sdkEventLoopGroup.eventLoopGroup()::shutdownGracefully);
Future<?> closePromise = sdkEventLoopGroup.eventLoopGroup().shutdownGracefully();
runAndLogError(log, "Unable to shutdown event loop", () -> syncInterruptibly(closePromise));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we do the acquiring of the closePromise within the runAndLogError as well so that close() won't throw any exceptions?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@zoewangg zoewangg force-pushed the zoewang-awaitsEventLoopGroupShutdown branch from b26402a to 6c61b4a Compare February 8, 2019 20:56
@zoewangg
Copy link
Contributor Author

zoewangg commented Feb 8, 2019

ConnectionReaperTest seems to be stuck with the changes. Taking a look

@zoewangg
Copy link
Contributor Author

zoewangg commented Feb 9, 2019

Okay, this time NettyNioAsyncHttpClientWireMockTest got stuck, though it passed on my machine and JDK10 CodeBuild job somehow. :(

@zoewangg zoewangg force-pushed the zoewang-awaitsEventLoopGroupShutdown branch 2 times, most recently from 482ff8a to 598da7a Compare February 9, 2019 01:05
@codecov-io
Copy link

codecov-io commented Feb 9, 2019

Codecov Report

Merging #1069 into master will decrease coverage by <.01%.
The diff coverage is 66.66%.

Impacted file tree graph

@@             Coverage Diff             @@
##             master   #1069      +/-   ##
===========================================
- Coverage      56.2%   56.2%   -0.01%     
- Complexity     4694    4698       +4     
===========================================
  Files           824     824              
  Lines         28027   28039      +12     
  Branches       2257    2257              
===========================================
+ Hits          15753   15759       +6     
- Misses        11543   11549       +6     
  Partials        731     731
Impacted Files Coverage Δ Complexity Δ
...tp/nio/netty/internal/SharedSdkEventLoopGroup.java 100% <100%> (ø) 6 <0> (ø) ⬇️
...ssdk/http/nio/netty/internal/utils/NettyUtils.java 25.8% <100%> (+2.47%) 4 <1> (+1) ⬆️
...p/nio/netty/internal/NonManagedEventLoopGroup.java 100% <100%> (ø) 2 <1> (ø) ⬇️
...awssdk/http/nio/netty/NettyNioAsyncHttpClient.java 65% <45.45%> (-1.93%) 20 <2> (+2)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a8a49f7...d5b79a9. Read the comment docs.

@zoewangg zoewangg force-pushed the zoewang-awaitsEventLoopGroupShutdown branch 3 times, most recently from 6235c86 to b49a0f5 Compare February 9, 2019 01:47
@zoewangg
Copy link
Contributor Author

zoewangg commented Feb 9, 2019

Changing from

sdkEventLoopGroup.eventLoopGroup().shutdownGracefully().awaitUninterruptibly()

to

sdkEventLoopGroup.eventLoopGroup().shutdownGracefully().awaitUninterruptibly(3000) fixed it.

I compared the source code of DefaultPromise#awaitUninterruptibly with DefaultPromise#awaitUninterruptibly(long) and they are not exactly the same. There is definitely some race condition in DefaultPromise#awaitUninterruptibly

@zoewangg zoewangg force-pushed the zoewang-awaitsEventLoopGroupShutdown branch 3 times, most recently from 5bd5283 to 35b7e61 Compare February 11, 2019 19:08
@zoewangg
Copy link
Contributor Author

I updated to use eventLoopGroup.shutdownGracefully().get(3000, TimeUnit.MILLISECONDS) instead. It would throw exception if shutdownGracefully fails.

DefaultPromise#get invokes DefaultPromise#await. From the testing, awaitUninterruptibly is buggy, but await is working fine.

I think timeout is very necessary as it's extremely difficult to debug which process is the culprit when the application gets stuck.

private void closeEventLoopUninterruptibly(EventLoopGroup eventLoopGroup) throws
TimeoutException, ExecutionException {
try {
eventLoopGroup.shutdownGracefully().get(3000, TimeUnit.MILLISECONDS);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shutdownGracefully has a timeout passed to it as well, that by default is longer than 3 seconds. Should we align these values in some way? Can we also catch errors we think can happen sometimes (eg. TimeoutException) to log a less scary error, if the customer doesn't need to worry about them?

Copy link
Contributor Author

@zoewangg zoewangg Feb 11, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we align these values in some way?

+1, it took me a while to understand the javadoc of #shutdownGracefully(long, long, TimeUnit). The default timeout is 15 seconds though and I will update it to more than 15 seconds.

https://github.com/netty/netty/blob/fa6a8cb09c9679468a6c2d912ddfbbe885ee0c08/common/src/main/java/io/netty/util/concurrent/AbstractEventExecutor.java#L37

Can we also catch errors we think can happen sometimes (eg. TimeoutException) to log a less scary error, if the customer doesn't need to worry about them?

Hmm, I do think customers need to be aware of the issue though as unclosed eventloop can cause resource leak. What do you think of changing the log level to warn?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, do you mean just catching TimeoutException and changing the error message? Yeah, we can do that. I'll update it

Copy link
Contributor

@millems millems left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor nit.


private void closeEventLoopUninterruptibly(EventLoopGroup eventLoopGroup) throws ExecutionException {
try {
eventLoopGroup.shutdownGracefully().get(EVENTLOOP_SHUTDOWN_TIMEOUT_SECONDS, TimeUnit.SECONDS);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we pass the same(ish) value to shutdownGracefully, to make it clear why this value was chosen? Worried a future dev might not understand why this particular value is chosen and lessen it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, updated!

@zoewangg zoewangg force-pushed the zoewang-awaitsEventLoopGroupShutdown branch from bec2d9b to d5b79a9 Compare February 12, 2019 00:19
@zoewangg zoewangg merged commit ee3bb00 into master Feb 12, 2019
@zoewangg zoewangg deleted the zoewang-awaitsEventLoopGroupShutdown branch February 12, 2019 01:29
aws-sdk-java-automation pushed a commit that referenced this pull request Dec 4, 2020
…6ec1651cd

Pull request: release <- staging/8a15f3ae-efe8-4560-9dcf-db46ec1651cd
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants