Skip to content

Speeding up the shutdown of the driver. #576

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Apr 29, 2019

Conversation

PetrJanouch
Copy link
Contributor

No description provided.

@PetrJanouch PetrJanouch force-pushed the 2.0_shutdown_speedup branch from ad351a6 to d9f378b Compare March 14, 2019 17:03
@pstutz
Copy link

pstutz commented Apr 9, 2019

Thanks for looking into this, this helps with our issue #571.

With a quiet period of 200ms the shutdown still takes about 1 second.
With a not-so-graceful quiet period of 0ms the shutdown is almost instantly.

What important work wouldn't get done if the shutdown wasn't graceful? Might it make sense to either set it to 0ms or to offer a shudownNow() method for users that can live with a possible drawback?

@PetrJanouch
Copy link
Contributor Author

@pstutz Unfortunately, we cannot set the quiet period to 0ms right now. It seems that there are code paths that can result to some close or clean up tasks being submitted to the event loop after shutdown is called on the event loop. I think the ideal solution is to make sure that calling shutdown on the event loop is really the absolutely last step in the shut down routine. We could set the quiet period to 0ms then. Unfortunately, this is not as easy as it sounds, because of the asynchronous and event driven nature of that code. It will require revisiting and carefully ordering the shutdown logic. We will see what we can do. shutdownNow() might result in some resources not being closed (the closing events not being processed by the event loop)

// This is an attempt to speed up the shut down procedure of the driver
// Feel free return this back to shutdownGracefully() method with default values
// if this proves troublesome!!!
eventLoopGroup().shutdownGracefully(200, 15_000, TimeUnit.MILLISECONDS);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this do? Add a timeout? What happens if the timeout expires... what is the user supposed to do?

I am very cautious about optimising for a very niche use case. We need to optimise for the general case, wherein the driver shuts down at application shutdown. Here, everything must close; there is no other code path.

Copy link
Contributor Author

@PetrJanouch PetrJanouch Apr 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A short explanation how the shutdownGracefully works. It has 2 parameters ‘quiet period’ and ‘timeout’. The algorithm that it uses is the following:

shutdownStart = now()
while (true) {
	Thread.sleep(quietPeriod)
	if (nothingWasSubmittedToTheEventLoopWhileSleeping()) {
           break
	}

	if (now() - shutdownStart > timeout) {
          // timeout
	  break
        }
}

The main purpose of this shutdown algorithm is to compete work that is in progress.
The default value for ‘quiet period’ are 2 seconds and 15 seconds for ‘timeout’

What do we need a quiet period for anyway? The current shutdown logic is like this:

channels.each (ch -> closeAsync())
shutdownEventLoopWithQuietPeriod()

The graceful shutdown with the quiet period is needed because closing a channel async generates events submitted to the event loop.

What we would need is something like this:

channels.each (ch -> closeAsync())
waitForAllChannelsToBeClosed()
shutdownEventLoopWithoutQuietPeriod()

As simple as the pseudo code above looks, the problem is that achieving waitForAllChannelsToBeClosed() with current Netty API is not that simple (at least was not the last time I checked)

The problem is that the shutdown currently takes 4s (the quiet period is usually reset once by the tasks generated by closing the channels) which is enough to make for instance lambda users or users with a lot of integration tests sad.

The reasoning behind my PR was that the defaults in Netty are set so it would work with all possible use cases the toolkit is used for, but 200ms quiet period is more that enough for us.

PetrJanouch and others added 2 commits April 26, 2019 14:30
OS has some delay to create or delete file. With a shorter shutdown time, the file might not deleted or created successfully.
@zhenlineo zhenlineo force-pushed the 2.0_shutdown_speedup branch from e62faca to f3312f9 Compare April 26, 2019 12:31
@zhenlineo zhenlineo merged commit 8dad6c4 into neo4j:2.0 Apr 29, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants