Skip to content

Graceful shutdown #249

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Dec 28, 2022
Merged

Conversation

DifferentialOrange
Copy link
Member

@DifferentialOrange DifferentialOrange commented Dec 16, 2022

If connected to Tarantool 2.10 or newer, after this patch a connection supports server graceful shutdown [1]. In this case, server will wait until all client requests will be finished and client disconnects before going down (server also may go down by timeout). Client reconnect will happen if connection options enable reconnect.

  1. https://www.tarantool.io/en/doc/latest/dev_guide/internals/iproto/graceful_shutdown/

I didn't forget about (remove if it is not applicable):

Closes #214

@DifferentialOrange DifferentialOrange changed the base branch from master to DifferentialOrange/gh-215-session-settings December 16, 2022 09:40
@DifferentialOrange DifferentialOrange force-pushed the DifferentialOrange/gh-214-graceful-shutdown branch from 26b47e2 to 5d2a7f0 Compare December 16, 2022 12:14
@DifferentialOrange DifferentialOrange force-pushed the DifferentialOrange/gh-215-session-settings branch from cdf2abe to 7665568 Compare December 16, 2022 12:16
@DifferentialOrange DifferentialOrange force-pushed the DifferentialOrange/gh-214-graceful-shutdown branch 2 times, most recently from 78d67b5 to cca3f97 Compare December 16, 2022 12:18
@DifferentialOrange DifferentialOrange force-pushed the DifferentialOrange/gh-214-graceful-shutdown branch from cca3f97 to 637be3b Compare December 16, 2022 12:41
@DifferentialOrange DifferentialOrange force-pushed the DifferentialOrange/gh-214-graceful-shutdown branch from 637be3b to d8b8196 Compare December 16, 2022 14:12
@DifferentialOrange DifferentialOrange force-pushed the DifferentialOrange/gh-214-graceful-shutdown branch 2 times, most recently from 68719cb to 47f524a Compare December 20, 2022 14:06
@DifferentialOrange DifferentialOrange force-pushed the DifferentialOrange/gh-214-graceful-shutdown branch 4 times, most recently from 10e7657 to ebc323c Compare December 21, 2022 08:52
@DifferentialOrange
Copy link
Member Author

For now I've got an issue that not all requests started with Do are ensured to get executed. It's more like a Tarantool behavior rather than we did something wrong.

@DifferentialOrange
Copy link
Member Author

Well, it seems that it is still not working: I've got TestGracefulShutdownRespectsClose dead stuck once and pool tests are stuck too.

@DifferentialOrange
Copy link
Member Author

Well, it seems that it is still not working: I've got TestGracefulShutdownRespectsClose dead stuck once and pool tests are stuck too.

Seems like the issue was again an concurrent Unregister in connectionClose and shutdown. Started to Unregister in separate goroutine, this should solve the issue.

@DifferentialOrange DifferentialOrange force-pushed the DifferentialOrange/gh-214-graceful-shutdown branch from ebc323c to c43e268 Compare December 21, 2022 11:25
Base automatically changed from DifferentialOrange/gh-215-session-settings to master December 21, 2022 12:05
@DifferentialOrange DifferentialOrange force-pushed the DifferentialOrange/gh-214-graceful-shutdown branch from 86037de to e4e7a4f Compare December 21, 2022 12:55
@DifferentialOrange
Copy link
Member Author

For now I've got an issue that not all requests started with Do are ensured to get executed. It's more like a Tarantool behavior rather than we did something wrong.

Added simple notify channel to track shutdown callback events if one wish to do so.

@DifferentialOrange
Copy link
Member Author

DifferentialOrange commented Dec 21, 2022

There still seems to be some rare concurrency issues (with shutdown notify channel)

@DifferentialOrange DifferentialOrange force-pushed the DifferentialOrange/gh-214-graceful-shutdown branch 2 times, most recently from 7946786 to 8e8846a Compare December 21, 2022 15:17
@DifferentialOrange
Copy link
Member Author

I've re-checked one more time and we definitely cannot replace inst.Cmd.Process.Signal(syscall.SIGTERM) with inst.Cmd.Process.Kill() since it sends SIGKILL which kills a Tarantool mercilessly without allowing to handle shutdown triggers.

@DifferentialOrange
Copy link
Member Author

The last known bug related to

--- FAIL: TestGracefulShutdownConcurrent (1.15s)
    shutdown_test.go:541: 
        	Error Trace:	shutdown_test.go:541
        	Error:      	Expected nil, but got: tarantool.ClientError{Code:0x4001, Msg:"connection closed after server shutdown"}
        	Test:       	TestGracefulShutdownConcurrent
        	Messages:   	No errors on concurrent wait

was caused by if conn.shutdownWatcher != nil { conditions: shutdown watcher requests itself could be sent BEFORE conn.shutdownWatcher is set, so counter is not incremented, but may finish AFTER conn.shutdownWatcher is set, so counter will be decremented and became negative. I don't think it's a big perf issue so I left them unconditional.

@DifferentialOrange DifferentialOrange force-pushed the DifferentialOrange/gh-214-graceful-shutdown branch 2 times, most recently from 1d71195 to aac4200 Compare December 21, 2022 20:30
@DifferentialOrange
Copy link
Member Author

DifferentialOrange commented Dec 21, 2022

So it seems that everything we wanted to solve is solved now.

@DifferentialOrange DifferentialOrange marked this pull request as ready for review December 21, 2022 20:34
@DifferentialOrange DifferentialOrange force-pushed the DifferentialOrange/gh-214-graceful-shutdown branch 2 times, most recently from 55e2e30 to fe07889 Compare December 25, 2022 18:40
Copy link
Collaborator

@oleg-jukovec oleg-jukovec left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but there are a couple of comments that need to be resolved.

@oleg-jukovec oleg-jukovec requested a review from AnaNek December 26, 2022 07:44
@DifferentialOrange DifferentialOrange force-pushed the DifferentialOrange/gh-214-graceful-shutdown branch from fe07889 to 2f2d4a9 Compare December 26, 2022 08:18
After this patch, "make test" behaves similar to "make test-*".

Part of #214
Before this patch, calling StopTarantool wasn't idempotent because it
accepts a struct copy and doesn't actually set Cmd to nil. Setting
Cmd.Process to nil is effective since it's a pointer. Reworking helpers
to use pointer would be better, but it would break existing API.
If connected to Tarantool 2.10 or newer, after this patch a connection
supports server graceful shutdown [1]. In this case, server will wait
until all client requests will be finished and client disconnects before
going down (server also may go down by timeout). Client reconnect will
happen if connection options enable reconnect. Beware that graceful
shutdown event initialization is asynchronous.

1. https://www.tarantool.io/en/doc/latest/dev_guide/internals/iproto/graceful_shutdown/

Closes #214
@oleg-jukovec oleg-jukovec force-pushed the DifferentialOrange/gh-214-graceful-shutdown branch from 2f2d4a9 to 4eb8ac8 Compare December 28, 2022 09:00
@oleg-jukovec
Copy link
Collaborator

oleg-jukovec commented Dec 28, 2022

I rebased the branch because I want to merge the pull request today. I also removed the code from connection.go:

    // future here does not belong to any shard yet,
    // so cancelFuture don't call markDone.
    conn.decrementRequestCnt()

because a future actually belongs to a shard after successful newFuture().

@oleg-jukovec oleg-jukovec merged commit 13b9f8e into master Dec 28, 2022
@oleg-jukovec oleg-jukovec deleted the DifferentialOrange/gh-214-graceful-shutdown branch December 28, 2022 10:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support graceful shutdown
3 participants