-
Notifications
You must be signed in to change notification settings - Fork 132
Terminate on SIGTERM #217
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Terminate on SIGTERM #217
Conversation
Closes jupyterhub#215 by listening for a SIGTERM signal.
In general faster shutdowns/reaction to instructions is a good thing. Why would the CHP be "unresponsive" in these 30s? I assumed it would just keep running, the I don't see a reason for us not to already quit on SIGTERM. It would be great if we could switch into a "connection draining" mode or otherwise stop accepting new connections but still serve existing ones. However I think this gets us into full blown HA proxy land. Education: I didn't have the k8s "pod deletion" sequence of signals ready in my head, so I googled and found https://pracucci.com/graceful-shutdown-of-kubernetes-pods.html. This mentions SIGTERM as the first signal that gets sent and then 30s (or "configurable grace period") later it sends SIGKILL. Where does SIGINT fit in? This suggests that SIGINT is what a user sends when they press Ctrl-C. |
Regarding unresponsive binderhub: I recall there was a time when the pods on a node stopped getting traffic from the kubernetes service because they were located on a cordoned node or similar. Wasnt it so? Or is it only terminating pods? This pod will be terminating for 30 seconds due to the toleration grace periods defaulting to that, so if the proxy needs to relocate it will at least take those 30 during its terminating sequence which starts by SIGTERM to its containers i think. I dont know if we can shut down more gracefully than this, im not at all read up on chp. |
I don't think so, but that is a guess. Should be a new PR/issue though and not for this PR. We do have the problem that a Merging now because I think this is good as it is. |
I don't understand why it's not terminating on SIGTERM already. This is the default behavior for a process. Has anyone confirmed that this change actually accomplishes shutdown on SIGTERM where it wasn't occurring before? I can locally verify that earlier releases of CHP exit on SIGTERM already. |
@minrk I didn't know 100% i tested for SIGTERM specifically, but i tested To reproduce the different behavior with I read this about |
@minrk I believe SIGTERM is passed to a process as some kind of input, and as SIGTERM is a friendly "please terminate" it can be ignored while SIGKILL is managed by the kernel that will simply kill the process no matter what it sais. This is my current understanding. I assume the node's process does not itself handle the SIGTERM signal and require us to consider it. I'm unexperienced in this domain, both Linux termination signals and Node stuff though. |
You're absolutely right! I had assumed that node.js behaves like ~every other program, where SIGTERM results in shutdown by default (You will see this with any Python or C program) and a SIGTERM handler is only used to register additional actions prior to a clean shutdown. But indeed SIGTERM is ignored by default in node.js, which is very unusual. Lessons I've learned here:
|
Closes #215 by listening for a SIGTERM signal.
This has been tested to make a difference by issuing a
docker stop
command as described in #215. It now shuts down very quickly while before it took the full timeout duration for it to shutdown, which I interpret as it required a SIGKILL signal to be sent for it to react at all.I wrote out an exit code of 2 like for SIGINT, I'm not sure what makes sense here or if it will matter anywhere.
Having this merged and released could impact Z2JH / BinderHub performance during upgrades and when draining nodes etc, making a BinderHub not go inresponsive for 30 seconds I believe.