-
Notifications
You must be signed in to change notification settings - Fork 1.2k
fix: localmode subprocess parent process not sending SIGTERM to child #2613
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: localmode subprocess parent process not sending SIGTERM to child #2613
Conversation
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
268357f
to
7b07b90
Compare
@ahsan-z-khan This PR doesn't contain and new dependencies |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
@shreyapandit Hi! can you have a look if the failing tests are relevant to the PR? And assign someone to review it? |
@gilinachum any updates? |
Sorry about that. Checking again :\ |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
@@ -841,6 +840,8 @@ def run(self): | |||
|
|||
def down(self): | |||
"""Placeholder docstring""" | |||
if os.name != "nt": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is this check for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's checking if the os is a windows machine, since I'm using unix commands to kill the processes
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
Hi @ranzvi, thanks for your contribution! Please follow the one time setup for enabling git hooks, and this should allow you to reproduce the errors, and fix the tests cc @ranzvi @gilinachum. The unit tests are failing on linting issues.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As outlined in comments
…zvi/sagemaker-python-sdk into fix-localmode-subprocess-termination
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
@shreyapandit Thanks for the insights. I fixed the linting issues and executed the unittest locally and they pass. Do you have any idea why the rest of the ci is still failing? |
@ranzvi I am restarting your tests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks for your contribution
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good to me.
…aws#2613) Co-authored-by: Shreya Pandit <[email protected]> Co-authored-by: Jeniya Tabassum <[email protected]>
@ranzvi did this fix your issue? Interestingly, this new code introduces an error log for us during
Commenting out these lines fixes our issue. For us, I'd be interesting in potentially rolling back these changes if that was OK with you. |
@wcarpenter1-godaddy This did fix our issue, we had a problem with the |
@ranzvi what os are you on? |
@wcarpenter1-godaddy MacOS, but this works on |
Issue #, if available:
Description of changes:
Due to an unknown issue, on some machines executing
predictor.delete_endpoint()
simply hangs.This is because the
SIGTERM
sent to the subprocess doesn't propagate to the child process.E.g pid
7521
doesn't propagateSIGTERM
to child7522
.This means that
self.container.join()
is waiting for a container which hasn't received a signal to stop execution and hangs indefinitely.By sending
SIGTERM
to all child processes of thesubprocess.Popen
process, we can guarantee that the container will be Gracefully stopped...Testing done:
Executed all unit and integration tests
Verified behaviour remains the same on machines without the bug
Verified bug fixed on faulty machines by executing multiple endpoints sequentially
Testing done:
Merge Checklist
Put an
x
in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.General
Tests
unique_name_from_base
to create resource names in integ tests (if appropriate)By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.