fix: hyperparameter tuning with spot instances and checkpoints #1015
flake8 reports an error saying the session.tune method has too many arguments. How should I handle it?
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
I misunderstood. The error is not about too many arguments; flake8 is reporting that the method is too complex. I suppressed the error the same way the session.train method does.
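For readers unfamiliar with this kind of suppression, here is a minimal sketch. It assumes the check in question is flake8's McCabe complexity rule (C901), which the thread does not name explicitly. `summarize_options` is a hypothetical stand-in with several branches, not the SDK's `tune` method.

```python
def summarize_options(use_spot=False, max_wait=None, checkpoint_uri=None):  # noqa: C901
    """Hypothetical branch-heavy function.

    The trailing noqa marker on the def line asks flake8 to skip the C901
    (too complex) check for this function only. Assumption: C901 is the
    rule being discussed in the thread.
    """
    parts = []
    if use_spot:
        parts.append("spot")
    if max_wait is not None:
        parts.append("max_wait=%d" % max_wait)
    if checkpoint_uri:
        parts.append("checkpoint")
    return ", ".join(parts) if parts else "defaults"
```

Suppressing per function keeps the complexity check active for the rest of the module, which is presumably why the existing method takes the same approach.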
I'm worried that there is no integration test proving this change works, although the precedent in #990 did not include one either.
Can you comment confirming that your change works as intended?
@@ -450,6 +450,9 @@ def tune(
        early_stopping_type="Off",
        encrypt_inter_container_traffic=False,
        vpc_config=None,
        train_use_spot_instances=False,
Should we allow the user to provide a train_max_wait similar to the Estimator call?
        train_max_wait=None,
We don't need to pass train_max_wait to HyperparameterTuner. It is already passed as stop_condition from the estimator of the HyperparameterTuner instance.
sagemaker-python-sdk/src/sagemaker/job.py
Line 85 in b33d1a2
        estimator.train_max_run, estimator.train_max_wait
This is called from here:
sagemaker-python-sdk/src/sagemaker/tuner.py
Line 853 in b33d1a2
        config = _Job._load_config(inputs, tuner.estimator)
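The flow described above can be sketched roughly as follows. This is not the SDK's code: `FakeEstimator`, `prepare_stop_condition`, and `load_config` are hypothetical stand-ins that only mirror the two quoted lines, and the dictionary keys assume the SageMaker `StoppingCondition` field names `MaxRuntimeInSeconds` and `MaxWaitTimeInSeconds`.

```python
from collections import namedtuple

# Hypothetical stand-in for an estimator; only the two attributes quoted above.
FakeEstimator = namedtuple("FakeEstimator", ["train_max_run", "train_max_wait"])


def prepare_stop_condition(max_run, max_wait):
    """Build a StoppingCondition-style dict; add the max wait only when it is set."""
    condition = {"MaxRuntimeInSeconds": max_run}
    if max_wait:
        condition["MaxWaitTimeInSeconds"] = max_wait
    return condition


def load_config(estimator):
    """Derive the job config from the estimator, as the tuner is described as doing."""
    return {
        "stop_condition": prepare_stop_condition(
            estimator.train_max_run, estimator.train_max_wait
        )
    }


# The wait limit travels with the estimator, so the tuner needs no extra argument.
config = load_config(FakeEstimator(train_max_run=3600, train_max_wait=7200))
```

Under these assumptions, a separate train_max_wait argument on the tuning call would duplicate a value the estimator already carries.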
Hi @ChoiByungWook, thanks for your comment.
I ran several tuning jobs with this change and confirmed that it works as intended.
Thanks for your contribution!
Issue #, if available:
#1011
Description of changes:
Add parameters (train_use_spot_instances, checkpoint_s3_uri, and checkpoint_local_path) to the request for hyperparameter tuning jobs, as was done for training jobs in #990. These parameters come from the estimator.
Merge Checklist
Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
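As a rough illustration of the change in the description, the sketch below shows how the three parameters might be folded into the training job definition of a tuning request. `add_spot_and_checkpoint` is a hypothetical helper, not the function changed in this PR, and the keys assume the SageMaker API field names `EnableManagedSpotTraining` and `CheckpointConfig` (`S3Uri`, `LocalPath`).

```python
def add_spot_and_checkpoint(
    training_job_definition,
    train_use_spot_instances=False,
    checkpoint_s3_uri=None,
    checkpoint_local_path=None,
):
    """Return a copy of the definition with spot and checkpoint settings added.

    Hypothetical helper for illustration; key names are assumed from the
    SageMaker API rather than taken from this PR's diff.
    """
    definition = dict(training_job_definition)
    if train_use_spot_instances:
        definition["EnableManagedSpotTraining"] = True
    if checkpoint_s3_uri:
        checkpoint_config = {"S3Uri": checkpoint_s3_uri}
        # A local path is only meaningful alongside an S3 destination.
        if checkpoint_local_path:
            checkpoint_config["LocalPath"] = checkpoint_local_path
        definition["CheckpointConfig"] = checkpoint_config
    return definition


definition = add_spot_and_checkpoint(
    {"RoleArn": "arn:aws:iam::123456789012:role/example"},  # placeholder value
    train_use_spot_instances=True,
    checkpoint_s3_uri="s3://example-bucket/checkpoints",
    checkpoint_local_path="/opt/ml/checkpoints",
)
```

Leaving the keys out when the options are unset keeps the request identical to what it was before the change for callers who do not use spot instances.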