
fix: Remove sagemaker_job_name from hyperparameters in TrainingStep #2950


Merged: 4 commits merged into aws:dev on Mar 3, 2022

@staubhp (Contributor) commented Feb 22, 2022

Issue #, if available: #2940

Description of changes:
Some framework classes add a sagemaker_job_name hyperparameter containing a dynamically generated training job name. This value is inaccurate (the actual job name is set when the step runs), and because it changes every time the pipeline definition is generated, it also breaks caching for those steps.
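To see why the dynamic value breaks caching, consider a simplified model in which a cached result is reused only when a step's arguments are identical between pipeline builds. This is an assumption for illustration only: build_train_request, step_arguments, and cache_key are hypothetical helpers, not SageMaker Python SDK APIs, and hashing canonical JSON stands in for however the service actually compares steps.

```python
import copy
import hashlib
import json


def build_train_request(job_name: str) -> dict:
    """Hypothetical training request, as a framework estimator might produce it.

    The generated job name appears both as TrainingJobName and, JSON-encoded,
    as the sagemaker_job_name hyperparameter.
    """
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {"TrainingImage": "example-image", "TrainingInputMode": "File"},
        "HyperParameters": {"epochs": "10", "sagemaker_job_name": json.dumps(job_name)},
    }


def step_arguments(request: dict, strip_job_name: bool) -> dict:
    """Mimic building step arguments: TrainingJobName is always dropped."""
    args = copy.deepcopy(request)
    args.pop("TrainingJobName")
    if strip_job_name:
        # The fix: also drop the job name embedded in the hyperparameters.
        args["HyperParameters"].pop("sagemaker_job_name", None)
    return args


def cache_key(args: dict) -> str:
    """Illustrative cache key: SHA-256 of the canonical JSON form of the arguments."""
    canonical = json.dumps(args, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two builds of the same pipeline definition receive different generated job names.
first = build_train_request("pytorch-training-2022-02-22-10-00-00-001")
second = build_train_request("pytorch-training-2022-02-22-10-05-00-002")

before = cache_key(step_arguments(first, False)) == cache_key(step_arguments(second, False))
after = cache_key(step_arguments(first, True)) == cache_key(step_arguments(second, True))
print(before, after)  # False True
```

Even though TrainingJobName is already removed from the arguments, the copy of the name embedded in HyperParameters is enough to make the two builds look different; dropping it as well makes the arguments match.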

Testing done:
Unit, manual

Merge Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.

General

  • I have read the CONTRIBUTING doc
  • I certify that the changes I am introducing will be backward compatible, and I have discussed concerns about this, if any, with the Python SDK team
  • I used the commit message format described in CONTRIBUTING
  • I have passed the region in to all S3 and STS clients that I've initialized as part of this change.
  • I have updated any necessary documentation, including READMEs and API docs (if appropriate)

Tests

  • I have added tests that prove my fix is effective or that my feature works (if appropriate)
  • I have added unit and/or integration tests as appropriate to ensure backward compatibility of the changes
  • I have checked that my tests are not configured for a specific region or account (if appropriate)
  • I have used unique_name_from_base to create resource names in integ tests (if appropriate)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@codecov-commenter commented Feb 23, 2022

Codecov Report

Merging #2950 (2dfde22) into dev (086258d) will increase coverage by 0.00%.
The diff coverage is 100.00%.


@@           Coverage Diff           @@
##              dev    #2950   +/-   ##
=======================================
  Coverage   89.80%   89.80%           
=======================================
  Files         196      196           
  Lines       16563    16565    +2     
=======================================
+ Hits        14875    14877    +2     
  Misses       1688     1688           
Impacted Files                      Coverage Δ
src/sagemaker/workflow/steps.py     97.81% <100.00%> (+0.01%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 086258d...2dfde22.

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-unit-tests
  • Commit ID: 298f965
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-local-mode-tests
  • Commit ID: 298f965
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-slow-tests
  • Commit ID: 298f965
  • Result: FAILED
  • Build Logs (available for 30 days)

@staubhp force-pushed the fix-training-step-caching branch from 298f965 to 6ff0c7a on March 2, 2022 18:34
@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-unit-tests
  • Commit ID: 6ff0c7a
  • Result: FAILED
  • Build Logs (available for 30 days)

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-unit-tests
  • Commit ID: dbddcc2
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-slow-tests
  • Commit ID: 6ff0c7a
  • Result: FAILED
  • Build Logs (available for 30 days)

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-local-mode-tests
  • Commit ID: 6ff0c7a
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-local-mode-tests
  • Commit ID: dbddcc2
  • Result: FAILED
  • Build Logs (available for 30 days)

@@ -301,6 +301,7 @@ def arguments(self) -> RequestType:
         )
         request_dict = self.estimator.sagemaker_session._get_train_request(**train_args)
         request_dict.pop("TrainingJobName")
+        request_dict["HyperParameters"].pop("sagemaker_job_name", None)
Contributor

I wonder: is it possible that this request dict does not have HyperParameters in it?

Contributor Author

It will always be there, but let me add a safety check.
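The reviewer's concern is that indexing request_dict["HyperParameters"] raises KeyError when that key is missing. A defensive version can be sketched as follows; remove_job_name is a hypothetical helper, and this conversation does not show exactly how the merged commit implements its safety check.

```python
def remove_job_name(request_dict: dict) -> dict:
    """Hypothetical helper: strip job-name fields from a training request dict.

    Unlike request_dict["HyperParameters"].pop(...), this does not raise
    KeyError when the request has no HyperParameters entry.
    """
    # pop with a default never raises, even if the key is absent.
    request_dict.pop("TrainingJobName", None)
    hyperparameters = request_dict.get("HyperParameters")
    if isinstance(hyperparameters, dict):
        hyperparameters.pop("sagemaker_job_name", None)
    return request_dict


with_hp = remove_job_name(
    {"TrainingJobName": "job-1", "HyperParameters": {"sagemaker_job_name": '"job-1"', "epochs": "10"}}
)
without_hp = remove_job_name({"TrainingJobName": "job-2"})
print(with_hp)  # {'HyperParameters': {'epochs': '10'}}
print(without_hp)  # {}
```

Using dict.get with an isinstance check also tolerates a HyperParameters value of None. The helper mutates the dict in place and returns it only for convenience.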

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-local-mode-tests
  • Commit ID: dbddcc2
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-slow-tests
  • Commit ID: dbddcc2
  • Result: FAILED
  • Build Logs (available for 30 days)

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-slow-tests
  • Commit ID: dbddcc2
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-unit-tests
  • Commit ID: 2dfde22
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-local-mode-tests
  • Commit ID: 2dfde22
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-slow-tests
  • Commit ID: 2dfde22
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Contributor

@shreyapandit left a comment

As in comments

@shreyapandit merged commit 20df3d7 into aws:dev Mar 3, 2022
shreyapandit pushed a commit that referenced this pull request Mar 4, 2022
jeniyat added a commit that referenced this pull request Mar 18, 2022
* change: update code to get commit_id in codepipeline (#2961)

* feature: Data Serializer (#2956)

* change: reorganize test files for workflow (#2960)

Co-authored-by: Ben Crabtree <[email protected]>
Co-authored-by: Navin Soni <[email protected]>
Co-authored-by: Jeniya Tabassum <[email protected]>
Co-authored-by: Dewen Qi <[email protected]>

* feature: TensorFlow 2.4 for Neo (#2861)

Co-authored-by: Ben Crabtree <[email protected]>
Co-authored-by: Navin Soni <[email protected]>
Co-authored-by: Jeniya Tabassum <[email protected]>

* fix: Remove sagemaker_job_name from hyperparameters in TrainingStep (#2950)

Co-authored-by: Payton Staub <[email protected]>

* fix: Style update in DataSerializer (#2962)

* documentation: smddp doc update (#2968)

* fix: container env generation for S3 URI and add test for the same (#2971)

* documentation: update sagemaker training compiler docstring (#2969)

* feat: Python 3.9 for readthedocs (#2973)

Co-authored-by: Ben Crabtree <[email protected]>
Co-authored-by: Navin Soni <[email protected]>
Co-authored-by: Jeniya Tabassum <[email protected]>
Co-authored-by: Dewen Qi <[email protected]>
Co-authored-by: Payton Staub <[email protected]>
Co-authored-by: qidewenwhen <[email protected]>
Co-authored-by: Qingzi-Lan <[email protected]>
Co-authored-by: Payton Staub <[email protected]>

* fix doc structure

* archive 1.6.0 doc

* add new args, refs, and links

* fix version number

* incorp eng feedback, update docstrings, improve xref

* Trigger Build

* minor fix, trigger build again

* fix typo

Co-authored-by: Navin Soni <[email protected]>
Co-authored-by: Jeniya Tabassum <[email protected]>
Co-authored-by: qidewenwhen <[email protected]>
Co-authored-by: Ben Crabtree <[email protected]>
Co-authored-by: Dewen Qi <[email protected]>
Co-authored-by: Qingzi-Lan <[email protected]>
Co-authored-by: Payton Staub <[email protected]>
Co-authored-by: Payton Staub <[email protected]>
Co-authored-by: Shreya Pandit <[email protected]>
Co-authored-by: Ahsan Khan <[email protected]>
Co-authored-by: Mufaddal Rohawala <[email protected]>
ahsan-z-khan added a commit to ahsan-z-khan/sagemaker-python-sdk that referenced this pull request Mar 23, 2022
jerrypeng7773 pushed a commit to jerrypeng7773/sagemaker-python-sdk that referenced this pull request Apr 19, 2022
jerrypeng7773 pushed a commit to jerrypeng7773/sagemaker-python-sdk that referenced this pull request May 13, 2022
Labels
None yet
Projects
None yet
6 participants