Skip to content

Fix FrameworkProcessor for SageMaker Pipelines #2664

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 20 commits into from

Conversation

athewsey
Copy link
Collaborator

@athewsey athewsey commented Sep 24, 2021

Issue #, if available: #2656, #2763

Description of changes:

A slimmed-down alternative to #2633, reducing unnecessary change to ScriptProcessor to try and speed up acceptance so we can defer longer-term refactoring proposals to later.

  • Fix FrameworkProcessor-based processors' interactions with SageMaker Pipelines ProcessingStep (which currently don't work together at all per the linked issue)
  • Enable source_dir via ProcessingStep, so Pipelines users can take full advantage of the FrameworkProcessor functionality.
  • Fix pass-through of KMS configuration from FrameworkProcessor to the underlying Framework Estimator.
  • Re-align FrameworkProcessor from /bin/bash to /bin/sh: Since some optimized, container-oriented Linux distributions (e.g. Alpine) may not include bash.
  • Update the dev guide docs, which were never updated to include the FrameworkProcessor classes in API tables of contents or mention available frameworks in the overview.
  • (2022-01-27 update) Add code_location parameter to ScriptProcessor (including SKLearnProcessor) as well, to enable users of these processors to avoid using the SageMaker default bucket.

Testing done:

  • Added unit test to prove the fix (that _normalize_args as called by Pipelines should fully prepare a FrameworkProcessor)
  • Adjusted unit tests to further exercise FrameworkProcessor on other frameworks
  • Adjusted pipelines integration test for FrameworkProcessor to actually check the job is able to run and succeed.

Merge Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.

General

  • I have read the CONTRIBUTING doc
  • I certify that the changes I am introducing will be backword compatible, and I have discussed concerns about this, if any, with the Python SDK team
  • I used the commit message format described in CONTRIBUTING
  • I have passed the region in to all S3 and STS clients that I've initialized as part of this change.
  • I have updated any necessary documentation, including READMEs and API docs (if appropriate)

Tests

  • I have added tests that prove my fix is effective or that my feature works (if appropriate)
  • I have added unit and/or integration tests as appropriate to ensure backward compatibility of the changes
  • I have checked that my tests are not configured for a specific region or account (if appropriate)
  • I have used unique_name_from_base to create resource names in integ tests (if appropriate)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-unit-tests
  • Commit ID: 5ba98db
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@verdimrc verdimrc mentioned this pull request Oct 6, 2021
7 tasks
@ahsan-z-khan
Copy link
Member

Is this one needs to merged before #2564?

@verdimrc
Copy link
Contributor

verdimrc commented Oct 21, 2021

hi, if this one is accepted, then #2564 is not needed.

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-unit-tests
  • Commit ID: 053a54e
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-unit-tests
  • Commit ID: 90ca64a
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-local-mode-tests
  • Commit ID: 90ca64a
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-unit-tests
  • Commit ID: a5c6faf
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-local-mode-tests
  • Commit ID: a5c6faf
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@@ -1265,9 +1266,8 @@ class FeatureStoreOutput(ApiObject):
class FrameworkProcessor(ScriptProcessor):
"""Handles Amazon SageMaker processing tasks for jobs using a machine learning framework."""

framework_entrypoint_command = ["/bin/bash"]
framework_entrypoint_command = ["/bin/sh"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a breaking change?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change will somewhat restrict the available syntax for the framework script, but that's private logic (_generate_framework_script). It won't change anything about the execution environment which is still python per line 1346 (should this actually be changed to python3 maybe?). The main effect it will have on users is compatibility with different container images by standardising on POSIX-compliant sh rather than the more advanced bash shell which some OSs don't offer out-of-the-box (e.g. Alpine Linux or others mentioned here). While I'm aware of multiple distros that offer some sh implementation but not bash in particular, I've not found any that go the other way around.

From my investigations, Ubuntu (16+) is used as a base by all of the DLCs, the Scikit-Learn framework container, the XGBoost framework container, and even the Chainer container. As per here, Ubuntu has supported both /bin/sh and /bin/bash since at least v6.

The Spark container (which isn't actually FrameworkProcessor-based) seems to use Amazon Linux 2, which also offers /bin/sh.

Overall from what I can tell this is not breaking and from a compatibility standpoint should be a step forward rather than back?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this makes sense

dependencies,
git_config,
job_name,
source_dir=source_dir,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these equals = necessary for the four arguments?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't be necessary in the sense that the order here matches the function declaration so it'd still work.

I did it for these because:

  • They're defined as keyword args and the order didn't seem obvious as a user
  • The fact that FrameworkProcessor inserts source_dir into the arg order defined by the parent, rather than only appending arguments in the child class, meant that being a bit more explicit could maybe help avoid confusion.

They can go back to positional if needed I'd say

jeniyat
jeniyat previously approved these changes Nov 2, 2021
Copy link
Contributor

@jeniyat jeniyat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-unit-tests
  • Commit ID: 7b7322b
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@athewsey
Copy link
Collaborator Author

athewsey commented Nov 3, 2021

-slow-tests timed out. -pr looks like an unrelated failure that predictor.delete_model() in test_kmeans should probably be in a try/except in case of timeout?

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-unit-tests
  • Commit ID: a7b7783
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@codecov-commenter
Copy link

codecov-commenter commented Nov 24, 2021

Codecov Report

Merging #2664 (123a8ca) into dev (21ac6fd) will increase coverage by 0.02%.
The diff coverage is 95.91%.

❗ Current head 123a8ca differs from pull request most recent head 6bc0d80. Consider uploading reports for the commit 6bc0d80 to get more accurate results

Impacted file tree graph

@@            Coverage Diff             @@
##              dev    #2664      +/-   ##
==========================================
+ Coverage   89.47%   89.50%   +0.02%     
==========================================
  Files         196      195       -1     
  Lines       16584    16581       -3     
==========================================
+ Hits        14839    14840       +1     
+ Misses       1745     1741       -4     
Impacted Files Coverage Δ
src/sagemaker/fw_utils.py 90.00% <ø> (ø)
src/sagemaker/huggingface/processing.py 100.00% <ø> (ø)
src/sagemaker/mxnet/processing.py 100.00% <ø> (ø)
src/sagemaker/pytorch/processing.py 100.00% <ø> (ø)
src/sagemaker/sklearn/processing.py 100.00% <ø> (ø)
src/sagemaker/tensorflow/processing.py 100.00% <ø> (ø)
src/sagemaker/xgboost/processing.py 100.00% <ø> (ø)
src/sagemaker/processing.py 96.47% <95.55%> (+0.39%) ⬆️
src/sagemaker/workflow/steps.py 97.36% <100.00%> (+0.01%) ⬆️
src/sagemaker/clarify.py 92.59% <0.00%> (-0.42%) ⬇️
... and 10 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 21ac6fd...6bc0d80. Read the comment docs.

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-unit-tests
  • Commit ID: f6c3312
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@athewsey
Copy link
Collaborator Author

^ slow-tests timed out again

@jeniyat jeniyat changed the base branch from master to dev January 5, 2022 22:42
Copy link
Member

@ahsan-z-khan ahsan-z-khan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets work on getting this merged in the upcoming release. @athewsey can you fix the conflicts?

@evakravi is there any relation to Jumpstart with this PR?

@athewsey
Copy link
Collaborator Author

The only notable conflict in this merge was that somebody had tried to introduce kms_key argument explicitly through ProcessingStep, whereas this PR was already proposing to allow **kwargs to pass through open-ended arguments to the Processor (which would have included support for kms_key).

Therefore to resolve the conflict I removed the new explicit kms_key argument to prevent inconsistency and bloat in the interface. This means users would still be able to pass kms_key in, but no longer be able to access the property step.kms_key (should have step.processing_kwargs["kms_key"] if they really needed it). If this is too much change (reviewers?) then I guess we'll have to go back to having both an explicit kms_key argument and kwargs support for all the others 🤢

I saw a weird error locally in tests/integ/test_workflow_with_clarify.py (execution.describe() only found one step instead of 2?) but hoping it's just an environment issue as don't think this merge should have affected it... Will see what CI finds.

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-unit-tests
  • Commit ID: 7448580
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-slow-tests
  • Commit ID: 7448580
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

navinsoni and others added 6 commits February 16, 2022 19:53
Co-authored-by: Mufaddal Rohawala <[email protected]>
Co-authored-by: Ahsan Khan <[email protected]>
Co-authored-by: Mufaddal Rohawala <[email protected]>
Co-authored-by: Basil Beirouti <[email protected]>
Co-authored-by: Payton Staub <[email protected]>
Co-authored-by: Shreya Pandit <[email protected]>
Co-authored-by: Basil Beirouti <[email protected]>
Co-authored-by: Payton Staub <[email protected]>
Co-authored-by: Mohamed Ali Jamaoui <[email protected]>
Co-authored-by: ci <ci>
Co-authored-by: Jeniya Tabassum <[email protected]>
Co-authored-by: sreedes <[email protected]>
Co-authored-by: Navin Soni <[email protected]>
Co-authored-by: Miyoung <[email protected]>
Co-authored-by: Ameen Khan <[email protected]>
Co-authored-by: Zhankui Lu <[email protected]>
Co-authored-by: Xiaoguang Chen <[email protected]>
Co-authored-by: Jonathan Guinegagne <[email protected]>
Co-authored-by: Zhankui Lu <[email protected]>
Co-authored-by: Yifei Zhu <[email protected]>
Co-authored-by: Qingzi-Lan <[email protected]>
Co-authored-by: Navin Soni <[email protected]>
Co-authored-by: marckarp <[email protected]>
Co-authored-by: chenxy <[email protected]>
Co-authored-by: Xinghan Chen <[email protected]>
Co-authored-by: Tulio Casagrande <[email protected]>
Co-authored-by: jerrypeng7773 <[email protected]>
Co-authored-by: marckarp <[email protected]>
Co-authored-by: jayatalr <[email protected]>
Co-authored-by: bhaoz <[email protected]>
Co-authored-by: Ethan Cheng <[email protected]>
Co-authored-by: Xiaoguang Chen <[email protected]>
Co-authored-by: keerthanvasist <[email protected]>
Co-authored-by: Shreya Pandit <[email protected]>
Co-authored-by: Ivy Bazan <[email protected]>
Co-authored-by: IvyBazan <[email protected]>
Co-authored-by: Benjamin Crabtree <[email protected]>
Co-authored-by: iasoon <[email protected]>
Co-authored-by: evakravi <[email protected]>
@@ -74,6 +74,8 @@
"2.6",
"2.6.0",
"2.6.2",
"2.7",
"2.7.0",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Has this changed since and we need to update it?

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-unit-tests
  • Commit ID: 6bc0d80
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@@ -2629,6 +2661,39 @@
"us-west-2": "763104351884"
},
"repository": "tensorflow-training"
},
"2.7.0": {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DLC is yet to release these containers. Can we remove this and introduce this later when DLC officially makes them available?

"registries": {
"af-south-1": "626614931356",
"ap-east-1": "871362719292",
"ap-northeast-1": "763104351884",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DLC is yet to release these containers. Can we remove this and introduce this later when DLC officially makes them available?

@@ -279,7 +279,8 @@
"2.3": "2.3.2",
"2.4": "2.4.3",
"2.5": "2.5.1",
"2.6": "2.6.0"
"2.6": "2.6.0",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as below, lets add this once TF 2.7 is out

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-local-mode-tests
  • Commit ID: 6bc0d80
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-slow-tests
  • Commit ID: 6bc0d80
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@mufaddal-rohawala
Copy link
Member

@athewsey can you help resolve the conflicts in this pr, by rebasing master on this.

@ChaudharyKaran
Copy link

Hi, Do we know when is this planned to be rolled out / merged.

@vikas0203 vikas0203 requested review from a team, navinsoni and knikure and removed request for a team November 23, 2022 21:39
@navinsoni
Copy link
Contributor

Closing as it's no longer reqd.

@navinsoni navinsoni closed this Dec 19, 2022
@ChaudharyKaran
Copy link

why is this is not required any more, can we use script processor with code repo's now?

@gcaria
Copy link

gcaria commented Jun 6, 2023

There is a chain of merge requests which point to this MR as the one that should allow for passing additional modules/code to ScriptProcessor. Since this MR is now closed, has the issue been addressed somewhere else?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.