feature: processors that support multiple Python files, requirements.txt, and dependencies. #2251

verdimrc · 2021-03-29T06:54:46Z

Issue #, if available: #1248, #2117

Description of changes: Propose processing classes that have feature parity with the estimator classes. These classes let SDK users run a Python job that consists of multiple Python scripts, a requirements.txt, and additional dependencies.

Documentation provided as docstrings.

Testing done: on my own AWS account, I ran processing jobs using the proposed classes (FrameworkProcessor and its subclasses). The testing scripts are located here, and usage is as outlined in #1248 (this comment).
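
For readers who want a concrete picture of the kind of job described above, here is a minimal, standard-library-only sketch of a job directory with an entry script, a helper module, and a requirements.txt, bundled into a single archive. The directory layout, the file names, and both helper functions are hypothetical illustrations; bundling into a tarball is an assumption about how such a job could be shipped, not the API proposed in this PR.

```python
import tarfile
import tempfile
from pathlib import Path


def write_demo_job(root):
    """Create a hypothetical multi-file job layout under root and return its path."""
    job_dir = Path(root) / "my_job"
    (job_dir / "utils").mkdir(parents=True)
    # Entry-point script that imports a sibling helper module.
    (job_dir / "preprocess.py").write_text("from utils.helpers import clean\n")
    (job_dir / "utils" / "helpers.py").write_text("def clean(x):\n    return x\n")
    # Extra packages the job would need at run time.
    (job_dir / "requirements.txt").write_text("pandas\n")
    return job_dir


def bundle_source_dir(source_dir, archive_path):
    """Pack every file under source_dir into a gzip tarball and return the member names."""
    source_dir = Path(source_dir)
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in sorted(source_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to source_dir so the archive unpacks cleanly.
                tar.add(path, arcname=path.relative_to(source_dir).as_posix())
    with tarfile.open(archive_path, "r:gz") as tar:
        return sorted(tar.getnames())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        job_dir = write_demo_job(tmp)
        print(bundle_source_dir(job_dir, Path(tmp) / "sourcedir.tar.gz"))
```

The point of the sketch is only that the entry script, its helper modules, and requirements.txt travel together as one unit, which is what a single-script processing job cannot express.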

Merge Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.

General

I have read the CONTRIBUTING doc
I used the commit message format described in CONTRIBUTING
I have passed the region in to all S3 and STS clients that I've initialized as part of this change.
I have updated any necessary documentation, including READMEs and API docs (if appropriate)

Tests

I have added tests that prove my fix is effective or that my feature works (if appropriate)
I have checked that my tests are not configured for a specific region or account (if appropriate)
I have used unique_name_from_base to create resource names in integ tests (if appropriate)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

sagemaker-bot · 2021-03-29T06:57:15Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-python-sdk-unit-tests
Commit ID: 28a3a44
Result: FAILED
Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

sagemaker-bot · 2021-03-29T07:30:18Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-python-sdk-notebook-tests
Commit ID: 28a3a44
Result: FAILED
Build Logs (available for 30 days)

sagemaker-bot · 2021-03-29T07:33:38Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-python-sdk-slow-tests
Commit ID: 28a3a44
Result: SUCCEEDED
Build Logs (available for 30 days)

sagemaker-bot · 2021-03-29T07:36:51Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-python-sdk-local-mode-tests
Commit ID: 28a3a44
Result: SUCCEEDED
Build Logs (available for 30 days)

sagemaker-bot · 2021-03-29T07:50:53Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-python-sdk-pr
Commit ID: 28a3a44
Result: SUCCEEDED
Build Logs (available for 30 days)

sagemaker-bot · 2021-03-30T10:21:10Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-python-sdk-unit-tests
Commit ID: 3a1907f
Result: SUCCEEDED
Build Logs (available for 30 days)

sagemaker-bot · 2021-03-30T10:44:57Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-python-sdk-slow-tests
Commit ID: 3a1907f
Result: SUCCEEDED
Build Logs (available for 30 days)

sagemaker-bot · 2021-03-30T10:52:42Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-python-sdk-local-mode-tests
Commit ID: 3a1907f
Result: SUCCEEDED
Build Logs (available for 30 days)

sagemaker-bot · 2021-03-30T10:54:58Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-python-sdk-pr
Commit ID: 3a1907f
Result: SUCCEEDED
Build Logs (available for 30 days)

sagemaker-bot · 2021-03-30T10:55:25Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-python-sdk-notebook-tests
Commit ID: 3a1907f
Result: SUCCEEDED
Build Logs (available for 30 days)

athewsey · 2021-04-15T04:28:16Z

Can a reviewer help us with this? It's functionality I'd find really useful!

athewsey · 2021-04-20T02:21:25Z

One limitation I've found with this approach (from exploring the smallmatter version) is that the SKLearn estimator class currently explicitly forbids running with instance_count>1.

Is this only to protect users who might otherwise just duplicate their infrastructure without realising they need to actually implement parallelism for it to be effective? Or is there some actual limitation in the container/backend setup that prevents it from running across multiple instances?

Maybe we could add some kind of override just for processing, or revisit the need for the check in training in the first place?
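
As a rough illustration of the override idea, the check could accept an explicit opt-in. This is a minimal sketch under stated assumptions: the function name `check_instance_count` and the flag `allow_multi_instance` are hypothetical, and this is not the SDK's actual validation code.

```python
def check_instance_count(instance_count, allow_multi_instance=False):
    """Reject instance_count > 1 unless the caller explicitly opts in.

    Hypothetical helper: the name and the opt-in flag are illustrative only.
    """
    if instance_count is not None and instance_count > 1 and not allow_multi_instance:
        raise ValueError(
            "instance_count > 1 requested; pass allow_multi_instance=True "
            "if the script handles its own data sharding."
        )
    return instance_count
```

With a check like this, the default behaviour still guards against accidental duplication of infrastructure, while a processing job that shards its own input can opt in.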

src/sagemaker/sklearn/processing.py

ajaykarpur

Thanks for contributing! Please add unit and integration tests for the new classes.

sagemaker-bot · 2021-04-21T20:14:20Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-python-sdk-unit-tests
Commit ID: 7a4d2f5
Result: SUCCEEDED
Build Logs (available for 30 days)

sagemaker-bot · 2021-04-21T20:39:01Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-python-sdk-slow-tests
Commit ID: 7a4d2f5
Result: SUCCEEDED
Build Logs (available for 30 days)

sagemaker-bot · 2021-04-21T20:44:42Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-python-sdk-notebook-tests
Commit ID: 7a4d2f5
Result: FAILED
Build Logs (available for 30 days)

sagemaker-bot · 2021-04-21T20:46:31Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-python-sdk-local-mode-tests
Commit ID: 7a4d2f5
Result: SUCCEEDED
Build Logs (available for 30 days)

sagemaker-bot · 2021-08-03T19:09:59Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-python-sdk-slow-tests
Commit ID: f6b7c5b
Result: SUCCEEDED
Build Logs (available for 30 days)

sagemaker-bot · 2021-08-03T19:12:14Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-python-sdk-notebook-tests
Commit ID: f6b7c5b
Result: SUCCEEDED
Build Logs (available for 30 days)

sagemaker-bot · 2021-08-03T19:21:18Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-python-sdk-local-mode-tests
Commit ID: f6b7c5b
Result: SUCCEEDED
Build Logs (available for 30 days)

sagemaker-bot · 2021-08-03T21:38:37Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-python-sdk-pr
Commit ID: f6b7c5b
Result: FAILED
Build Logs (available for 30 days)

sagemaker-bot · 2021-08-04T02:50:24Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-python-sdk-pr
Commit ID: f6b7c5b
Result: FAILED
Build Logs (available for 30 days)

DaniloTommasinaTR · 2021-08-04T08:15:11Z

Our team is really keen to see this feature go live. This addition will significantly reduce the adoption barrier for our researchers and scientists to start using processing jobs as a core part of their day-to-day job.
Thanks.

sagemaker-bot · 2021-08-04T18:33:26Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-python-sdk-unit-tests
Commit ID: 91243f9
Result: SUCCEEDED
Build Logs (available for 30 days)

sagemaker-bot · 2021-08-04T19:01:18Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-python-sdk-slow-tests
Commit ID: 91243f9
Result: SUCCEEDED
Build Logs (available for 30 days)

sagemaker-bot · 2021-08-04T19:07:26Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-python-sdk-local-mode-tests
Commit ID: 91243f9
Result: SUCCEEDED
Build Logs (available for 30 days)

sagemaker-bot · 2021-08-04T19:18:46Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-python-sdk-notebook-tests
Commit ID: 91243f9
Result: SUCCEEDED
Build Logs (available for 30 days)

sagemaker-bot · 2021-08-04T19:50:53Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-python-sdk-pr
Commit ID: 91243f9
Result: FAILED
Build Logs (available for 30 days)

sagemaker-bot · 2021-08-04T22:03:20Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-python-sdk-pr
Commit ID: 91243f9
Result: FAILED
Build Logs (available for 30 days)

sagemaker-bot · 2021-08-04T23:14:19Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-python-sdk-unit-tests
Commit ID: a397a84
Result: SUCCEEDED
Build Logs (available for 30 days)

sagemaker-bot · 2021-08-04T23:38:41Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-python-sdk-pr
Commit ID: a397a84
Result: SUCCEEDED
Build Logs (available for 30 days)

sagemaker-bot · 2021-08-04T23:40:10Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-python-sdk-notebook-tests
Commit ID: a397a84
Result: SUCCEEDED
Build Logs (available for 30 days)

sagemaker-bot · 2021-08-04T23:41:37Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-python-sdk-slow-tests
Commit ID: a397a84
Result: SUCCEEDED
Build Logs (available for 30 days)

sagemaker-bot · 2021-08-04T23:48:46Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-python-sdk-local-mode-tests
Commit ID: a397a84
Result: SUCCEEDED
Build Logs (available for 30 days)

verdimrc · 2021-08-05T01:39:45Z

many thanks for the merge!

* fix: revert #2251 changes for sklearn
* fix docstyle
* fix sphinx
* fix tests
* Revert local test for SklearnProcessor

Co-authored-by: Verdi March <[email protected]>

thorrester · 2021-08-24T20:38:37Z

Is there a time-table for release of this update? I saw it was in #2251, but then reverted. This feature would be extremely helpful for specifying source directories for sagemaker preprocessing steps that require additional util scripts and req files.

ahsan-z-khan · 2021-08-25T00:04:16Z

> Is there a time-table for release of this update? I saw it was in #2251, but then reverted. This feature would be extremely helpful for specifying source directories for sagemaker preprocessing steps that require additional util scripts and req files.

@thorrester There is a new PR regarding this change. We are reviewing that.

thorrester · 2021-08-26T10:36:40Z

@ahsan-z-khan That's great to hear. Thank you!

marianokamp · 2021-08-26T11:52:05Z

> > Is there a time-table for release of this update? I saw it was in #2251, but then reverted. This feature would be extremely helpful for specifying source directories for sagemaker preprocessing steps that require additional util scripts and req files.
>
> @thorrester There is a new PR regarding this change. We are reviewing that.

Which one is it?

verdimrc · 2021-08-26T12:07:16Z

Hi, the new PR is #2564

Regards,
Verdi

verdimrc added 6 commits March 29, 2021 14:25

Framework processor: first port

087482c

Subclasses to go to their respective submodule

b8972cc

FrameworkProcessor: source_dir defaults to None

8d554fa

Remove type annotations from public APIs

b83b0d0

This is to conform to the existing style adopted in the sagemaker python sdk.

Fix circular dependency between processing.py and estimator.py

547e1e6

Disable type-checker on line sagemaker.estimator.Framework

28a3a44

verdimrc changed the title ~~SageMaker processor that supports multiple Python files and requirements.txt~~ SageMaker processors that support multiple Python files and requirements.txt Mar 29, 2021

verdimrc changed the title ~~SageMaker processors that support multiple Python files and requirements.txt~~ feature: processors that support multiple Python files, requirements.txt, and dependencies. Mar 29, 2021

verdimrc marked this pull request as ready for review March 29, 2021 06:56

Fix pylint errors & warnings

3a1907f

Merge branch 'master' into pr-framework-processor

7a4d2f5

ajaykarpur self-requested a review April 21, 2021 20:03

ajaykarpur reviewed Apr 21, 2021

src/sagemaker/sklearn/processing.py (outdated)

ajaykarpur suggested changes Apr 21, 2021

ahsan-z-khan approved these changes Aug 3, 2021

Merge branch 'master' into pr-framework-processor

91243f9

Merge branch 'master' into pr-framework-processor

a397a84

shreyapandit merged commit b3c8bb1 into aws:master Aug 5, 2021

ahsan-z-khan added a commit to ahsan-z-khan/sagemaker-python-sdk that referenced this pull request Aug 6, 2021

fix: revert aws#2251 changes for sklearn

a175112

athewsey mentioned this pull request Sep 14, 2021

Restore SKLearn FrameworkProcessor via _normalize_args #2633

Closed

7 tasks
