-
Notifications
You must be signed in to change notification settings - Fork 1.2k
feature: processors that support multiple Python files, requirements.txt, and dependencies. #2251
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This is to conform to the existing style adopted in the sagemaker python sdk.
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
Can a reviewer help us with this? It's functionality I'd find really useful! |
One limitation I've found with this approach (from exploring the smallmatter version) is that the SKLearn estimator class currently explicitly forbids running with Is this only to protect users who might otherwise just duplicate their infrastructure without realising they need to actually implement parallelism for it to be effective? Or is there some actual limitation on the container/backend setup that it can't be run across multiple instances? Maybe we could add some kind of override just for processing, or revisit the need for the check in training in the first place? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for contributing! Please add unit and integration tests for the new classes.
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
Our team is really keen to see this feature go live. This addition will significantly reduce the adoption barrier for our researchers and scientists to start using processing jobs as a core part of their day-to-day job. |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
many thanks for the merge! |
* fix: revert #2251 changes for sklearn * fix docstyle * fix sphinx * fix tests * Revert local test for SklearnProcessor Co-authored-by: Verdi March <[email protected]>
Is there a time-table for release of this update? I saw it was in #2251, but then reverted. This feature would be extremely helpful for specifying source directories for sagemaker preprocessing steps that require additional util scripts and req files. |
@thorrester There is a new PR regarding this change. We are reviewing that. |
@ahsan-z-khan That's great to hear. Thank you! |
Which one is it? |
Hi, the new PR is #2564 Regards, |
Issue #, if available: #1248, #2117
Description of changes: Propose processing classes that are feature-parity with estimator. These classes allow SDK users to runn a Python job that consists of multiple Python scripts,
requirements.txt
and additional dependencies.Documentation provided as docstrings.
Testing done: on my own AWS account, ran processing jobs using the proposed classes (
FrameworkProcessor
and its subclasses) -- the testing scripts are located here, and usage is as outlined as in #1248 (this comment).Merge Checklist
Put an
x
in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.General
Tests
unique_name_from_base
to create resource names in integ tests (if appropriate)By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.