feature: Adds support for Serverless inference #2831
Conversation
Codecov Report
@@ Coverage Diff @@
## dev #2831 +/- ##
==========================================
+ Coverage 88.85% 88.87% +0.01%
==========================================
Files 175 176 +1
Lines 15517 15540 +23
==========================================
+ Hits 13788 13811 +23
Misses 1729 1729
Continue to review full report at Codecov.
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
Can I know why the slow tests failed? I cannot get much detail from the log it posted. Thanks!
src/sagemaker/estimator.py (Outdated)
Raises:
    ValueError: If serverless inference config is not specified and instance type
        and instance count are also not specified
why do we need this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because the way we're doing this is to let customers pass an empty config object to tell us a serverless endpoint should be deployed. We only apply default values if they pass an empty object. If they leave that arg as None, we deploy a normal endpoint rather than a serverless one. So that means we need a check to ensure instance_type/instance_count and serverless_inference_config are not all None at the same time.
So we will throw this error when the serverless config, instance type, and instance count are all None?
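The validation discussed in this thread can be sketched roughly as follows (the helper name is hypothetical, not the exact SDK code; in the SDK this check lives inside deploy()):

```python
# Sketch of the validation discussed above (hypothetical helper, not the
# exact SDK code): serverless_inference_config, instance_type, and
# initial_instance_count must not all be None at the same time.
def validate_deploy_args(instance_type=None, initial_instance_count=None,
                         serverless_inference_config=None):
    if serverless_inference_config is None:
        # No serverless config means a normal endpoint, which requires
        # both instance arguments.
        if instance_type is None or initial_instance_count is None:
            raise ValueError(
                "Must specify instance type and instance count unless a "
                "serverless inference config is provided"
            )
    # An empty (default-constructed) config object is enough to request a
    # serverless endpoint; the SDK fills in default values later.
```

An empty config object therefore acts as the opt-in signal for serverless, while None keeps the existing instance-based behavior.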
src/sagemaker/model.py (Outdated)
if self._is_compiled_model and not is_serverless:
    compiled_model_suffix = "-".join(instance_type.split(".")[:-1])
This code is already used above; can the duplication be removed?
tests/unit/test_estimator.py (Outdated)
e.fit()
bad_args = ({"instance_type": INSTANCE_TYPE}, {"initial_instance_count": INSTANCE_COUNT})
Can we also add to this list the scenario where instance_type=None and initial_instance_count=None?
sure thing
@@ -209,7 +209,7 @@ def register(
             model_package_arn=model_package.get("ModelPackageArn"),
         )

-    def _init_sagemaker_session_if_does_not_exist(self, instance_type):
+    def _init_sagemaker_session_if_does_not_exist(self, instance_type=None):
what does this do?
Since serverless requires instance_type to be None, this change supports the serverless case.
-        if instance_type.startswith("ml.inf") and not self._is_compiled_model:
+        if instance_type and instance_type.startswith("ml.inf") and not self._is_compiled_model:
what is this for?
Same as above
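The None-safe guard in the diff above can be sketched as a small predicate (the helper name is illustrative, not the SDK's; ml.inf* instance types are Inferentia instances):

```python
# Sketch of the guard discussed above: instance_type may now be None for
# serverless endpoints, so check truthiness before calling startswith().
# The helper name is illustrative, not the SDK's actual code.
def uses_inferentia_serving_image(instance_type, is_compiled_model):
    return bool(
        instance_type
        and instance_type.startswith("ml.inf")
        and not is_compiled_model
    )
```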
self.name,
instance_type,
initial_instance_count,
accelerator_type=accelerator_type,
We are not supporting accelerators for serverless; will the code below be impacted in any way?
This will be validated by the low-level botocore/boto3 libraries, so it should be fine.
memory_size_in_mb (int): Optional. The memory size of your serverless endpoint.
    Valid values are in 1 GB increments: 1024 MB, 2048 MB, 3072 MB, 4096 MB,
    5120 MB, or 6144 MB. If no value is provided, Amazon SageMaker will choose
    the default value for you. (Default: 2048)
The default is 1 GB; please double-check with Michael.
There is no default for memory or max concurrency shown in the Boto3 API; I can double-check with Michael. The reason we set a default here in the SageMaker SDK is to simplify the customer use case: they don't need to specify this config, and we'll pick a proper value for them. On the other hand, if they want to set those values, they can choose the config they prefer.
Yep, the default is controlled by the SageMaker SDK here - there's no inherent default in the CreateEndpointConfig API itself.
    Valid values are in 1 GB increments: 1024 MB, 2048 MB, 3072 MB, 4096 MB,
    5120 MB, or 6144 MB. If no value is provided, Amazon SageMaker will choose
    the default value for you. (Default: 2048)
max_concurrency (int): Optional. The maximum number of concurrent invocations
Is there a default max concurrency in Boto3 today? Good to double-check with Michael.
Not with boto3.
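To summarize the thread: the defaults live in the SDK, not in the CreateEndpointConfig API. A minimal sketch of that behavior, using an illustrative stand-in for the SDK's config class (the memory default follows the PR docstring; the max_concurrency default of 5 is an assumption, not confirmed in this thread):

```python
# Illustrative stand-in for the SDK's ServerlessInferenceConfig (not the
# actual class). The memory default (2048 MB) follows the PR docstring;
# the max_concurrency default of 5 is an assumption.
class ServerlessInferenceConfigSketch:
    VALID_MEMORY_SIZES = (1024, 2048, 3072, 4096, 5120, 6144)

    def __init__(self, memory_size_in_mb=2048, max_concurrency=5):
        # Valid memory sizes are 1 GB increments between 1024 and 6144 MB.
        if memory_size_in_mb not in self.VALID_MEMORY_SIZES:
            raise ValueError(
                "memory_size_in_mb must be one of %s" % (self.VALID_MEMORY_SIZES,)
            )
        self.memory_size_in_mb = memory_size_in_mb
        self.max_concurrency = max_concurrency

    def to_request_dict(self):
        # Shape of the ServerlessConfig block in CreateEndpointConfig.
        return {
            "MemorySizeInMB": self.memory_size_in_mb,
            "MaxConcurrency": self.max_concurrency,
        }
```

A customer passing an empty config object gets these SDK-side defaults; the service API itself has no inherent defaults.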
src/sagemaker/estimator.py (Outdated)
Specifies configuration related to serverless endpoint. Use this configuration
when trying to create serverless endpoint and make serverless inference. If
empty config object passed through, we will use default config to deploy
serverless endpoint (default: None)
The language here is unclear: "we will use default config to deploy serverless endpoint (default: None)". If we will use a default config, the default should not be None, right?
Changed this one and the one in model.py to a clearer expression.
src/sagemaker/model.py (Outdated)
Specifies configuration related to serverless endpoint. Use this configuration
when trying to create serverless endpoint and make serverless inference. If
empty config object passed through, we will use default config to deploy
serverless endpoint (default: None)
Same as before, the language is not clear here
src/sagemaker/model.py (Outdated)
serverless endpoint (default: None)
Raises:
    ValueError: If no role is specified or if serverless inference config is not
        specified and instance type and instance count are also not specified
What is the error message that customers will see?
* feature: allow conditional parallel builds (#2727)
* fix: endpoint bug (#2772)
* fix: local mode - support relative file structure (#2768)
* prepare release v2.72.0; update development version to v2.72.1.dev0
* fix: Set ProcessingStep upload locations deterministically to avoid c… (#2790)
* fix: Prevent repack_model script from referencing nonexistent directories (#2755)
* fix: S3Input - add support for instance attributes (#2754)
* fix: typos and broken link (#2765)
* prepare release v2.72.1; update development version to v2.72.2.dev0
* fix: Model Registration with BYO scripts (#2797)
* fix: Add ContentType in test_auto_ml_describe
* fix: Re-deploy static integ test endpoint if it is not found
* documentation: SageMaker model parallel library 1.6.0 API doc (#2814) - update smdmp change log, archive api doc for 1.4.0 and 1.5.0, add no-index flags, finish api doc archive, add all api docs, add appendix, structural changes, fix links, incorporate feedback
* fix: fix kmeans test deletion sequence, increment lineage statics (#2815)
* fix: Increment static lineage pipeline (#2817)
* fix: Update CHANGELOG.md (#2832)
* prepare release v2.72.2; update development version to v2.72.3.dev0
* change: update master from dev (#2836)
* prepare release v2.72.3; update development version to v2.72.4.dev0
* fix: fixes unnecessary session call while generating pipeline definition for lambda step (#2824)
* feature: Add models_v2 under lineage context (#2800)
* feature: enable python 3.9 (#2802)
* change: Update CHANGELOG.md (#2842)
* fix: update pricing link (#2805)
* doc: Document the available ExecutionVariables (#2807)
* fix: Remove duplicate vertex/edge in query lineage (#2784)
* feature: Support model pipelines in CreateModelStep (#2845)
* feature: support JsonGet/Join parameterization in tuning step Hyperparameters (#2833)
* doc: Enhance smddp 1.2.2 doc (#2852)
* feature: support checkpoint to be passed from estimator (#2849)
* fix: allow kms_key to be passed for processing step (#2779)
* feature: Adds support for Serverless inference (#2831)
* feature: Add support for SageMaker lineage queries in action (#2853)
* feature: Adds Lineage queries in artifact, context and trial components (#2838)
* feature: Add EMRStep support in Sagemaker pipeline (#2848)
* prepare release v2.73.0; update development version to v2.73.1.dev0
* feature: Add support for SageMaker lineage queries context (#2830)
* fix: support specifying a facet by its column index. Currently the Clarify BiasConfig only accepts facet name, while the Clarify analysis configuration supports both name and index; this commit adds the same support to BiasConfig.
* doc: more documentation for serverless inference (#2859)
* prepare release v2.74.0; update development version to v2.74.1.dev0
* Add deprecation warning in Clarify DataConfig (#2847)
* feature: Update instance types for integ test (#2881)
* feature: Adds support for async inference (#2846)
* fix: update to incorporate black v22, pin tox versions (#2889)
* make black happy
Issue #, if available:
Description of changes:
Add support for the Liberty feature, aka serverless inference:
1. Add sagemaker.serverless.serverless_inference_config as the configuration class for serverless inference.
2. Update sagemaker.estimator, sagemaker.model, and sagemaker.tensorflow.model to let users deploy a serverless endpoint by passing the configuration object.
3. Add support in sagemaker.session to generate production variants with a serverless config and to create endpoints using serverless configurations.
Detailed Design is in this doc
Testing done:
test/integ/test_serverless_inference
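The sagemaker.session change described above (generating production variants with a serverless config) can be sketched as a small helper; the helper name is hypothetical, and the dict keys follow the CreateEndpointConfig API's ProductionVariants shape:

```python
# Sketch of generating a production variant with a serverless config.
# Helper name and defaults are illustrative, not the SDK's actual code;
# the dict keys match the CreateEndpointConfig API.
def serverless_production_variant(model_name, memory_size_in_mb=2048,
                                  max_concurrency=5, variant_name="AllTraffic"):
    return {
        "ModelName": model_name,
        "VariantName": variant_name,
        # A serverless variant carries ServerlessConfig instead of
        # InstanceType / InitialInstanceCount.
        "ServerlessConfig": {
            "MemorySizeInMB": memory_size_in_mb,
            "MaxConcurrency": max_concurrency,
        },
    }
```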
Merge Checklist
Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.
General
Tests
* Used unique_name_from_base to create resource names in integ tests (if appropriate)
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.