feature: Adds support for Serverless inference #2831

Merged
merged 1 commit into from
Jan 14, 2022

Conversation

@bhaoz (Contributor) commented Jan 6, 2022

Issue #, if available:

Description of changes:
Add support for the Liberty feature, also known as serverless inference:

  1. Add sagemaker.serverless.serverless_inference_config as the configuration class for serverless inference.
  2. Add support in sagemaker.estimator, sagemaker.model, and sagemaker.tensorflow.model so users can deploy a serverless endpoint by passing the configuration object.
  3. Add support in sagemaker.session to generate production variants with a serverless config and to create endpoints using serverless configurations.
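As a rough illustration of the three changes listed above, here is a minimal sketch of how a configuration object could be turned into a production variant. The class and function below are hypothetical stand-ins written for this example, not the SDK code from this PR. The `ServerlessConfig`, `MemorySizeInMB`, and `MaxConcurrency` keys follow the shape of the CreateEndpointConfig request; the `max_concurrency` default of 5 and the `AllTraffic` variant name are assumptions.

```python
from dataclasses import dataclass


@dataclass
class ServerlessInferenceConfig:
    """Hypothetical stand-in for the configuration class described above."""

    memory_size_in_mb: int = 2048  # default quoted in the docstring under review
    max_concurrency: int = 5       # assumed default, not stated in this PR

    def to_request_dict(self) -> dict:
        # Key names follow the ServerlessConfig shape of CreateEndpointConfig.
        return {
            "MemorySizeInMB": self.memory_size_in_mb,
            "MaxConcurrency": self.max_concurrency,
        }


def production_variant(
    model_name,
    instance_type=None,
    initial_instance_count=None,
    serverless_inference_config=None,
    variant_name="AllTraffic",  # assumed variant name for this sketch
):
    """Build a production variant dict for either endpoint flavor."""
    variant = {"ModelName": model_name, "VariantName": variant_name}
    if serverless_inference_config is not None:
        # Serverless variants carry a ServerlessConfig instead of instance fields.
        variant["ServerlessConfig"] = serverless_inference_config.to_request_dict()
    else:
        variant["InstanceType"] = instance_type
        variant["InitialInstanceCount"] = initial_instance_count
    return variant


serverless = production_variant(
    "my-model", serverless_inference_config=ServerlessInferenceConfig()
)
print(serverless)
```

Passing an empty `ServerlessInferenceConfig()` is what signals a serverless deployment in this sketch; leaving it as None falls back to the instance-based fields.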

Detailed Design is in this doc

Testing done:

  1. Test the walk-through behavior of deploying and invoking serverless endpoints in the integration test test/integ/test_serverless_inference.
  2. Add unit tests to ensure backward compatibility and to verify the behavior of the newly added methods.

Merge Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.

General

  • I have read the CONTRIBUTING doc
  • I certify that the changes I am introducing will be backward compatible, and I have discussed concerns about this, if any, with the Python SDK team
  • I used the commit message format described in CONTRIBUTING
  • I have passed the region in to all S3 and STS clients that I've initialized as part of this change.
  • I have updated any necessary documentation, including READMEs and API docs (if appropriate)

Tests

  • I have added tests that prove my fix is effective or that my feature works (if appropriate)
  • I have added unit and/or integration tests as appropriate to ensure backward compatibility of the changes
  • I have checked that my tests are not configured for a specific region or account (if appropriate)
  • I have used unique_name_from_base to create resource names in integ tests (if appropriate)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

codecov-commenter commented Jan 6, 2022

Codecov Report

Merging #2831 (c981604) into dev (127c964) will increase coverage by 0.01%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##              dev    #2831      +/-   ##
==========================================
+ Coverage   88.85%   88.87%   +0.01%     
==========================================
  Files         175      176       +1     
  Lines       15517    15540      +23     
==========================================
+ Hits        13788    13811      +23     
  Misses       1729     1729              
Impacted Files Coverage Δ
src/sagemaker/tensorflow/model.py 89.13% <ø> (ø)
src/sagemaker/estimator.py 90.61% <100.00%> (+0.01%) ⬆️
src/sagemaker/model.py 89.05% <100.00%> (+0.27%) ⬆️
...agemaker/serverless/serverless_inference_config.py 100.00% <100.00%> (ø)
src/sagemaker/session.py 70.72% <100.00%> (+0.11%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-unit-tests
  • Commit ID: ad3dcc0
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-local-mode-tests
  • Commit ID: ad3dcc0
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-slow-tests
  • Commit ID: ad3dcc0
  • Result: FAILED
  • Build Logs (available for 30 days)

@bhaoz
Contributor Author

bhaoz commented Jan 6, 2022

May I know why the slow-tests failed? I cannot get much detail from the log it posted. Thanks!


Raises:
ValueError: If serverless inference config is not specified and instance type
and instance count are also not specified

why do we need this?

Contributor Author

It's because of how this works: customers pass an empty config object to tell us that a serverless endpoint should be deployed. We only apply default values if they pass an empty object. If they leave that argument as None, we deploy a normal endpoint rather than a serverless one. That means we need to check that instance_type/instance_count and serverless_inference_config are not None at the same time.
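The check described in this reply can be sketched roughly as follows. This is a simplified illustration, not the code in the PR: the function name is hypothetical and the error message wording is illustrative only.

```python
def resolve_endpoint_kind(
    instance_type=None,
    initial_instance_count=None,
    serverless_inference_config=None,
):
    """Decide which kind of endpoint the arguments describe (hypothetical helper)."""
    if serverless_inference_config is not None:
        # Any config object, even one built with no arguments, selects serverless.
        return "serverless"
    if instance_type is None or initial_instance_count is None:
        # Illustrative message; the wording in the SDK may differ.
        raise ValueError(
            "Specify instance_type and initial_instance_count, "
            "or pass a serverless_inference_config."
        )
    return "instance"


print(resolve_endpoint_kind("ml.m5.large", 1))
print(resolve_endpoint_kind(serverless_inference_config=object()))
```

Calling the helper with no arguments, or with only one of the two instance arguments, raises the ValueError.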

So we will throw this error when both the serverless config and the instance count are None?

Comment on lines 805 to 806
if self._is_compiled_model and not is_serverless:
compiled_model_suffix = "-".join(instance_type.split(".")[:-1])
Member

This code is already used above; can one of the two occurrences be removed as redundant?
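For readers unfamiliar with the expression under discussion, here is a small sketch of what it computes and why the `not is_serverless` guard matters. The two helper functions are hypothetical wrappers written for this example; only the join/split expression comes from the diff.

```python
def compiled_model_suffix(instance_type):
    """Turn 'ml.c5.xlarge' into 'ml-c5' by dropping the size and joining with '-'."""
    return "-".join(instance_type.split(".")[:-1])


def suffix_if_applicable(instance_type, is_compiled_model, is_serverless):
    # Serverless deployments pass instance_type=None, and None has no .split(),
    # so the suffix is only computed for compiled, instance-backed models.
    if is_compiled_model and not is_serverless:
        return compiled_model_suffix(instance_type)
    return None


print(compiled_model_suffix("ml.c5.xlarge"))  # ml-c5
print(suffix_if_applicable(None, True, True))  # None
```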


e.fit()

bad_args = ({"instance_type": INSTANCE_TYPE}, {"initial_instance_count": INSTANCE_COUNT})
Member

Can we also add to this list the scenario where instance_type=None and initial_instance_count=None?

Contributor Author

sure thing

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-slow-tests
  • Commit ID: ad3dcc0
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-unit-tests
  • Commit ID: 1bd2a75
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-local-mode-tests
  • Commit ID: 1bd2a75
  • Result: FAILED
  • Build Logs (available for 30 days)

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-pr
  • Commit ID: 1bd2a75
  • Result: FAILED
  • Build Logs (available for 30 days)

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-slow-tests
  • Commit ID: 1bd2a75
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-notebook-tests
  • Commit ID: 1bd2a75
  • Result: FAILED
  • Build Logs (available for 30 days)

@@ -209,7 +209,7 @@ def register(
model_package_arn=model_package.get("ModelPackageArn"),
)

- def _init_sagemaker_session_if_does_not_exist(self, instance_type):
+ def _init_sagemaker_session_if_does_not_exist(self, instance_type=None):

what does this do?

Contributor Author

Serverless needs instance_type to be None, so this default was added to support the serverless case.


- if instance_type.startswith("ml.inf") and not self._is_compiled_model:
+ if instance_type and instance_type.startswith("ml.inf") and not self._is_compiled_model:

what is this for?

Contributor Author

Same as above
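To make the reason for the extra `instance_type and ...` guard concrete, here is a minimal sketch. The function name is hypothetical; the condition mirrors the changed line in the diff, with `is_compiled_model` standing in for `self._is_compiled_model`.

```python
def targets_uncompiled_inferentia(instance_type, is_compiled_model):
    """None-safe version of the check: a serverless call passes instance_type=None."""
    # Without the leading truthiness test, None.startswith(...) would raise
    # AttributeError before the rest of the condition is evaluated.
    return bool(
        instance_type
        and instance_type.startswith("ml.inf")
        and not is_compiled_model
    )


print(targets_uncompiled_inferentia(None, False))              # False
print(targets_uncompiled_inferentia("ml.inf1.xlarge", False))  # True
```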

self.name,
instance_type,
initial_instance_count,
accelerator_type=accelerator_type,

We are not supporting accelerators for serverless. Will the code below have any impact?

Contributor Author

This will be validated by the low-level botocore/boto3 libraries, so it should be fine.

memory_size_in_mb (int): Optional. The memory size of your serverless endpoint.
Valid values are in 1 GB increments: 1024 MB, 2048 MB, 3072 MB, 4096 MB,
5120 MB, or 6144 MB. If no value is provided, Amazon SageMaker will choose
the default value for you. (Default: 2048)

The default is 1 GB; please check with Michael.

Contributor Author

The Boto3 API shows no default for memory or max concurrency. I can double-check with Michael. The reason we set defaults here in the SageMaker SDK is to simplify the customer use case: they don't need to specify this config, and we'll pick a proper value for them. On the other hand, if they want to set those values, they can choose the config they prefer.

Yep, the default is controlled by the SageMaker SDK here - there's no inherent default in the CreateEndpointConfig API itself.
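As a sketch of the SDK-side defaulting discussed in this thread, the helper below applies the 2048 MB default from the docstring and checks the value against the documented 1 GB increments. It is a hypothetical client-side pre-check written for illustration; per the earlier reply, the PR itself leaves validation to botocore and the service.

```python
# Valid sizes quoted in the docstring: 1024 MB to 6144 MB in 1 GB steps.
VALID_MEMORY_SIZES_MB = tuple(range(1024, 6144 + 1, 1024))


def check_memory_size(memory_size_in_mb=2048):
    """Return the memory size if it is one of the documented values."""
    if memory_size_in_mb not in VALID_MEMORY_SIZES_MB:
        raise ValueError(
            f"memory_size_in_mb must be one of {VALID_MEMORY_SIZES_MB}, "
            f"got {memory_size_in_mb}"
        )
    return memory_size_in_mb


print(check_memory_size())      # 2048, the SDK-side default from the docstring
print(check_memory_size(4096))  # 4096
```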

Valid values are in 1 GB increments: 1024 MB, 2048 MB, 3072 MB, 4096 MB,
5120 MB, or 6144 MB. If no value is provided, Amazon SageMaker will choose
the default value for you. (Default: 2048)
max_concurrency (int): Optional. The maximum number of concurrent invocations

Is there a default max concurrency with Boto3 today? It would be good to double-check with Michael.

Not with boto3.

Specifies configuration related to serverless endpoint. Use this configuration
when trying to create serverless endpoint and make serverless inference. If
empty config object passed through, we will use default config to deploy
serverless endpoint (default: None)

The language here is unclear: "we will use default config to deploy serverless endpoint (default: None)". If we use a default config, the default should not be None, right?

Contributor Author

Changed this one and the one in model.py to a clearer expression


Specifies configuration related to serverless endpoint. Use this configuration
when trying to create serverless endpoint and make serverless inference. If
empty config object passed through, we will use default config to deploy
serverless endpoint (default: None)

Same as before, the language is not clear here

serverless endpoint (default: None)
Raises:
ValueError: If no role is specified or if serverless inference config is not
specified and instance type and instance count are also not specified


What is the error message that customers will see?

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-unit-tests
  • Commit ID: c981604
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-local-mode-tests
  • Commit ID: c981604
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-slow-tests
  • Commit ID: c981604
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@shreyapandit shreyapandit changed the title feature: add support for serverless inference feature: Adds support for Serverless inference Jan 14, 2022
@shreyapandit shreyapandit merged commit 9d259b3 into aws:dev Jan 14, 2022
shreyapandit added a commit that referenced this pull request Feb 3, 2022
* feature: allow conditional parellel builds (#2727)

* fix endpoint bug (#2772)

Co-authored-by: Basil Beirouti <[email protected]>

* fix: local mode - support relative file structure (#2768)

* prepare release v2.72.0

* update development version to v2.72.1.dev0

* fix: Set ProcessingStep upload locations deterministically to avoid c… (#2790)

* fix: Prevent repack_model script from referencing nonexistent directories (#2755)

Co-authored-by: Payton Staub <[email protected]>
Co-authored-by: Ahsan Khan <[email protected]>

* fix: S3Input - add support for instance attributes (#2754)

* fix: typos and broken link (#2765)

Co-authored-by: Shreya Pandit <[email protected]>

* prepare release v2.72.1

* update development version to v2.72.2.dev0

* fix: Model Registration with BYO scripts (#2797)

Co-authored-by: Basil Beirouti <[email protected]>
Co-authored-by: Payton Staub <[email protected]>
Co-authored-by: Ahsan Khan <[email protected]>
Co-authored-by: Mufaddal Rohawala <[email protected]>
Co-authored-by: Basil Beirouti <[email protected]>
Co-authored-by: Payton Staub <[email protected]>
Co-authored-by: Shreya Pandit <[email protected]>

* fix: Add ContentType in test_auto_ml_describe

* fix: Re-deploy static integ test endpoint if it is not found

* documentation :SageMaker model parallel library 1.6.0 API doc (#2814)

* update smdmp change log, archive api doc for 1.4.0 and 1.5.0

* add no-index flags

* finish api doc archive

* fix: Set ProcessingStep upload locations deterministically to avoid c… (#2790)

* fix: Prevent repack_model script from referencing nonexistent directories (#2755)

Co-authored-by: Payton Staub <[email protected]>
Co-authored-by: Ahsan Khan <[email protected]>

* fix: S3Input - add support for instance attributes (#2754)

* fix: typos and broken link (#2765)

Co-authored-by: Shreya Pandit <[email protected]>

* add all api docs

* add appendix, fix links

* structural changes, fix links

* incorporate feedback

* prepare release v2.72.1

* update development version to v2.72.2.dev0

Co-authored-by: Payton Staub <[email protected]>
Co-authored-by: Payton Staub <[email protected]>
Co-authored-by: Ahsan Khan <[email protected]>
Co-authored-by: Mufaddal Rohawala <[email protected]>
Co-authored-by: Mohamed Ali Jamaoui <[email protected]>
Co-authored-by: Shreya Pandit <[email protected]>
Co-authored-by: ci <ci>
Co-authored-by: Jeniya Tabassum <[email protected]>

* fix: fix kmeans test deletion sequence, increment lineage statics (#2815)

* fix: Increment static lineage pipeline (#2817)

* fix: Update CHANGELOG.md (#2832)

* prepare release v2.72.2

* update development version to v2.72.3.dev0

* change: update master from dev (#2836)

Co-authored-by: Basil Beirouti <[email protected]>
Co-authored-by: Payton Staub <[email protected]>
Co-authored-by: Ahsan Khan <[email protected]>
Co-authored-by: Mufaddal Rohawala <[email protected]>
Co-authored-by: Basil Beirouti <[email protected]>
Co-authored-by: Payton Staub <[email protected]>
Co-authored-by: Shreya Pandit <[email protected]>
Co-authored-by: Mohamed Ali Jamaoui <[email protected]>
Co-authored-by: ci <ci>
Co-authored-by: Jeniya Tabassum <[email protected]>
Co-authored-by: sreedes <[email protected]>
Co-authored-by: Navin Soni <[email protected]>
Co-authored-by: Miyoung <[email protected]>
Co-authored-by: Ameen Khan <[email protected]>
Co-authored-by: Zhankui Lu <[email protected]>
Co-authored-by: Xiaoguang Chen <[email protected]>
Co-authored-by: Jonathan Guinegagne <[email protected]>
Co-authored-by: Zhankui Lu <[email protected]>
Co-authored-by: Yifei Zhu <[email protected]>
Co-authored-by: Qingzi-Lan <[email protected]>

* prepare release v2.72.3

* update development version to v2.72.4.dev0

* fix: fixes unnecessary session call while generating pipeline definition for lambda step (#2824)

* feature: Add models_v2 under lineage context (#2800)

* feature: enable python 3.9 (#2802)

Co-authored-by: Ahsan Khan <[email protected]>

* change: Update CHANGELOG.md (#2842)

* fix: update pricing link (#2805)

Co-authored-by: Payton Staub <[email protected]>
Co-authored-by: Ahsan Khan <[email protected]>
Co-authored-by: Shreya Pandit <[email protected]>
Co-authored-by: Basil Beirouti <[email protected]>
Co-authored-by: Mufaddal Rohawala <[email protected]>
Co-authored-by: Basil Beirouti <[email protected]>
Co-authored-by: Payton Staub <[email protected]>
Co-authored-by: Mohamed Ali Jamaoui <[email protected]>
Co-authored-by: ci <ci>
Co-authored-by: Jeniya Tabassum <[email protected]>
Co-authored-by: sreedes <[email protected]>
Co-authored-by: Navin Soni <[email protected]>
Co-authored-by: Miyoung <[email protected]>
Co-authored-by: Ameen Khan <[email protected]>
Co-authored-by: Zhankui Lu <[email protected]>
Co-authored-by: Navin Soni <[email protected]>
Co-authored-by: Xiaoguang Chen <[email protected]>
Co-authored-by: Jonathan Guinegagne <[email protected]>
Co-authored-by: Zhankui Lu <[email protected]>
Co-authored-by: Yifei Zhu <[email protected]>
Co-authored-by: Qingzi-Lan <[email protected]>

* doc: Document the available ExecutionVariables (#2807)

* fix: Remove duplicate vertex/edge in query lineage (#2784)

* feature: Support model pipelines in CreateModelStep (#2845)

Co-authored-by: Payton Staub <[email protected]>

* feature: support JsonGet/Join parameterization in tuning step Hyperparameters (#2833)

* doc: Enhance smddp 1.2.2 doc (#2852)

* feature: support checkpoint to be passed from estimator (#2849)

Co-authored-by: marckarp <[email protected]>

* fix: allow kms_key to be passed for processing step (#2779)

* feature: Adds support for Serverless inference (#2831)

* feature: Add support for SageMaker lineage queries in action (#2853)

* feature: Adds Lineage queries in artifact, context and trial components (#2838)

* feature: Add EMRStep support in Sagemaker pipeline (#2848)

Co-authored-by: chenxy <[email protected]>

* prepare release v2.73.0

* update development version to v2.73.1.dev0

* feature: Add support for SageMaker lineage queries context (#2830)

* fix: support specifying a facet by its column index

Currently the Clarify BiasConfig only accepts facet name. Actually
Clarify analysis configuration supports both name and index. This
commit adds the same support to BiasConfig.

* doc: more documentation for serverless inference (#2859)

* prepare release v2.74.0

* update development version to v2.74.1.dev0

* Add deprecation warning in Clarify DataConfig (#2847)

* feature: Update instance types for integ test (#2881)

* feature: Adds support for async inference (#2846)

* fix: update to incorporate black v22, pin tox versions (#2889)

Co-authored-by: Mufaddal Rohawala <[email protected]>

* make black happy

Co-authored-by: Mufaddal Rohawala <[email protected]>
Co-authored-by: Basil Beirouti <[email protected]>
Co-authored-by: Basil Beirouti <[email protected]>
Co-authored-by: ci <ci>
Co-authored-by: Payton Staub <[email protected]>
Co-authored-by: Payton Staub <[email protected]>
Co-authored-by: Ahsan Khan <[email protected]>
Co-authored-by: Mohamed Ali Jamaoui <[email protected]>
Co-authored-by: Shreya Pandit <[email protected]>
Co-authored-by: sreedes <[email protected]>
Co-authored-by: Navin Soni <[email protected]>
Co-authored-by: Miyoung <[email protected]>
Co-authored-by: Jeniya Tabassum <[email protected]>
Co-authored-by: Ameen Khan <[email protected]>
Co-authored-by: Zhankui Lu <[email protected]>
Co-authored-by: Xiaoguang Chen <[email protected]>
Co-authored-by: Jonathan Guinegagne <[email protected]>
Co-authored-by: Zhankui Lu <[email protected]>
Co-authored-by: Yifei Zhu <[email protected]>
Co-authored-by: Qingzi-Lan <[email protected]>
Co-authored-by: Xinghan Chen <[email protected]>
Co-authored-by: Navin Soni <[email protected]>
Co-authored-by: Tulio Casagrande <[email protected]>
Co-authored-by: jerrypeng7773 <[email protected]>
Co-authored-by: marckarp <[email protected]>
Co-authored-by: marckarp <[email protected]>
Co-authored-by: jayatalr <[email protected]>
Co-authored-by: bhaoz <[email protected]>
Co-authored-by: Ethan Cheng <[email protected]>
Co-authored-by: chenxy <[email protected]>
Co-authored-by: Xiaoguang Chen <[email protected]>
Co-authored-by: keerthanvasist <[email protected]>
Co-authored-by: Mufaddal Rohawala <[email protected]>
Co-authored-by: Shreya Pandit <[email protected]>
7 participants