feature: Adds support for async inference #2846
Conversation
Codecov Report
@@ Coverage Diff @@
## dev #2846 +/- ##
==========================================
+ Coverage 88.97% 89.09% +0.11%
==========================================
Files 178 182 +4
Lines 15758 15945 +187
==========================================
+ Hits 14021 14206 +185
- Misses 1737 1739 +2
Continue to review full report at Codecov.
minor edits - just make sure you're referring to it as async inference and not just async.
doc/overview.rst
Outdated
**********************************
Amazon SageMaker Asynchronous Inference is a new capability in SageMaker that queues incoming requests and processes them asynchronously.
This option is ideal for requests with large payload sizes up to 1GB, long processing times, and near real-time latency requirements.
Asynchronous Inference enables you to save on costs by autoscaling the instance count to zero when there are no requests to process,
Let's make it clear that users need to configure scale down to zero. Suggest rephrasing to "You can configure Asynchronous Inference to scale the instance count to zero when there are no requests to process, thereby saving costs".
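For context, a minimal sketch of the scale-to-zero configuration this comment refers to, using Application Auto Scaling via boto3 (the endpoint and variant names here are hypothetical):

```python
import boto3

autoscaling = boto3.client("application-autoscaling")

# Register the endpoint variant as a scalable target with a minimum of
# zero instances; the endpoint then scales down when the queue is empty.
autoscaling.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId="endpoint/my-async-endpoint/variant/AllTraffic",
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=0,
    MaxCapacity=5,
)
```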
doc/overview.rst
Outdated
This option is ideal for requests with large payload sizes up to 1GB, long processing times, and near real-time latency requirements.
Asynchronous Inference enables you to save on costs by autoscaling the instance count to zero when there are no requests to process,
so you only pay when your endpoint is processing requests. More information about
SageMaker Serverless Inference can be found in the `AWS documentation <https://docs.aws.amazon.com/sagemaker/latest/dg/async-inference.html>`__.
Async, not serverless
doc/overview.rst
Outdated
so you only pay when your endpoint is processing requests. More information about
SageMaker Serverless Inference can be found in the `AWS documentation <https://docs.aws.amazon.com/sagemaker/latest/dg/async-inference.html>`__.

To deploy asynchronous endpoint, you will need to create a ``AsyncInferenceConfig`` object.
asynchronous inference
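For context, creating the configuration described above might look like the following sketch (the bucket and output path are hypothetical; the keyword arguments assume the ``AsyncInferenceConfig`` constructor added in this PR):

```python
from sagemaker.async_inference import AsyncInferenceConfig

# Configure where async inference results should be written; all
# arguments are optional and fall back to SDK defaults.
async_config = AsyncInferenceConfig(
    output_path="s3://my-bucket/async-endpoint-output",
    max_concurrent_invocations_per_instance=4,
)
```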
doc/overview.rst
Outdated
    }
)

Then use the ``AsyncInferenceConfig`` in the estimator's ``deploy()`` method to deploy an asynchronous endpoint:
asynchronous inference
doc/overview.rst
Outdated
# Deploys the model that was generated by fit() to a SageMaker asynchronous endpoint
async_predictor = estimator.deploy(async_inference_config=async_config)

After deployment is complete, it will return an ``AsyncPredictor``. You can use it to perform asynchronous inference
``AsyncPredictor`` object
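For context, a sketch of how the returned ``AsyncPredictor`` is used (the input variable and S3 paths are hypothetical):

```python
from sagemaker.async_inference import WaiterConfig

# Submit a request without blocking; the payload is uploaded to S3 and an
# AsyncInferenceResponse is returned immediately.
response = async_predictor.predict_async(data=my_input)

# Or point at input data that is already in S3:
# response = async_predictor.predict_async(input_path="s3://my-bucket/input.csv")

# Poll the S3 output path until the result object appears.
result = response.get_result(WaiterConfig(max_attempts=60, delay=15))
```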
class AsyncInferenceResponse(object):
    """Response from Async Inference endpoint

    This response object provides a method to check the async Amazon S3
"async Amazon ..." sounds off. Consider rephrasing to "for an async inference result in the Amazon S3 output path specified..."
"""Response from Async Inference endpoint | ||
|
||
This response object provides a method to check the async Amazon S3 | ||
output path. If result object exists in that path, decode and return |
What do you mean by 'decode'?
self,
waiter_config=None,
):
    """Get result from the async Amazon S3 output path
same comment as above - async inference result from the Amazon S3 output path
"""Initialize a WaiterConfig object that provides parameters to control waiting behavior. | ||
|
||
Args: | ||
max_attempts (int): The maximum number of attempts to be made. (Default: 60) |
What happens if the max attempts is exceeded and there is no result? Do we raise an exception?
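To make the question concrete, a sketch of guarding against the waiter giving up, assuming the polling-timeout exception added to ``sagemaker.exceptions`` in this PR is named ``PollingTimeoutError``:

```python
from sagemaker.async_inference import WaiterConfig
from sagemaker.exceptions import PollingTimeoutError

# `response` is the AsyncInferenceResponse returned by predict_async().
try:
    # Poll every 10 seconds, up to 30 attempts (~5 minutes total).
    result = response.get_result(WaiterConfig(max_attempts=30, delay=10))
except PollingTimeoutError:
    # No result object appeared in the S3 output path before the waiter
    # exhausted its attempts; retry later or inspect the endpoint.
    result = None
```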
src/sagemaker/estimator.py
Outdated
@@ -915,11 +916,17 @@ def deploy(
            data_capture_config (sagemaker.model_monitor.DataCaptureConfig): Specifies
                configuration related to Endpoint data capture for use with
                Amazon SageMaker Model Monitoring. Default: None.
            async_inference_config (sagemaker.model_monitor.AsyncInferenceConfig): Specifies
                configuration related to async endpoint. Use this configuration when trying
async inference instead of async endpoint
To deploy asynchronous inference endpoint, you will need to create a ``AsyncInferenceConfig`` object.
If you create ``AsyncInferenceConfig`` without specifying its arguments, the default ``S3OutputPath`` will
be ``s3://sagemaker-{REGION}-{ACCOUNTID}/async-endpoint-outputs/{UNIQUE-JOB-NAME}``. (example shown below):
I think below you have the output being ``async-endpoint-output`` but here you say ``async-endpoint-outputs``. I'm fine with either, but I think we should be consistent.
Sorry for this - I was modifying these two parts concurrently and left some typos. Fixed it now.
Issue #, if available:
#2619: Support for creating asynchronous endpoints with my own model and inference code

Description of changes:
Add support for Asynchronous Inference:
- Add `sagemaker.async_inference.async_inference_config` as the configuration class for async inference.
- Modify `sagemaker.estimator`, `sagemaker.model`, `sagemaker.tensorflow.model`, and `sagemaker.session` to let users deploy an async endpoint by passing the configuration object.
- Add `sagemaker.predictor_async` as the `AsyncPredictor` to make async invocations and handle responses.
- Add exceptions to `sagemaker.exceptions` that will be raised during async inference.

Detailed Design is in this doc.
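Since the change list above also touches `sagemaker.model` and `sagemaker.tensorflow.model`, deploying a framework model asynchronously presumably follows the same pattern; a hedged sketch (model artifact, role, and framework version are placeholders):

```python
from sagemaker.tensorflow import TensorFlowModel
from sagemaker.async_inference import AsyncInferenceConfig

model = TensorFlowModel(
    model_data="s3://my-bucket/model.tar.gz",  # placeholder artifact location
    role="arn:aws:iam::123456789012:role/MySageMakerRole",  # placeholder role
    framework_version="2.8",
)

# Passing an AsyncInferenceConfig switches deploy() to an async endpoint.
async_predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
    async_inference_config=AsyncInferenceConfig(),
)
```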
Testing done:
- test/integ/test_async_inference

Merge Checklist
Put an `x` in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.

General
Tests
- I have used `unique_name_from_base` to create resource names in integ tests (if appropriate)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.