For more detailed explanations of the classes that this library provides for automatic model tuning, see:
- `API docs for HyperparameterTuner and parameter range classes <https://sagemaker.readthedocs.io/en/stable/tuner.html>`__
- `API docs for analytics classes <https://sagemaker.readthedocs.io/en/stable/analytics.html>`__

********************************
SageMaker Asynchronous Inference
********************************
Amazon SageMaker Asynchronous Inference is a capability in SageMaker that queues incoming requests and processes them asynchronously.
This option is ideal for requests with large payload sizes up to 1GB, long processing times, and near real-time latency requirements.
Asynchronous Inference enables you to save on costs by autoscaling the instance count to zero when there are no requests to process,
so you only pay when your endpoint is processing requests. More information about
SageMaker Asynchronous Inference can be found in the `AWS documentation <https://docs.aws.amazon.com/sagemaker/latest/dg/async-inference.html>`__.

696
+ To deploy asynchronous inference endpoint, you will need to create a ``AsyncInferenceConfig `` object.
697
+ If you create ``AsyncInferenceConfig `` without specifying its arguments, the default ``S3OutputPath `` will
698
+ be ``s3://sagemaker-{REGION}-{ACCOUNTID}/async-endpoint-outputs/{UNIQUE-JOB-NAME} ``. (example shown below):

.. code:: python

    from sagemaker.async_inference import AsyncInferenceConfig

    # Create an empty AsyncInferenceConfig object to use default values
    async_config = AsyncInferenceConfig()

707
+ Or you can specify configurations in ``AsyncInferenceConfig `` as you like. All of those configuration parameters
708
+ are optionally but if you don’t specify the ``output_path ``, Amazon SageMaker will use the default ``S3OutputPath ``
709
+ mentioned above (example shown below):

.. code:: python

    # Specify S3OutputPath, MaxConcurrentInvocationsPerInstance and NotificationConfig
    # in the async config object
    async_config = AsyncInferenceConfig(
        output_path="s3://{s3_bucket}/{bucket_prefix}/output",
        max_concurrent_invocations_per_instance=10,
        notification_config={
            "SuccessTopic": "arn:aws:sns:aws-region:account-id:topic-name",
            "ErrorTopic": "arn:aws:sns:aws-region:account-id:topic-name",
        },
    )

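When you configure ``notification_config``, SageMaker publishes a JSON message to the given SNS topic after each invocation succeeds or fails. The sketch below shows one way such a message could be routed; the payload shape used here (``invocationStatus``, ``responseParameters.outputLocation``) is an assumption for illustration, not a schema taken from this document:

```python
import json

# Illustrative payload only: the field names below are assumptions about the
# shape of a SageMaker async inference SNS message, not an authoritative schema.
sample_message = json.dumps({
    "invocationStatus": "Completed",
    "requestParameters": {"inputLocation": "s3://my-bucket/input/payload.json"},
    "responseParameters": {"outputLocation": "s3://my-bucket/output/result.json"},
})

def handle_notification(message: str) -> str:
    """Return the S3 output location for a completed invocation."""
    body = json.loads(message)
    if body.get("invocationStatus") == "Completed":
        return body["responseParameters"]["outputLocation"]
    raise RuntimeError(f"invocation did not complete: {body}")

print(handle_notification(sample_message))  # prints the output S3 URI
```

A consumer subscribed to the topic (for example a Lambda function) would receive the message body as a string, which is why the handler parses JSON rather than taking a dict.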
Then use the ``AsyncInferenceConfig`` in the estimator's ``deploy()`` method to deploy an asynchronous inference endpoint:

.. code:: python

    # Deploys the model that was generated by fit() to a SageMaker asynchronous inference endpoint
    async_predictor = estimator.deploy(async_inference_config=async_config)

After deployment is complete, ``deploy()`` will return an ``AsyncPredictor`` object. To perform asynchronous inference, you first
need to upload data to Amazon S3 and then call the ``predict_async()`` method with the S3 URI as the input. It will return an
``AsyncInferenceResponse`` object:

.. code:: python

    # Upload data to an S3 bucket, then use the S3 URI as input
    async_response = async_predictor.predict_async(input_path=input_s3_path)

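``predict_async()`` takes a full ``s3://`` URI as ``input_path``. If you assemble or inspect such URIs yourself, a small helper that splits one into bucket and key can be convenient; this is plain Python for illustration (``parse_s3_uri`` is our own name, not a SageMaker SDK function):

```python
def parse_s3_uri(uri: str):
    """Split an s3:// URI into a (bucket, key) pair."""
    if not uri.startswith("s3://"):
        raise ValueError(f"not an S3 URI: {uri}")
    # Everything up to the first "/" after the scheme is the bucket
    bucket, _, key = uri[len("s3://"):].partition("/")
    return bucket, key

print(parse_s3_uri("s3://my-bucket/async-inputs/payload.json"))
# prints: ('my-bucket', 'async-inputs/payload.json')
```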
The Amazon SageMaker SDK also enables you to serialize the data and pass the payload directly to the
``predict_async()`` method. For this pattern of invocation, the SDK will upload the data to an Amazon
S3 bucket under ``s3://sagemaker-{REGION}-{ACCOUNTID}/async-endpoint-inputs/``.

.. code:: python

    # Serializes data and makes a prediction request to the SageMaker asynchronous endpoint
    async_response = async_predictor.predict_async(data=data)

Then you can switch to other tasks and wait for the inference to complete. After it completes, you can check
the result using ``AsyncInferenceResponse``:

.. code:: python

    # Switch back to check the result
    result = async_response.get_result()

Alternatively, if you would like to check for the result periodically and return it once it has been generated in the
output Amazon S3 path, use the ``predict()`` method:

.. code:: python

    # Use predict() to wait for the result
    response = async_predictor.predict(data=data)

    # Or use an Amazon S3 input path
    response = async_predictor.predict(input_path=input_s3_path)

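Conceptually, both waiting styles above come down to polling the output S3 location until the result object appears. The self-contained sketch below shows that retry pattern with a stand-in ``fake_fetch`` in place of a real S3 check; none of these names are SageMaker SDK APIs:

```python
import time

def wait_for_result(fetch_result, max_attempts=10, delay=0.01):
    """Poll fetch_result() until it returns a value or attempts run out."""
    for _ in range(max_attempts):
        result = fetch_result()
        if result is not None:
            return result
        time.sleep(delay)  # back off before the next poll
    raise TimeoutError("result was not available in time")

# Stand-in for "is the output object in S3 yet?": succeeds on the third poll.
calls = {"n": 0}
def fake_fetch():
    calls["n"] += 1
    return "inference-output" if calls["n"] >= 3 else None

print(wait_for_result(fake_fetch))  # prints: inference-output
```

In practice the SDK performs this waiting for you, so the sketch is only meant to make the behavior of ``predict()`` versus ``predict_async()`` concrete.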
Clean up the endpoint and model if needed after inference:

.. code:: python

    # Tears down the SageMaker endpoint and endpoint configuration
    async_predictor.delete_endpoint()

    # Deletes the SageMaker model
    async_predictor.delete_model()

For more details about Asynchronous Inference,
see the API docs for `Asynchronous Inference <https://sagemaker.readthedocs.io/en/stable/api/inference/async_inference.html>`__.

*******************************
SageMaker Serverless Inference
*******************************