For more detailed explanations of the classes that this library provides for automatic model tuning, see:

- `API docs for HyperparameterTuner and parameter range classes <https://sagemaker.readthedocs.io/en/stable/tuner.html>`__
- `API docs for analytics classes <https://sagemaker.readthedocs.io/en/stable/analytics.html>`__

**********************************
SageMaker Asynchronous Inference
**********************************
Amazon SageMaker Asynchronous Inference is a capability in SageMaker that queues incoming requests and processes them asynchronously.
This option is ideal for requests with large payload sizes (up to 1GB), long processing times, and near real-time latency requirements.
Asynchronous Inference enables you to save on costs by autoscaling the instance count to zero when there are no requests to process,
so you only pay when your endpoint is processing requests. More information about
SageMaker Asynchronous Inference can be found in the `AWS documentation <https://docs.aws.amazon.com/sagemaker/latest/dg/async-inference.html>`__.

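The autoscaling-to-zero behavior mentioned above is configured through Application Auto Scaling rather than the SageMaker Python SDK itself. The sketch below shows the request parameters you might register for it; the endpoint name, variant name, and capacity values are hypothetical, and the ``ApproximateBacklogSizePerInstance`` metric specification is an assumption to verify against the AWS documentation. The actual ``boto3`` calls are shown commented out:

.. code:: python

    # Hypothetical sketch: allow an asynchronous endpoint to scale down to zero
    # instances when its request backlog is empty. All names are placeholders.
    resource_id = "endpoint/my-async-endpoint/variant/AllTraffic"

    register_params = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "MinCapacity": 0,  # allow scale-in to zero when idle
        "MaxCapacity": 5,
    }

    policy_params = {
        "PolicyName": "AsyncBacklogScaling",
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            "TargetValue": 5.0,  # target backlog size per instance
            "CustomizedMetricSpecification": {
                "MetricName": "ApproximateBacklogSizePerInstance",
                "Namespace": "AWS/SageMaker",
                "Dimensions": [{"Name": "EndpointName", "Value": "my-async-endpoint"}],
                "Statistic": "Average",
            },
        },
    }

    # client = boto3.client("application-autoscaling")
    # client.register_scalable_target(**register_params)
    # client.put_scaling_policy(**policy_params)
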
To deploy an asynchronous endpoint, you need to create an ``AsyncInferenceConfig`` object.
If you create an ``AsyncInferenceConfig`` without specifying any arguments, the default ``S3OutputPath`` will
be ``s3://sagemaker-{REGION}-{ACCOUNTID}/async-output/{UNIQUE-JOB-NAME}``, as in the example below:

.. code:: python

    from sagemaker.async_inference import AsyncInferenceConfig

    # Create an empty AsyncInferenceConfig object to use default values
    async_config = AsyncInferenceConfig()

Or you can specify configurations in ``AsyncInferenceConfig`` as you like, as in the example below:

.. code:: python

    # Specify S3OutputPath, MaxConcurrentInvocationsPerInstance and NotificationConfig
    # in the async config object
    async_config = AsyncInferenceConfig(
        output_path="s3://{s3_bucket}/{bucket_prefix}/output",
        max_concurrent_invocations_per_instance=10,
        notification_config={
            "SuccessTopic": "arn:aws:sns:aws-region:account-id:topic-name",
            "ErrorTopic": "arn:aws:sns:aws-region:account-id:topic-name",
        },
    )

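When a notification config is set, SageMaker publishes a JSON message to the SNS topic as each invocation finishes. The sketch below parses such a message; the exact schema, including the ``invocationStatus`` and ``responseParameters.outputLocation`` fields, is an assumption to check against the AWS documentation for Asynchronous Inference:

.. code:: python

    import json

    # Hypothetical sketch: extract the S3 output path from a success
    # notification message. The field names are assumed, not confirmed.
    def output_location(sns_message_body):
        """Return the S3 output path if the invocation completed, else None."""
        body = json.loads(sns_message_body)
        if body.get("invocationStatus") == "Completed":
            return body["responseParameters"]["outputLocation"]
        return None

    # Example message using the assumed schema
    message = json.dumps({
        "invocationStatus": "Completed",
        "responseParameters": {"outputLocation": "s3://bucket/output/result.out"},
    })
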
Then use the ``AsyncInferenceConfig`` in the estimator's ``deploy()`` method to deploy an asynchronous endpoint:

.. code:: python

    # Deploys the model that was generated by fit() to a SageMaker asynchronous endpoint
    async_predictor = estimator.deploy(async_inference_config=async_config)

After deployment is complete, ``deploy()`` returns an ``AsyncPredictor``. You can use it to perform asynchronous inference
with ``predict_async()`` and retrieve the result later. For input data, you can upload the data to an S3 bucket
and pass its path:

.. code:: python

    # Upload data to an S3 bucket, then use the S3 path as input
    async_response = async_predictor.predict_async(input_path=input_s3_path)

Or you can pass in-memory data directly, just like with real-time inference. In this case, the SageMaker Python SDK
uploads the data to an Amazon S3 bucket under ``s3://sagemaker-{REGION}-{ACCOUNTID}/async-input/``.

.. code:: python

    # Serializes data and makes a prediction request to the SageMaker asynchronous endpoint
    async_response = async_predictor.predict_async(data=data)

You can then work on other tasks while waiting for the inference to complete. Once it has completed, check
the result:

.. code:: python

    # Switch back to check the result
    result = async_response.get_result()

If you want to block until the result is available, use the ``predict()`` method instead. It checks for the result
periodically and returns it once it appears in the output Amazon S3 path:

.. code:: python

    # Use predict() to wait for the result
    response = async_predictor.predict(data=data)

    # Or use an Amazon S3 input path
    response = async_predictor.predict(input_path=input_s3_path)

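Conceptually, waiting for an asynchronous result comes down to periodically checking whether the output object has appeared. The helper below is a generic, pure-Python sketch of that polling pattern; ``fetch_result`` is a hypothetical stand-in for a check of the output S3 path and is not part of the SageMaker SDK:

.. code:: python

    import time

    # Generic polling sketch: call fetch_result repeatedly until it returns
    # a value, or raise once the attempt budget is exhausted.
    def wait_for_result(fetch_result, delay=2.0, max_attempts=5):
        for _ in range(max_attempts):
            result = fetch_result()
            if result is not None:
                return result
            time.sleep(delay)
        raise TimeoutError("result did not appear in time")
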
Clean up the endpoint and model if needed after inference:

.. code:: python

    # Tears down the SageMaker endpoint and endpoint configuration
    async_predictor.delete_endpoint()

    # Deletes the SageMaker model
    async_predictor.delete_model()

For more details about Asynchronous Inference,
see the API docs for `Asynchronous Inference <https://sagemaker.readthedocs.io/en/stable/api/inference/async_inference.html>`__.

*******************************
SageMaker Serverless Inference
*******************************