diff --git a/doc/api/inference/serverless.rst b/doc/api/inference/serverless.rst
new file mode 100644
index 0000000000..d338efd7be
--- /dev/null
+++ b/doc/api/inference/serverless.rst
@@ -0,0 +1,9 @@
+Serverless Inference
+---------------------
+
+This module contains classes related to Amazon SageMaker Serverless Inference.
+
+.. automodule:: sagemaker.serverless.serverless_inference_config
+    :members:
+    :undoc-members:
+    :show-inheritance:
diff --git a/doc/overview.rst b/doc/overview.rst
index 02290ff94c..bdd964c864 100644
--- a/doc/overview.rst
+++ b/doc/overview.rst
@@ -684,6 +684,63 @@ For more detailed explanations of the classes that this library provides for aut
 - `API docs for HyperparameterTuner and parameter range classes `__
 - `API docs for analytics classes `__
 
+*******************************
+SageMaker Serverless Inference
+*******************************
+Amazon SageMaker Serverless Inference enables you to easily deploy machine learning models for inference without having
+to configure or manage the underlying infrastructure. After you train a model, you can deploy it to an Amazon SageMaker
+serverless endpoint and then invoke the endpoint to get inference results back. More information about
+SageMaker Serverless Inference can be found in the `AWS documentation <https://docs.aws.amazon.com/sagemaker/latest/dg/serverless-endpoints.html>`__.
+
+To deploy a serverless endpoint, you will need to create a ``ServerlessInferenceConfig``.
+If you create a ``ServerlessInferenceConfig`` without specifying its arguments, the default ``MemorySizeInMB`` will be **2048** and
+the default ``MaxConcurrency`` will be **5**:
+
+.. code:: python
+
+    from sagemaker.serverless import ServerlessInferenceConfig
+
+    # Create an empty ServerlessInferenceConfig object to use default values
+    serverless_config = ServerlessInferenceConfig()
+
+Or you can specify ``MemorySizeInMB`` and ``MaxConcurrency`` in ``ServerlessInferenceConfig`` (example shown below):
+
+.. code:: python
+
+    # Specify MemorySizeInMB and MaxConcurrency in the serverless config object
+    serverless_config = ServerlessInferenceConfig(
+        memory_size_in_mb=4096,
+        max_concurrency=10,
+    )
+
+Then use the ``ServerlessInferenceConfig`` in the estimator's ``deploy()`` method to deploy a serverless endpoint:
+
+.. code:: python
+
+    # Deploys the model that was generated by fit() to a SageMaker serverless endpoint
+    serverless_predictor = estimator.deploy(serverless_inference_config=serverless_config)
+
+After deployment is complete, you can use the predictor's ``predict()`` method to invoke the serverless endpoint just as
+you would a real-time endpoint:
+
+.. code:: python
+
+    # Serializes data and makes a prediction request to the SageMaker serverless endpoint
+    response = serverless_predictor.predict(data)
+
+Clean up the endpoint and model if they are no longer needed after inference:
+
+.. code:: python
+
+    # Tears down the SageMaker endpoint and endpoint configuration
+    serverless_predictor.delete_endpoint()
+
+    # Deletes the SageMaker model
+    serverless_predictor.delete_model()
+
+For more details about ``ServerlessInferenceConfig``,
+see the API docs for `Serverless Inference <https://sagemaker.readthedocs.io/en/stable/api/inference/serverless.html>`__.
+
 *************************
 SageMaker Batch Transform
 *************************