doc: more documentation for serverless inference #2859

Merged
merged 1 commit into from Jan 20, 2022
9 changes: 9 additions & 0 deletions doc/api/inference/serverless.rst
@@ -0,0 +1,9 @@
Serverless Inference
---------------------

This module contains classes related to Amazon SageMaker Serverless Inference.

.. automodule:: sagemaker.serverless.serverless_inference_config
   :members:
   :undoc-members:
   :show-inheritance:
57 changes: 57 additions & 0 deletions doc/overview.rst
@@ -684,6 +684,63 @@ For more detailed explanations of the classes that this library provides for aut
- `API docs for HyperparameterTuner and parameter range classes <https://sagemaker.readthedocs.io/en/stable/tuner.html>`__
- `API docs for analytics classes <https://sagemaker.readthedocs.io/en/stable/analytics.html>`__

*******************************
SageMaker Serverless Inference
*******************************
Amazon SageMaker Serverless Inference enables you to deploy machine learning models for inference without having
to configure or manage the underlying infrastructure. After you have trained a model, you can deploy it to an Amazon
SageMaker serverless endpoint and then invoke the endpoint to get inference results back. More information about
SageMaker Serverless Inference can be found in the `AWS documentation <https://docs.aws.amazon.com/sagemaker/latest/dg/serverless-endpoints.html>`__.

To deploy a serverless endpoint, you will need to create a ``ServerlessInferenceConfig``.
If you create a ``ServerlessInferenceConfig`` without specifying its arguments, the default ``MemorySizeInMB`` will be **2048** and
the default ``MaxConcurrency`` will be **5**:

.. code:: python

   from sagemaker.serverless import ServerlessInferenceConfig

   # Create an empty ServerlessInferenceConfig object to use default values
   serverless_config = ServerlessInferenceConfig()

Or you can specify ``MemorySizeInMB`` and ``MaxConcurrency`` in the ``ServerlessInferenceConfig``:

.. code:: python

   # Specify MemorySizeInMB and MaxConcurrency in the serverless config object
   serverless_config = ServerlessInferenceConfig(
       memory_size_in_mb=4096,
       max_concurrency=10,
   )

Then use the ``ServerlessInferenceConfig`` in the estimator's ``deploy()`` method to deploy a serverless endpoint:

.. code:: python

   # Deploys the model that was generated by fit() to a SageMaker serverless endpoint
   serverless_predictor = estimator.deploy(serverless_inference_config=serverless_config)

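For context, here is a minimal end-to-end sketch of the train-then-deploy flow described above. The
``SKLearn`` estimator, entry-point script, IAM role, and S3 input path are illustrative assumptions for
this sketch, not part of the original example; any SageMaker estimator can be used in the same way:

.. code:: python

   from sagemaker.serverless import ServerlessInferenceConfig
   from sagemaker.sklearn import SKLearn

   # Illustrative estimator; substitute your own framework, script, and role
   estimator = SKLearn(
       entry_point="train.py",  # hypothetical training script
       role="arn:aws:iam::123456789012:role/SageMakerRole",  # hypothetical IAM role
       framework_version="0.23-1",
       instance_type="ml.m5.large",
       instance_count=1,
   )
   estimator.fit({"train": "s3://my-bucket/train"})  # hypothetical S3 input

   # Deploy the trained model to a serverless endpoint
   serverless_config = ServerlessInferenceConfig(
       memory_size_in_mb=4096,
       max_concurrency=10,
   )
   serverless_predictor = estimator.deploy(
       serverless_inference_config=serverless_config
   )
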
After deployment is complete, you can use the predictor's ``predict()`` method to invoke the serverless endpoint,
just as you would with a real-time endpoint:

.. code:: python

   # Serializes data and makes a prediction request to the SageMaker serverless endpoint
   response = serverless_predictor.predict(data)

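The accepted format of ``data`` depends on the serializer and deserializer attached to the predictor. As an
assumed example (the CSV/JSON choice and the feature values below are illustrative, not part of the original
example), a numeric feature row can be sent as CSV and the response parsed as JSON:

.. code:: python

   from sagemaker.serializers import CSVSerializer
   from sagemaker.deserializers import JSONDeserializer

   # Assumed setup: send request rows as CSV, parse JSON responses
   serverless_predictor.serializer = CSVSerializer()
   serverless_predictor.deserializer = JSONDeserializer()

   # Illustrative feature vector; the shape depends on your model
   response = serverless_predictor.predict([1.5, 2.0, 3.25])
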
After inference, clean up the endpoint and model if they are no longer needed:

.. code:: python

   # Tears down the SageMaker endpoint and endpoint configuration
   serverless_predictor.delete_endpoint()

   # Deletes the SageMaker model
   serverless_predictor.delete_model()

For more details about ``ServerlessInferenceConfig``,
see the API docs for `Serverless Inference <https://sagemaker.readthedocs.io/en/stable/api/inference/serverless.html>`__.

*************************
SageMaker Batch Transform
*************************