@@ -1226,6 +1226,28 @@ to configure or manage the underlying infrastructure. After you train a model,
Serverless endpoint and then invoke the endpoint with the model to get inference results back. More information about
SageMaker Serverless Inference can be found in the `AWS documentation <https://docs.aws.amazon.com/sagemaker/latest/dg/serverless-endpoints.html>`__.

+ To use SageMaker Serverless Inference with one of the SageMaker-provided containers or a Bring Your Own Container
+ model, you will need to pass ``image_uri``. Here is an example that uses ``image_uri`` to create an MXNet model:
+
+ .. code:: python
+
+     import sagemaker
+     from sagemaker.mxnet import MXNetModel
+
+     role = sagemaker.get_execution_role()
+
+     # Create an MXNetModel object
+     mxnet_model = MXNetModel(
+         model_data="s3://my_bucket/pretrained_model/model.tar.gz",  # Path to your trained SageMaker model
+         role=role,  # IAM role with permissions to create an endpoint
+         entry_point="inference.py",
+         image_uri="763104351884.dkr.ecr.us-west-2.amazonaws.com/mxnet-inference:1.4.1-cpu-py3",  # Image you want to use
+     )
+
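+ Rather than hard-coding the image URI, you can also look it up with the SDK's ``sagemaker.image_uris.retrieve()``
+ helper. The sketch below is illustrative; it assumes SageMaker Python SDK v2, the ``us-west-2`` region, and the same
+ framework version as above:
+
+ .. code:: python
+
+     from sagemaker import image_uris
+
+     # Look up the MXNet inference image URI; instance_type is used only
+     # to select the CPU vs. GPU variant of the image
+     image_uri = image_uris.retrieve(
+         framework="mxnet",
+         region="us-west-2",
+         version="1.4.1",
+         py_version="py3",
+         image_scope="inference",
+         instance_type="ml.m5.xlarge",
+     )
+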
+ For more Amazon SageMaker-provided algorithm and container image paths, see `Amazon SageMaker provided
+ algorithms and Deep Learning Containers <https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-algo-docker-registry-paths.html>`_.
+ After creating a model with ``image_uri``, you can then follow the steps below to create a serverless endpoint.
+
To deploy a serverless endpoint, you will need to create a ``ServerlessInferenceConfig``.
If you create a ``ServerlessInferenceConfig`` without specifying its arguments, the default ``MemorySizeInMB`` will be **2048** and
the default ``MaxConcurrency`` will be **5**:
@@ -1235,14 +1257,14 @@ the default ``MaxConcurrency`` will be **5**:
    from sagemaker.serverless import ServerlessInferenceConfig

    # Create an empty ServerlessInferenceConfig object to use default values
-     serverless_config = new ServerlessInferenceConfig()
+     serverless_config = ServerlessInferenceConfig()

Or you can specify ``MemorySizeInMB`` and ``MaxConcurrency`` in ``ServerlessInferenceConfig`` (example shown below):

.. code:: python

    # Specify MemorySizeInMB and MaxConcurrency in the serverless config object
-     serverless_config = new ServerlessInferenceConfig(
+     serverless_config = ServerlessInferenceConfig(
        memory_size_in_mb=4096,
        max_concurrency=10,
    )
@@ -1254,6 +1276,14 @@ Then use the ``ServerlessInferenceConfig`` in the estimator's ``deploy()`` method
    # Deploys the model that was generated by fit() to a SageMaker serverless endpoint
    serverless_predictor = estimator.deploy(serverless_inference_config=serverless_config)

+ Or you can call the model's ``deploy()`` method directly to deploy a serverless endpoint:
+
+ .. code:: python
+
+     # Deploys the model to a SageMaker serverless endpoint
+     serverless_predictor = model.deploy(serverless_inference_config=serverless_config)
+
+
After deployment is complete, you can use the predictor's ``predict()`` method to invoke the serverless endpoint just like
real-time endpoints:
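
A minimal sketch of an invocation, assuming ``data`` holds sample input in the format your model's inference script
expects:

.. code:: python

    # Invoke the serverless endpoint; the payload format depends on your
    # model's serializer and inference script
    response = serverless_predictor.predict(data)

When the endpoint is no longer needed, you can clean it up with the predictor's ``delete_model()`` and
``delete_endpoint()`` methods.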