
Commit 8ee6c5f
Author: Chuyang Deng
Merge branch 'master' of github.com:ChuyangDeng/sagemaker-tensorflow-serving-container
2 parents: 8462353 + ddbcfb3

README.md: 52 additions, 0 deletions
The table of contents gains a new entry for the section added below:

3. [Running the tests](#running-the-tests)
4. [Pre/Post-Processing](#pre/post-processing)
5. [Deploying a TensorFlow Serving Model](#deploying-a-tensorflow-serving-model)
6. [Deploying to Multi-Model Endpoint](#deploying-to-multi-model-endpoint)
## Deploying to Multi-Model Endpoint

The SageMaker TensorFlow Serving container (versions 1.5.0 and 2.1.0, CPU) now supports Multi-Model Endpoints. With this feature, you can deploy different models (not just different versions of the same model) to a single endpoint. To deploy a Multi-Model Endpoint with the TFS container, start the container with the environment variable ``SAGEMAKER_MULTI_MODEL=True``.
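As an illustration, one way to stand up such an endpoint is with the SageMaker Python SDK's ``MultiDataModel`` helper. This is a minimal sketch, not the only deployment path: the endpoint name, bucket prefix, image URI, and role below are placeholders, and the exact constructor parameters can vary between SDK versions.

```python
import json

import boto3
from sagemaker.multidatamodel import MultiDataModel

# All names below are hypothetical placeholders.
mdm = MultiDataModel(
    name="tfs-multi-model",
    model_data_prefix="s3://my-bucket/tfs-models/",  # prefix holding model .tar.gz archives
    image_uri="<tfs-1.5.0-or-2.1.0-cpu-image-uri>",
    role="<sagemaker-execution-role-arn>",
    env={"SAGEMAKER_MULTI_MODEL": "True"},
)
mdm.deploy(initial_instance_count=1,
           instance_type="ml.m5.xlarge",
           endpoint_name="tfs-multi-model")

# Route a request to one of the model archives under model_data_prefix.
runtime = boto3.client("sagemaker-runtime")
response = runtime.invoke_endpoint(
    EndpointName="tfs-multi-model",
    TargetModel="model1.tar.gz",
    ContentType="application/json",
    Body=json.dumps({"instances": [[1.0, 2.0]]}),
)
print(response["Body"].read())
```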
### Multi-Model Interfaces

We provide the following interfaces for interacting with a container running in Multi-Model mode:

| Functionality       | Request                          | Response/Actions                             |
|---------------------|----------------------------------|----------------------------------------------|
| List A Single Model | ``GET /models/{model_name}``     | Information about the specified model        |
| List All Models     | ``GET /models``                  | List of information about all loaded models  |
| Load A Model        | ``POST /models`` with data ``{"model_name": <model-name>, "url": <path to model data>}`` | Load the model named ``model_name`` from the specified url |
| Make Invocations    | ``POST /models/{model_name}/invoke`` with data ``<invocation payload>`` | Return the inference result from the specified model |
| Unload A Model      | ``DELETE /models/{model_name}``  | Unload the specified model                   |
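For example, when running the container directly (outside of SageMaker hosting), these interfaces can be exercised over HTTP. Below is a minimal sketch using Python's ``requests``; it assumes the container is listening on the standard SageMaker port 8080 and that the model data is already present at the given path inside the container (both are assumptions, and ``model1`` is a hypothetical name).

```python
import json

import requests

BASE = "http://localhost:8080"  # assumed local container address

# Load a model; the name and path are hypothetical.
requests.post(f"{BASE}/models",
              json={"model_name": "model1",
                    "url": "/opt/ml/models/model1"})

# List all loaded models, then inspect the one we just loaded.
print(requests.get(f"{BASE}/models").json())
print(requests.get(f"{BASE}/models/model1").json())

# Invoke the model with a TFS-style JSON payload.
result = requests.post(f"{BASE}/models/model1/invoke",
                       data=json.dumps({"instances": [[1.0, 2.0]]}),
                       headers={"Content-Type": "application/json"})
print(result.json())

# Unload the model when it is no longer needed.
requests.delete(f"{BASE}/models/model1")
```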
### Maximum Number of Models

Note that the environment variable ``SAGEMAKER_SAFE_PORT_RANGE`` limits the number of models that can be loaded to the endpoint at the same time. Only 90% of the ports will be utilized, and each loaded model is allocated two ports (one for the REST API and the other for gRPC). For example, if ``SAGEMAKER_SAFE_PORT_RANGE`` is 9000 to 9999, the maximum number of models that can be loaded at the same time is 449 ((9999 - 9000) * 0.9 / 2, rounded down).
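The arithmetic above can be written out directly; this small sketch only restates the formula from this section:

```python
def max_loaded_models(port_min: int, port_max: int) -> int:
    """Only 90% of the port range is usable, and each loaded model
    takes two ports (one REST, one gRPC)."""
    usable_ports = (port_max - port_min) * 0.9
    return int(usable_ports // 2)

print(max_loaded_models(9000, 9999))  # 449
```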
### Using Multi-Model Endpoint with Pre/Post-Processing

Multi-Model Endpoints can be used together with Pre/Post-Processing. Note that in Multi-Model mode, the path of ``inference.py`` is ``/opt/ml/models/code`` instead of ``/opt/ml/model/code``, and all loaded models share the same ``inference.py`` for handling invocation requests (see the handler sketch after the directory layout below). An example directory structure for a Multi-Model Endpoint with Pre/Post-Processing looks like this:
```
model1
    |--[model_version_number]
        |--variables
        |--saved_model.pb
model2
    |--[model_version_number]
        |--assets
        |--variables
        |--saved_model.pb
code
    |--lib
        |--external_module
    |--inference.py
    |--requirements.txt
```
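Because every loaded model shares the one ``inference.py``, its handlers must work for all of them. Below is a minimal sketch following the ``input_handler``/``output_handler`` interface from the Pre/Post-Processing section; the content types and payload handling are illustrative assumptions:

```python
import json


def input_handler(data, context):
    """Pre-process a request before it reaches the TensorFlow Serving REST API.

    Args:
        data: the raw request payload stream
        context: request metadata, e.g. the request content type
    Returns:
        a JSON string in the TFS "instances" format
    """
    if context.request_content_type == "application/json":
        payload = json.loads(data.read().decode("utf-8"))
        return json.dumps({"instances": payload})
    raise ValueError(
        "Unsupported content type: {}".format(context.request_content_type))


def output_handler(response, context):
    """Post-process the TFS response before it is returned to the caller.

    Returns:
        a tuple of (response content, response content type)
    """
    if response.status_code != 200:
        raise ValueError(response.content.decode("utf-8"))
    return response.content, "application/json"
```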
## Contributing

Please read [CONTRIBUTING.md](https://github.com/aws/sagemaker-tensorflow-serving-container/blob/master/CONTRIBUTING.md)
