The SageMaker TensorFlow Serving container (versions 1.5.0 and 2.1.0, CPU) now supports Multi-Model Endpoint. With this feature, you can deploy different models (not just different versions of the same model) to a single endpoint.
To deploy a Multi-Model endpoint with the TFS container, start the container with the environment variable ``SAGEMAKER_MULTI_MODEL=True``.
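For reference, a minimal sketch of setting this up with boto3 might look like the following. The image URI, role ARN, S3 prefix, and resource names are placeholders; aside from ``SAGEMAKER_MULTI_MODEL=True``, the details (``Mode: MultiModel``, ``TargetModel``) come from the generic SageMaker multi-model endpoint API rather than this README.

```python
import boto3

sm = boto3.client("sagemaker")

# Create a model that points at an S3 *prefix* holding the model archives.
sm.create_model(
    ModelName="tfs-multi-model",                                   # placeholder name
    ExecutionRoleArn="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder role
    PrimaryContainer={
        "Image": "<account>.dkr.ecr.<region>.amazonaws.com/sagemaker-tensorflow-serving:2.1.0-cpu",
        "ModelDataUrl": "s3://my-bucket/multi-model-artifacts/",   # prefix of model .tar.gz files
        "Mode": "MultiModel",
        "Environment": {"SAGEMAKER_MULTI_MODEL": "True"},
    },
)

sm.create_endpoint_config(
    EndpointConfigName="tfs-multi-model-config",
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "ModelName": "tfs-multi-model",
        "InstanceType": "ml.c5.xlarge",
        "InitialInstanceCount": 1,
    }],
)

sm.create_endpoint(
    EndpointName="tfs-multi-model-endpoint",
    EndpointConfigName="tfs-multi-model-config",
)

# Invoke a specific model on the multi-model endpoint by naming its archive.
smr = boto3.client("sagemaker-runtime")
response = smr.invoke_endpoint(
    EndpointName="tfs-multi-model-endpoint",
    TargetModel="model1.tar.gz",                 # illustrative archive name under ModelDataUrl
    ContentType="application/json",
    Body='{"instances": [[1.0, 2.0, 3.0]]}',
)
```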
### Multi-Model Interfaces
We provide four different interfaces for users to interact with a Multi-Model Mode container:
Also note that the environment variable ``SAGEMAKER_SAFE_PORT_RANGE`` limits the number of models that can be loaded to the endpoint at the same time.
Only 90% of the ports will be utilized, and each loaded model will be allocated 2 ports (one for the REST API and one for gRPC).
For example, if ``SAGEMAKER_SAFE_PORT_RANGE`` spans 9000 to 9999, the maximum number of models that can be loaded to the endpoint at the same time would be 449 ((9999 - 9000) * 0.9 / 2).
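As a rough sanity check of that arithmetic, a small helper (purely illustrative, not part of the container; it assumes the range is written as ``low-high``) might look like this:

```python
def max_loadable_models(port_range: str) -> int:
    """Estimate how many models fit in a safe port range such as "9000-9999":
    90% of the ports are usable and each model consumes 2 ports."""
    low, high = (int(p) for p in port_range.split("-"))
    return int((high - low) * 0.9 / 2)

print(max_loadable_models("9000-9999"))  # 449
```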
### Using Multi-Model Endpoint with Pre/Post-Processing
Multi-Model Endpoint can be used together with Pre/Post-Processing. However, please note that in Multi-Model mode, the path of ``inference.py`` is ``/opt/ml/models/code`` instead of ``/opt/ml/model/code``.
Also, all loaded models will share the same ``inference.py`` to handle invocation requests. An example of the directory structure of a Multi-Model Endpoint with Pre/Post-Processing would look like this:
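(The layout below is a sketch rather than the repository's original example; the model names and version numbers are illustrative, and only the ``/opt/ml/models/code`` path comes from this README.)

```
/opt/ml/models
|--model1
|   |--[model_version_number]
|       |--variables
|       |--saved_model.pb
|--model2
|   |--[model_version_number]
|       |--assets
|       |--variables
|       |--saved_model.pb
|--code
    |--inference.py
    |--requirements.txt
```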