This repository was archived by the owner on May 23, 2024. It is now read-only.

Commit ca29ccd

Authored by chuyang-deng (Chuyang Deng), ajaykarpur, and metrizable
multi-model-endpoint support (#140)
* remove tfs model server start if mme enabled
* remove grpc client, combine InvocationResource with PythonServiceResource
* on_get and on_delete
* load models and invocations
* rename pre post processing test file
* fix memory exhaust error message
* clean up
* remove extra model data
* remove tensorflow files
* remove tensorflow files from all containers
* test against 2.1.0 instead of 2.0.0

Co-authored-by: Eric Johnson <[email protected]>

* clean up files if model is not loaded successfully
* no need to cleanup config files if they are not created

Co-authored-by: Chuyang Deng <[email protected]>
Co-authored-by: Ajay Karpur <[email protected]>
Co-authored-by: Eric Johnson <[email protected]>
1 parent afb7fb1 commit ca29ccd

53 files changed, 803 insertions(+), 1138 deletions(-)

buildspec.yml

Lines changed: 2 additions & 2 deletions
@@ -20,7 +20,7 @@ phases:
 
 # run local tests
 - tox -e py37 -- test/integration/local --framework-version 1.15
-- tox -e py37 -- test/integration/local --framework-version 2.0
+- tox -e py37 -- test/integration/local --framework-version 2.1
 
 # push docker images to ECR
 - |
@@ -32,7 +32,7 @@ phases:
 - |
   if is-release-build; then
     tox -e py37 -- -n 8 test/integration/sagemaker/test_tfs.py --versions 1.15.0
-    tox -e py37 -- -n 8 test/integration/sagemaker/test_tfs.py --versions 2.0.0
+    tox -e py37 -- -n 8 test/integration/sagemaker/test_tfs.py --versions 2.1.0
   fi
 
 # write deployment details to file

docker/1.11/Dockerfile.cpu

Lines changed: 0 additions & 4 deletions
@@ -23,10 +23,6 @@ RUN \
 
 COPY ./ /
 
-# put tensorflow library (with only error_codes) to python dist-packages
-WORKDIR /
-RUN mv /sagemaker/tensorflow/ /usr/local/lib/python3.*/dist-packages/
-
 ARG TFS_SHORT_VERSION
 ENV SAGEMAKER_TFS_VERSION "${TFS_SHORT_VERSION}"
 ENV PATH "$PATH:/sagemaker"

docker/1.11/Dockerfile.eia

Lines changed: 0 additions & 4 deletions
@@ -20,10 +20,6 @@ RUN \
 
 COPY ./ /
 
-# put tensorflow library (with only error_codes) to python dist-packages
-WORKDIR /
-RUN mv /sagemaker/tensorflow/ /usr/local/lib/python3.*/dist-packages/
-
 RUN mv amazonei_tensorflow_model_server /usr/bin/tensorflow_model_server && \
     chmod +x /usr/bin/tensorflow_model_server
 

docker/1.11/Dockerfile.gpu

Lines changed: 0 additions & 4 deletions
@@ -62,10 +62,6 @@ RUN \
 
 COPY ./ /
 
-# put tensorflow library (with only error_codes) to python dist-packages
-WORKDIR /
-RUN mv /sagemaker/tensorflow/ /usr/local/lib/python3.*/dist-packages/
-
 ARG TFS_SHORT_VERSION
 ENV SAGEMAKER_TFS_VERSION "${TFS_SHORT_VERSION}"
 ENV PATH "$PATH:/sagemaker"

docker/1.12/Dockerfile.cpu

Lines changed: 0 additions & 4 deletions
@@ -23,10 +23,6 @@ RUN \
 
 COPY ./ /
 
-# put tensorflow library (with only error_codes) to python dist-packages
-WORKDIR /
-RUN mv /sagemaker/tensorflow/ /usr/local/lib/python3.*/dist-packages/
-
 ARG TFS_SHORT_VERSION
 ENV SAGEMAKER_TFS_VERSION "${TFS_SHORT_VERSION}"
 ENV PATH "$PATH:/sagemaker"

docker/1.12/Dockerfile.eia

Lines changed: 0 additions & 4 deletions
@@ -20,10 +20,6 @@ RUN \
 
 COPY ./ /
 
-# put tensorflow library (with only error_codes) to python dist-packages
-WORKDIR /
-RUN mv /sagemaker/tensorflow/ /usr/local/lib/python3.*/dist-packages/
-
 RUN mv amazonei_tensorflow_model_server /usr/bin/tensorflow_model_server && \
     chmod +x /usr/bin/tensorflow_model_server
 

docker/1.12/Dockerfile.gpu

Lines changed: 0 additions & 4 deletions
@@ -62,10 +62,6 @@ RUN \
 
 COPY ./ /
 
-# put tensorflow library (with only error_codes) to python dist-packages
-WORKDIR /
-RUN mv /sagemaker/tensorflow/ /usr/local/lib/python3.*/dist-packages/
-
 
 ARG TFS_SHORT_VERSION
 ENV SAGEMAKER_TFS_VERSION "${TFS_SHORT_VERSION}"

docker/1.13/Dockerfile.cpu

Lines changed: 0 additions & 4 deletions
@@ -47,10 +47,6 @@ RUN ${PIP} install -U --no-cache-dir \
 
 COPY ./ /
 
-# put tensorflow library (with only error_codes) to python dist-packages
-WORKDIR /
-RUN mv /sagemaker/tensorflow/ /usr/local/lib/python3.*/dist-packages/
-
 # Some TF tools expect a "python" binary
 RUN ln -s $(which ${PYTHON}) /usr/local/bin/python \
     && ln -s /usr/local/bin/pip3 /usr/bin/pip

docker/1.13/Dockerfile.eia

Lines changed: 0 additions & 4 deletions
@@ -33,9 +33,5 @@ RUN ${PIP} install --no-cache-dir \
 
 COPY ./ /
 
-# put tensorflow library (with only error_codes) to python dist-packages
-WORKDIR /
-RUN mv /sagemaker/tensorflow/ /usr/local/lib/python3.*/dist-packages/
-
 RUN mv amazonei_tensorflow_model_server /usr/bin/tensorflow_model_server && \
     chmod +x /usr/bin/tensorflow_model_server

docker/1.13/Dockerfile.gpu

Lines changed: 0 additions & 4 deletions
@@ -116,10 +116,6 @@ RUN ${PIP} install -U --no-cache-dir \
 
 COPY ./ /
 
-# put tensorflow library (with only error_codes) to python dist-packages
-WORKDIR /
-RUN mv /sagemaker/tensorflow/ /usr/local/lib/python3.*/site-packages/
-
 # Expose gRPC and REST port
 EXPOSE 8500 8501
 

docker/1.14/Dockerfile.cpu

Lines changed: 0 additions & 4 deletions
@@ -47,10 +47,6 @@ RUN ${PIP} install --no-cache-dir \
 
 COPY ./ /
 
-# put tensorflow library (with only error_codes) to python dist-packages
-WORKDIR /
-RUN mv /sagemaker/tensorflow/ /usr/local/lib/python3.*/dist-packages/
-
 # Some TF tools expect a "python" binary
 RUN ln -s $(which ${PYTHON}) /usr/local/bin/python
 

docker/1.14/Dockerfile.eia

Lines changed: 0 additions & 4 deletions
@@ -80,10 +80,6 @@ RUN pip install --no-cache-dir \
 
 COPY sagemaker /sagemaker
 
-# put tensorflow library (with only error_codes) to python dist-packages
-WORKDIR /
-RUN mv /sagemaker/tensorflow/ /usr/local/lib/python3.*/site-packages/
-
 RUN wget https://amazonei-tools.s3.amazonaws.com/v${HEALTH_CHECK_VERSION}/ei_tools_${HEALTH_CHECK_VERSION}.tar.gz -O /opt/ei_tools_${HEALTH_CHECK_VERSION}.tar.gz \
     && tar -xvf /opt/ei_tools_${HEALTH_CHECK_VERSION}.tar.gz -C /opt/ \
     && rm -rf /opt/ei_tools_${HEALTH_CHECK_VERSION}.tar.gz \

docker/1.14/Dockerfile.gpu

Lines changed: 0 additions & 4 deletions
@@ -114,10 +114,6 @@ RUN ${PIP} install -U --no-cache-dir \
 
 COPY ./ /
 
-# put tensorflow library (with only error_codes) to python dist-packages
-WORKDIR /
-RUN mv /sagemaker/tensorflow/ /usr/local/lib/python3.*/site-packages/
-
 # Expose gRPC and REST port
 EXPOSE 8500 8501
 

docker/1.15/Dockerfile.cpu

Lines changed: 1 addition & 3 deletions
@@ -4,6 +4,7 @@ LABEL maintainer="Amazon AI"
 # Specify LABEL for inference pipelines to use SAGEMAKER_BIND_TO_PORT
 # https://docs.aws.amazon.com/sagemaker/latest/dg/inference-pipeline-real-time.html
 LABEL com.amazonaws.sagemaker.capabilities.accept-bind-to-port=true
+LABEL com.amazonaws.sagemaker.capabilities.multi-models=true
 
 # Add arguments to achieve the version, python and url
 ARG PYTHON=python3
@@ -70,9 +71,6 @@ COPY sagemaker /sagemaker
 
 WORKDIR /
 
-# put tensorflow library (with only error_codes) to python dist-packages
-RUN mv /sagemaker/tensorflow/ /usr/local/lib/python3.*/dist-packages/
-
 # Some TF tools expect a "python" binary
 RUN ln -s $(which ${PYTHON}) /usr/local/bin/python \
     && ln -s /usr/local/bin/pip3 /usr/bin/pip

docker/1.15/Dockerfile.eia

Lines changed: 0 additions & 5 deletions
@@ -70,11 +70,6 @@ RUN ${PIP} install --no-cache-dir \
 
 COPY sagemaker /sagemaker
 
-WORKDIR /
-
-# put tensorflow library (with only error_codes) to python dist-packages
-RUN mv /sagemaker/tensorflow/ /usr/local/lib/python3.*/dist-packages/
-
 # Some TF tools expect a "python" binary
 RUN ln -s $(which ${PYTHON}) /usr/local/bin/python \
     && ln -s /usr/local/bin/pip3 /usr/bin/pip

docker/1.15/Dockerfile.gpu

Lines changed: 0 additions & 4 deletions
@@ -110,10 +110,6 @@ COPY sagemaker /sagemaker
 RUN curl ${TF_MODEL_SERVER_SOURCE} -o /usr/bin/tensorflow_model_server \
     && chmod 555 /usr/bin/tensorflow_model_server
 
-WORKDIR /
-# put tensorflow library (with only error_codes) to python dist-packages
-RUN mv /sagemaker/tensorflow/ /usr/local/lib/python3.*/dist-packages/
-
 # Expose gRPC and REST port
 EXPOSE 8500 8501
 

docker/2.0/Dockerfile.cpu

Lines changed: 0 additions & 4 deletions
@@ -62,10 +62,6 @@ RUN ${PIP} install --no-cache-dir \
 
 COPY ./sagemaker /sagemaker
 
-# put tensorflow library (with only error_codes) to python dist-packages
-WORKDIR /
-RUN mv /sagemaker/tensorflow/ /usr/local/lib/python3.*/dist-packages/
-
 # Some TF tools expect a "python" binary
 RUN ln -s $(which ${PYTHON}) /usr/local/bin/python
 

docker/2.0/Dockerfile.eia

Lines changed: 0 additions & 5 deletions
@@ -68,11 +68,6 @@ RUN ${PIP} install --no-cache-dir \
 
 COPY sagemaker /sagemaker
 
-WORKDIR /
-
-# put tensorflow library (with only error_codes) to python dist-packages
-RUN mv /sagemaker/tensorflow/ /usr/local/lib/python3.*/dist-packages/
-
 # Some TF tools expect a "python" binary
 RUN ln -s $(which ${PYTHON}) /usr/local/bin/python \
     && ln -s /usr/local/bin/pip3 /usr/bin/pip

docker/2.0/Dockerfile.gpu

Lines changed: 0 additions & 4 deletions
@@ -105,10 +105,6 @@ COPY ./sagemaker /sagemaker
 RUN curl $TFS_URL -o /usr/bin/tensorflow_model_server \
     && chmod 555 /usr/bin/tensorflow_model_server
 
-# put tensorflow library (with only error_codes) to python dist-packages
-WORKDIR /
-RUN mv /sagemaker/tensorflow/ /usr/local/lib/python3.*/dist-packages/
-
 # Expose gRPC and REST port
 EXPOSE 8500 8501
 

docker/2.1/Dockerfile.cpu

Lines changed: 1 addition & 7 deletions
@@ -2,6 +2,7 @@ FROM ubuntu:18.04
 
 LABEL maintainer="Amazon AI"
 LABEL com.amazonaws.sagemaker.capabilities.accept-bind-to-port=true
+LABEL com.amazonaws.sagemaker.capabilities.multi-models=true
 
 ARG PYTHON=python3
 ARG PIP=pip3
@@ -62,13 +63,6 @@ RUN ${PIP} install --no-cache-dir \
 
 COPY ./sagemaker /sagemaker
 
-# put tensorflow library (with only error_codes) to python dist-packages
-WORKDIR /
-RUN cd /usr/local/lib/python3.*/dist-packages/ \
-    && mv /sagemaker/tensorflow-2.1 ./tensorflow \
-    # Delete the remaining tensorflow folder to avoid confusing python import
-    && rm -rf /sagemaker/tensorflow
-
 # Some TF tools expect a "python" binary
 RUN ln -s $(which ${PYTHON}) /usr/local/bin/python
 

docker/2.1/Dockerfile.gpu

Lines changed: 0 additions & 7 deletions
@@ -107,13 +107,6 @@ COPY ./sagemaker /sagemaker
 RUN curl $TFS_URL -o /usr/bin/tensorflow_model_server \
     && chmod 555 /usr/bin/tensorflow_model_server
 
-# put tensorflow library (with only error_codes) to python dist-packages
-WORKDIR /
-RUN cd /usr/local/lib/python3.*/dist-packages/ \
-    && mv /sagemaker/tensorflow-2.1 ./tensorflow \
-    # Delete the remaining tensorflow folder to avoid confusing python import
-    && rm -rf /sagemaker/tensorflow
-
 # Expose gRPC and REST port
 EXPOSE 8500 8501
 

docker/build_artifacts/sagemaker/multi_model_utils.py

Lines changed: 14 additions & 0 deletions
@@ -11,6 +11,7 @@
 # ANY KIND, either express or implied. See the License for the specific
 # language governing permissions and limitations under the License.
 import fcntl
+import signal
 import time
 from contextlib import contextmanager
 
@@ -31,6 +32,19 @@ def lock(path=DEFAULT_LOCK_FILE):
         fcntl.lockf(fd, fcntl.LOCK_UN)
 
 
+@contextmanager
+def timeout(seconds=60):
+    def _raise_timeout_error(signum, frame):
+        raise Exception(408, 'Timed out after {} seconds'.format(seconds))
+
+    try:
+        signal.signal(signal.SIGALRM, _raise_timeout_error)
+        signal.alarm(seconds)
+        yield
+    finally:
+        signal.alarm(0)
+
+
 class MultiModelException(Exception):
     def __init__(self, code, msg):
         Exception.__init__(self, code, msg)
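
The `timeout` context manager added in this hunk could be used roughly as sketched below. This is a minimal illustration, not code from the commit: `wait_until_ready` and its `is_ready` callback are hypothetical names, and the `timeout` definition is copied from the diff only so the snippet runs on its own. Note that `signal.SIGALRM` exists only on Unix, and `signal.signal` can only be called from the main thread, so this pattern presumably assumes a single-threaded worker.

```python
import signal
import time
from contextlib import contextmanager


@contextmanager
def timeout(seconds=60):
    # Copied from the diff above so that this sketch is self-contained.
    def _raise_timeout_error(signum, frame):
        raise Exception(408, 'Timed out after {} seconds'.format(seconds))

    try:
        signal.signal(signal.SIGALRM, _raise_timeout_error)
        signal.alarm(seconds)
        yield
    finally:
        # Always cancel the pending alarm, whether or not the body finished.
        signal.alarm(0)


def wait_until_ready(is_ready, seconds=1, poll_interval=0.05):
    """Hypothetical helper: poll is_ready() until it returns True.

    Returns True if is_ready() succeeded within the time limit and
    False if the alarm fired first.
    """
    try:
        with timeout(seconds):
            while not is_ready():
                time.sleep(poll_interval)
        return True
    except Exception as e:
        # The context manager signals a timeout with code 408 in args[0].
        if e.args and e.args[0] == 408:
            return False
        raise


if __name__ == '__main__':
    print(wait_until_ready(lambda: True))
    print(wait_until_ready(lambda: False, seconds=1))
```

Because the handler raises a plain `Exception` carrying `(408, message)`, a caller that wants to tell timeouts apart from other failures has to inspect `args[0]`, as the sketch does.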

docker/build_artifacts/sagemaker/nginx.conf.template

Lines changed: 0 additions & 4 deletions
@@ -51,10 +51,6 @@ http {
     %FORWARD_INVOCATION_REQUESTS%;
 }
 
-location ~ ^/models/(.*)/invoke {
-    js_content invocations;
-}
-
 location /models {
     proxy_pass http://gunicorn_upstream/models;
 }
