Fix: Don't load default model in MME mode #130


Merged — 16 commits into aws:master, Nov 7, 2022

Conversation

nikhil-sk (Contributor)

Issue #, if available:

Description of changes:

  1. In MME (multi-model endpoint) mode, no default model should be loaded. Currently, the torchserve command attempts to load a default 'model' from the path /opt/ml/models.
  2. This change omits that command-line argument when the container is running in MME mode:
    Failure log
    2022-09-07T21:03:56.608+02:00   2022-09-07T19:03:55,758 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - model_name: model, batchSize: 1
    2022-09-07T21:03:56.608+02:00   2022-09-07T19:03:55,808 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - Backend worker process died.
    2022-09-07T21:03:56.608+02:00   2022-09-07T19:03:55,808 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - Traceback (most recent call last):
    2022-09-07T21:03:56.608+02:00   2022-09-07T19:03:55,809 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - File "/opt/conda/lib/python3.8/site-packages/ts/model_service_worker.py", line 210, in <module>
    2022-09-07T21:03:56.608+02:00   2022-09-07T19:03:55,809 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - worker.run_server()
    2022-09-07T21:03:56.608+02:00   2022-09-07T19:03:55,809 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - File "/opt/conda/lib/python3.8/site-packages/ts/model_service_worker.py", line 181, in run_server
    2022-09-07T21:03:56.608+02:00   2022-09-07T19:03:55,809 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - self.handle_connection(cl_socket)
    2022-09-07T21:03:56.608+02:00   2022-09-07T19:03:55,809 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - File "/opt/conda/lib/python3.8/site-packages/ts/model_service_worker.py", line 139, in handle_connection
    2022-09-07T21:03:56.608+02:00   2022-09-07T19:03:55,809 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - service, result, code = self.load_model(msg)
    2022-09-07T21:03:56.608+02:00   2022-09-07T19:03:55,809 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - File "/opt/conda/lib/python3.8/site-packages/ts/model_service_worker.py", line 104, in load_model
    2022-09-07T21:03:56.608+02:00   2022-09-07T19:03:55,809 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - service = model_loader.load(
    2022-09-07T21:03:56.608+02:00   2022-09-07T19:03:55,809 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - File "/opt/conda/lib/python3.8/site-packages/ts/model_loader.py", line 151, in load
    2022-09-07T21:03:56.608+02:00   2022-09-07T19:03:55,809 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - initialize_fn(service.context)
    2022-09-07T21:03:56.608+02:00   2022-09-07T19:03:55,810 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - File "/opt/conda/lib/python3.8/site-packages/sagemaker_pytorch_serving_container/handler_service.py", line 51, in initialize
    2022-09-07T21:03:56.608+02:00   2022-09-07T19:03:55,810 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - super().initialize(context)
    2022-09-07T21:03:56.608+02:00   2022-09-07T19:03:55,810 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - File "/opt/conda/lib/python3.8/site-packages/sagemaker_inference/default_handler_service.py", line 66, in initialize
    2022-09-07T21:03:56.608+02:00   2022-09-07T19:03:55,810 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - self._service.validate_and_initialize(model_dir=model_dir, context=context)
    2022-09-07T21:03:56.608+02:00   2022-09-07T19:03:55,810 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - File "/opt/conda/lib/python3.8/site-packages/sagemaker_inference/transformer.py", line 178, in validate_and_initialize
    2022-09-07T21:03:56.608+02:00   2022-09-07T19:03:55,810 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - self._model = self._run_handler_function(self._model_fn, *(model_dir,))
    2022-09-07T21:03:56.608+02:00   2022-09-07T19:03:55,810 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - File "/opt/conda/lib/python3.8/site-packages/sagemaker_inference/transformer.py", line 266, in _run_handler_function
    2022-09-07T21:03:56.608+02:00   2022-09-07T19:03:55,810 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - result = func(*argv)
    2022-09-07T21:03:56.608+02:00   2022-09-07T19:03:55,811 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - File "/opt/conda/lib/python3.8/site-packages/sagemaker_pytorch_serving_container/default_pytorch_inference_handler.py", line 73, in default_model_fn
    2022-09-07T21:03:56.608+02:00   2022-09-07T19:03:55,811 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - raise ValueError(
    2022-09-07T21:03:56.608+02:00   2022-09-07T19:03:55,811 [INFO ] W-9000-model_1.0-stdout MODEL_LOG - ValueError: Exactly one .pth or .pt file is required for PyTorch models: []
    2022-09-07T21:03:56.608+02:00   2022-09-07T19:03:55,817 [INFO ] epollEventLoopGroup-5-1 org.pytorch.serve.wlm.WorkerThread - 9000 Worker disconnected. WORKER_STARTED
    2022-09-07T21:03:56.608+02:00   2022-09-07T19:03:55,818 [WARN ] W-9000-model_1.0 org.pytorch.serve.wlm.BatchAggregator - Load model failed: model, error: Worker died.
...

Fixed log
(No default model loaded when torchserve starts)

Metrics report format: prometheus
Enable metrics API: true
Workflow Store: /
Model config: N/A
2022-10-31T07:12:55,633 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager -  Loading snapshot serializer plugin...
2022-10-31T07:12:55,651 [INFO ] main org.pytorch.serve.ModelServer - Initialize Inference server with: EpollServerSocketChannel.
2022-10-31T07:12:55,696 [INFO ] main org.pytorch.serve.ModelServer - Inference API bind to: http://0.0.0.0:8080
2022-10-31T07:12:55,697 [INFO ] main org.pytorch.serve.ModelServer - Initialize Metrics server with: EpollServerSocketChannel.
2022-10-31T07:12:55,698 [INFO ] main org.pytorch.serve.ModelServer - Metrics API bind to: http://127.0.0.1:8082
Model server started.
2022-10-31T07:12:55,914 [INFO ] pool-3-thread-1 TS_METRICS - CPUUtilization.Percent:0.0|#Level:Host|#hostname:container-1.local,timestamp:1667200375
2022-10-31T07:12:55,914 [INFO ] pool-3-thread-1 TS_METRICS - DiskAvailable.Gigabytes:40.57777786254883|#Level:Host|#hostname:container-1.local,timestamp:1667200375
2022-10-31T07:12:55,914 [INFO ] pool-3-thread-1 TS_METRICS - DiskUsage.Gigabytes:11.410484313964844|#Level:Host|#hostname:container-1.local,timestamp:1667200375
2022-10-31T07:12:55,914 [INFO ] pool-3-thread-1 TS_METRICS - DiskUtilization.Percent:21.9|#Level:Host|#hostname:container-1.local,timestamp:1667200375
2022-10-31T07:12:55,914 [INFO ] pool-3-thread-1 TS_METRICS - MemoryAvailable.Megabytes:6150.69921875|#Level:Host|#hostname:container-1.local,timestamp:1667200375
2022-10-31T07:12:55,915 [INFO ] pool-3-thread-1 TS_METRICS - MemoryUsed.Megabytes:1175.19921875|#Level:Host|#hostname:container-1.local,timestamp:1667200375
2022-10-31T07:12:55,915 [INFO ] pool-3-thread-1 TS_METRICS - MemoryUtilization.Percent:19.3|#Level:Host|#hostname:container-1.local,timestamp:1667200375
2022-10-31T07:12:57,847 [INFO ] pool-2-thread-1 ACCESS_LOG - /169.254.178.2:35152 "GET /ping HTTP/1.1" 200 13
2022-10-31T07:12:57,847 [INFO ] pool-2-thread-1 TS_METRICS - Requests2XX.Count:1|#Level:Host|#hostname:container-1.local,timestamp:1667200377
2022-10-31T07:12:57,866 [INFO ] epollEventLoopGroup-3-1 ACCESS_LOG - /169.254.178.2:35152 "GET /models HTTP/1.1" 200 2
2022-10-31T07:12:57,866 [INFO ] epollEventLoopGroup-3-1 TS_METRICS - Requests2XX.Count:1|#Level:Host|#hostname:container-1.local,timestamp:1667200377
2022-10-31T07:13:02,752 [INFO ] pool-2-thread-1 ACCESS_LOG - /169.254.178.2:35152 "GET /ping HTTP/1.1" 200 1
2022-10-31T07:13:02,752 [INFO ] pool-2-thread-1 TS_METRICS - Requests2XX.Count:1|#Level:Host|#hostname:container-1.local,timestamp:1667200377
... (subsequent "GET /ping" requests continue to return 200 every 5 seconds)
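The conditional described in item 2 of the change description can be sketched as follows. This is an illustrative sketch only: the helper name and the exact way the toolkit detects MME mode are assumptions, not the toolkit's actual code (the `--start`, `--model-store`, and `--models` flags are standard torchserve CLI options).

```python
DEFAULT_MODEL_DIR = "/opt/ml/model"
MME_MODEL_DIR = "/opt/ml/models"


def build_torchserve_command(multi_model: bool) -> list:
    """Build the torchserve launch command, omitting the default model
    argument when running as a multi-model endpoint (MME)."""
    cmd = [
        "torchserve",
        "--start",
        "--model-store", MME_MODEL_DIR if multi_model else DEFAULT_MODEL_DIR,
    ]
    if not multi_model:
        # Only single-model endpoints preload a default model. In MME mode
        # models are loaded dynamically via the management API, so passing
        # --models here would trigger the worker crash shown above.
        cmd += ["--models", f"model={DEFAULT_MODEL_DIR}"]
    return cmd
```

With `multi_model=True` the command contains no `--models` argument, matching the fixed log above where torchserve starts with no default model loaded.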


By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@sagemaker-bot (Collaborator)

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-pytorch-inference-toolkit-pr
  • Commit ID: 69eeea9
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot (Collaborator)

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-pytorch-inference-toolkit-pr
  • Commit ID: 8b210dd
  • Result: FAILED
  • Build Logs (available for 30 days)


@sagemaker-bot (Collaborator)

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-pytorch-inference-toolkit-pr
  • Commit ID: 8514322
  • Result: FAILED
  • Build Logs (available for 30 days)


@sagemaker-bot (Collaborator)

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-pytorch-inference-toolkit-pr
  • Commit ID: 70b1278
  • Result: FAILED
  • Build Logs (available for 30 days)


@sagemaker-bot (Collaborator)

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-pytorch-inference-toolkit-pr
  • Commit ID: b67f7fa
  • Result: FAILED
  • Build Logs (available for 30 days)


@sagemaker-bot (Collaborator)

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-pytorch-inference-toolkit-pr
  • Commit ID: 17094ed
  • Result: FAILED
  • Build Logs (available for 30 days)


@sagemaker-bot (Collaborator)

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-pytorch-inference-toolkit-pr
  • Commit ID: 260288f
  • Result: FAILED
  • Build Logs (available for 30 days)


@sagemaker-bot (Collaborator)

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-pytorch-inference-toolkit-pr
  • Commit ID: bb2945f
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)


@@ -70,6 +70,7 @@ deps =
six
future
pyyaml
protobuf == 3.19.6
@nikhil-sk (Contributor, Author) — Oct 31, 2022

This pin is currently required; otherwise, SageMaker imports fail with the following error on py37 only:

    import sagemaker.amazon.common
.tox/py37/lib/python3.7/site-packages/sagemaker/amazon/common.py:23: in <module>
  from sagemaker.amazon.record_pb2 import Record
.tox/py37/lib/python3.7/site-packages/sagemaker/amazon/record_pb2.py:52: in <module>
    file=DESCRIPTOR,
.tox/py37/lib/python3.7/site-packages/google/protobuf/descriptor.py:560: in __new__
    _message.Message._CheckCalledFromGeneratedFile()
E   TypeError: Descriptors cannot not be created directly.
E   If this call came from a _pb2.py file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
E   If you cannot immediately regenerate your protos, some other possible workarounds are:
E    1. Downgrade the protobuf package to 3.20.x or lower.
E    2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).
E
E   More information: https://developers.google.com/protocol-buffers/docs/news/2022-05-06#python-updates

Upgrading the SageMaker version did not resolve the issue, so for now we pin the protobuf version and will consider a complete upgrade of dependencies in a separate PR.
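As a sanity check, the compatibility boundary from the error message ("Downgrade the protobuf package to 3.20.x or lower") can be expressed as a small helper. This is an illustrative sketch, not part of the PR:

```python
def protobuf_accepts_legacy_pb2(protobuf_version: str) -> bool:
    """Return True if this protobuf release still accepts _pb2 modules
    generated with protoc < 3.19, per the guidance in the error message
    ("3.20.x or lower")."""
    major, minor = (int(part) for part in protobuf_version.split(".")[:2])
    return (major, minor) <= (3, 20)
```

The pinned 3.19.6 falls inside this range, while the protobuf 4.x releases that raised the `TypeError` above do not.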

@sagemaker-bot (Collaborator)

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-pytorch-inference-toolkit-pr
  • Commit ID: ce931fd
  • Result: FAILED
  • Build Logs (available for 30 days)


@sagemaker-bot (Collaborator)

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-pytorch-inference-toolkit-pr
  • Commit ID: 604e65b
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)


maaquib previously approved these changes Nov 1, 2022
@nikhil-sk nikhil-sk merged commit 1daa4c1 into aws:master Nov 7, 2022