502 bad gateway? #1485

Closed
Xixiong-Guo opened this issue May 11, 2020 · 16 comments

Comments

@Xixiong-Guo

Following https://aws.amazon.com/blogs/machine-learning/deploy-trained-keras-or-tensorflow-models-using-amazon-sagemaker/, I tried to save two different models (a sentiment analysis model and a simple regression model) trained with TensorFlow+Keras and upload them to SageMaker, but encountered the same 502 error for both, which is seldom reported here or on Stack Overflow. Any thoughts?

# `runtime` is a boto3 SageMaker runtime client, e.g. boto3.client('sagemaker-runtime')
Body_review = ','.join([str(val) for val in padded_pred]).encode('utf-8')

response = runtime.invoke_endpoint(EndpointName=predictor.endpoint,
                                   ContentType='text/csv',
                                   Body=Body_review)

An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (502) from model with message "<title>502 Bad Gateway</title> 502 Bad Gateway nginx/1.16.1"

I searched CloudWatch and found the following:

2020/05/10 15:53:27 [error] 35#35: *187 connect() failed (111: Connection refused) while connecting to upstream, client: 10.32.0.1, server: , request: "POST /invocations HTTP/1.1", subrequest: "/v1/models/export:predict", upstream: "http://127.0.0.1:27001/v1/models/export:predict", host: "model.aws.local:8080"

I tried another regression model (trained outside SageMaker, saved and uploaded to S3, and loaded into SageMaker, following https://aws.amazon.com/blogs/machine-learning/deploy-trained-keras-or-tensorflow-models-using-amazon-sagemaker/).

Still the same issue when using the predictor:

from sagemaker.predictor import csv_serializer

predictor.content_type = 'text/csv'

predictor.serializer = csv_serializer

Y_pred = predictor.predict(test.tolist())

Error:

---------------------------------------------------------------------------
ModelError Traceback (most recent call last)
in ()
4 predictor.serializer = csv_serializer
5
----> 6 Y_pred = predictor.predict(test)

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/sagemaker/predictor.py in predict(self, data, initial_args, target_model)
108
109 request_args = self._create_request_args(data, initial_args, target_model)
--> 110 response = self.sagemaker_session.sagemaker_runtime_client.invoke_endpoint(**request_args)
111 return self._handle_response(response)
112

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/botocore/client.py in _api_call(self, *args, **kwargs)
314 "%s() only accepts keyword arguments." % py_operation_name)
315 # The "self" in this scope is referring to the BaseClient.
--> 316 return self._make_api_call(operation_name, kwargs)
317
318 _api_call.__name__ = str(py_operation_name)

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/botocore/client.py in _make_api_call(self, operation_name, api_params)
624 error_code = parsed_response.get("Error", {}).get("Code")
625 error_class = self.exceptions.from_code(error_code)
--> 626 raise error_class(parsed_response, operation_name)
627 else:
628 return parsed_response

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (502) from model with message "<title>502 Bad Gateway</title> 502 Bad Gateway nginx/1.16.1".

@Xixiong-Guo
Author

Actually, the 502 error trouble was already there when running predictor.predict(test) before deploy. But my model performed well on my own machine and was saved exactly the same way as in https://aws.amazon.com/blogs/machine-learning/deploy-trained-keras-or-tensorflow-models-using-amazon-sagemaker/

@chuyang-deng
Contributor

Hi @Xixiong-Guo, if you are following the example from https://aws.amazon.com/blogs/machine-learning/deploy-trained-keras-or-tensorflow-models-using-amazon-sagemaker/, it's likely that you've used the wrong Model class in step 5.

For framework versions 1.11 and above, we've split the TensorFlow container into training and serving. For deploying the model, please use this class instead: https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/tensorflow/serving.py#L121
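
For example, a minimal sketch of deploying with that class (the S3 URI and instance type below are placeholders):

from sagemaker import get_execution_role
from sagemaker.tensorflow.serving import Model  # serving-only Model class

role = get_execution_role()

# model_data points to your model.tar.gz in S3 (placeholder bucket/key)
model = Model(model_data='s3://<your-bucket>/model/model.tar.gz', role=role)

# Deploys a sagemaker-tensorflow-serving endpoint and returns a Predictor
predictor = model.deploy(initial_instance_count=1, instance_type='ml.c5.xlarge')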

@Xixiong-Guo
Author

Hi @ChuyangDeng, thanks for your reply.

I did encounter this problem, and then I found this reference (https://sagemaker.readthedocs.io/en/stable/using_tf.html#deploying-directly-from-model-artifacts). I've changed to:

from sagemaker.tensorflow.serving import Model
model = Model(model_data='s3://sagemaker-us-east-1-665159495798/model/model.tar.gz', role=role)

So I guess this should not be the reason for the 502 error now? Thanks!

@chuyang-deng
Contributor

Hi @Xixiong-Guo,

How did you tar the model? When you tar your model, please make sure to use the -C option so that the tar.gz does not add an extra layer to your folder. When it extracts, the outermost layer should be the version number, something like:

$ ls -al 00000123 # version number (not model name)
total 24
drwxr-xr-x .
drwx------ ..
drwxr-xr-x assets
-rw-r--r-- saved_model.pb
drwxr-xr-x variables

@Xixiong-Guo
Author

Hi @ChuyangDeng
I used:

import tarfile
with tarfile.open('model.tar.gz', mode='w:gz') as archive:
    archive.add('export', recursive=True)

My tar.gz looks like:
model.tar.gz\export\Servo\1\saved_model.pb
model.tar.gz\export\Servo\1\variables\

You mean the directory should look like:
model.tar.gz\1\saved_model.pb
model.tar.gz\1\variables\ ? (If the version number is 1)

Thanks.

@chuyang-deng
Contributor

Yes, SageMaker expects the model to be extracted directly under the "opt/ml/<model_name>/" directory inside the container. The sagemaker-tensorflow-serving container will look for the model version directly under "<model_name>/". So your tar structure should be:

model.tar.gz\1\saved_model.pb
model.tar.gz\1\variables...
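
If you are building the archive with Python's tarfile module (as in your earlier snippet), one way to get that layout is to add the version directory itself and strip the export/Servo prefix with arcname. A minimal sketch, assuming the SavedModel was exported locally to export/Servo/1:

import tarfile

# Add the local export/Servo/1 directory to the archive as just "1", so extracting
# model.tar.gz yields 1/saved_model.pb and 1/variables/... at the top level.
with tarfile.open('model.tar.gz', mode='w:gz') as archive:
    archive.add('export/Servo/1', arcname='1', recursive=True)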

@Xixiong-Guo
Author

Hi @ChuyangDeng
Unfortunately it is still not working; the error is still the same as before. The tar structure is attached:
(screenshot: tar_structure)

Code and errors are as follows:

import boto3, re
from sagemaker import get_execution_role

role = get_execution_role()
import tarfile
with tarfile.open('model.tar.gz', mode='w:gz') as archive:
    archive.add('1', recursive=True)

import sagemaker
sagemaker_session = sagemaker.Session()
inputs = sagemaker_session.upload_data(path='model.tar.gz', key_prefix='model')

from sagemaker.tensorflow.serving import Model
model = Model(model_data='s3://sagemaker-us-east-1-665159495798/model/model.tar.gz', role=role)
predictor = model.deploy(initial_instance_count=1, instance_type='ml.c5.xlarge')

result = predictor.predict(input)


ModelError Traceback (most recent call last)
in ()
----> 1 result = predictor.predict(input)

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/sagemaker/tensorflow/serving.py in predict(self, data, initial_args)
116 args["CustomAttributes"] = self._model_attributes
117
--> 118 return super(Predictor, self).predict(data, args)
119
120

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/sagemaker/predictor.py in predict(self, data, initial_args, target_model)
108
109 request_args = self._create_request_args(data, initial_args, target_model)
--> 110 response = self.sagemaker_session.sagemaker_runtime_client.invoke_endpoint(**request_args)
111 return self._handle_response(response)
112

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/botocore/client.py in _api_call(self, *args, **kwargs)
314 "%s() only accepts keyword arguments." % py_operation_name)
315 # The "self" in this scope is referring to the BaseClient.
--> 316 return self._make_api_call(operation_name, kwargs)
317
318 _api_call.__name__ = str(py_operation_name)

~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/botocore/client.py in _make_api_call(self, operation_name, api_params)
624 error_code = parsed_response.get("Error", {}).get("Code")
625 error_class = self.exceptions.from_code(error_code)
--> 626 raise error_class(parsed_response, operation_name)
627 else:
628 return parsed_response

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (502) from model with message "<title>502 Bad Gateway</title> 502 Bad Gateway nginx/1.16.1". See https://us-east-1.console.aws.amazon.com/cloudwatch/home?region=us-east-1#logEventViewer:group=/aws/sagemaker/Endpoints/sagemaker-tensorflow-serving-2020-05-16-13-45-23-880 in account 665159495798 for more information.

@Sbrikky

Sbrikky commented May 18, 2020

I've been having the same issue after following the same examples. I've also checked my tar and I'm using the serving Model class.

The CloudWatch log is as follows, from the moment I invoke the endpoint until it goes back to the regular pinging. (I used container_log_level = logging.DEBUG.)

  • 2020-05-18 08:13:49.621487: F external/org_tensorflow/tensorflow/core/util/tensor_format.h:426] Check failed: index >= 0 && index < num_total_dims Invalid index from the dimension: 3, 0, C
  • 2020/05/18 08:13:49 [error] 18#18: *96 upstream prematurely closed connection while reading response header from upstream, client: 10.32.0.2, server: , request: "POST /invocations HTTP/1.1", subrequest: "/v1/models/model:predict", upstream: "http://127.0.0.1:27001/v1/models/model:predict", host: "model.aws.local:8080"
  • 2020/05/18 08:13:49 [warn] 18#18: *96 upstream server temporarily disabled while reading response header from upstream, client: 10.32.0.2, server: , request: "POST /invocations HTTP/1.1", subrequest: "/v1/models/model:predict", upstream: "http://127.0.0.1:27001/v1/models/model:predict", host: "model.aws.local:8080"
  • 10.32.0.2 - - [18/May/2020:08:13:49 +0000] "POST /invocations HTTP/1.1" 502 157 "-" "AHC/2.0"
  • WARNING:__main__:unexpected tensorflow serving exit (status: 6). restarting.
  • INFO:__main__:tensorflow version info:
  • TensorFlow ModelServer: 2.1.0-rc1+dev.sha.075ffcf
  • TensorFlow Library: 2.1.0
  • INFO:__main__:tensorflow serving command: tensorflow_model_server --port=27000 --rest_api_port=27001 --model_config_file=/sagemaker/model-config.cfg --max_num_load_retries=0
  • INFO:__main__:started tensorflow serving (pid: 131)
  • 2020-05-18 08:13:50.045247: I tensorflow_serving/model_servers/server_core.cc:462] Adding/updating models.
  • 2020-05-18 08:13:50.045284: I tensorflow_serving/model_servers/server_core.cc:573] (Re-)adding model: model
  • 2020-05-18 08:13:50.145608: I tensorflow_serving/util/retrier.cc:46] Retrying of Reserving resources for servable: {name: model version: 1} exhausted max_num_retries: 0
  • 2020-05-18 08:13:50.145645: I tensorflow_serving/core/basic_manager.cc:739] Successfully reserved resources to load servable {name: model version: 1}
  • 2020-05-18 08:13:50.145657: I tensorflow_serving/core/loader_harness.cc:66] Approving load for servable version {name: model version: 1}
  • 2020-05-18 08:13:50.145670: I tensorflow_serving/core/loader_harness.cc:74] Loading servable version {name: model version: 1}
  • 2020-05-18 08:13:50.145704: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:31] Reading SavedModel from: /opt/ml/model/1
  • 2020-05-18 08:13:50.154042: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:54] Reading meta graph with tags { serve }
  • 2020-05-18 08:13:50.154072: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:264] Reading SavedModel debug info (if present) from: /opt/ml/model/1
  • 2020-05-18 08:13:50.155009: I external/org_tensorflow/tensorflow/core/common_runtime/process_util.cc:147] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
  • 2020-05-18 08:13:50.194113: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:203] Restoring SavedModel bundle.
  • 2020-05-18 08:13:50.252623: I external/org_tensorflow/tensorflow/cc/saved_model/loader.cc:333] SavedModel load for tags { serve }; Status: success: OK. Took 106920 microseconds.
  • 2020-05-18 08:13:50.254566: I tensorflow_serving/servables/tensorflow/saved_model_warmup.cc:105] No warmup data file found at /opt/ml/model/1/assets.extra/tf_serving_warmup_requests
  • 2020-05-18 08:13:50.255778: I tensorflow_serving/util/retrier.cc:46] Retrying of Loading servable: {name: model version: 1} exhausted max_num_retries: 0
  • 2020-05-18 08:13:50.255798: I tensorflow_serving/core/loader_harness.cc:87] Successfully loaded servable version {name: model version: 1}
  • 2020-05-18 08:13:50.257993: I tensorflow_serving/model_servers/server.cc:362] Running gRPC ModelServer at 0.0.0.0:27000 ...
  • [warn] getaddrinfo: address family for nodename not supported
  • 2020-05-18 08:13:50.259143: I tensorflow_serving/model_servers/server.cc:382] Exporting HTTP/REST API at:localhost:27001 ...
  • [evhttp_server.cc : 238] NET_LOG: Entering the event loop ...

@Sbrikky

Sbrikky commented May 18, 2020

For me this ended up being an issue with the shape of the input. I was uploading an individual sample, but the endpoint expects a batch, so I needed to make my input one layer deeper (as described here). Could this be happening for you as well, @Xixiong-Guo?

@Xixiong-Guo
Author

Hi @Sbrikky, did you encounter the same 502 issue?
I have no idea if it is caused by the shape of the input. I can make predictions outside of SageMaker in Keras when a list representing an embedded sentence is used as input, but it fails inside SageMaker.

@Sbrikky

Sbrikky commented May 18, 2020

@Xixiong-Guo Yes, I had the exact same error in my notebook as the one you posted, so I didn't bother posting it again.
If you look at the first line of my CloudWatch log, you see it says:

2020-05-18 08:13:49.621487: F external/org_tensorflow/tensorflow/core/util/tensor_format.h:426] Check failed: index >= 0 && index < num_total_dims Invalid index from the dimension: 3, 0, C

This suggested that maybe there was something wrong with the shape of the request. Why on earth it ends up throwing this as a 502, I have no clue.
For me, I had to put my input into another list. So instead of an array with all my pixel values, I had a list of one element, and that one element was my array with all my pixel values. So instead of predict(input) I had to do predict([input.tolist()]).
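
A minimal sketch of that shape difference (assuming predictor is the object returned by model.deploy(...) earlier in the thread; the sample shape here is just a made-up example):

import numpy as np

# Hypothetical single sample; the shape is only an example, not from the original model.
sample = np.random.rand(28, 28, 3)

# Sending the bare, un-batched sample is what crashed TF Serving and surfaced as a 502:
# predictor.predict(sample.tolist())

# Wrapping it in an outer list adds the batch dimension TF Serving expects
# (a batch containing exactly one sample):
result = predictor.predict([sample.tolist()])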

@Xixiong-Guo
Author

Hi @Sbrikky, I got it. In your case, was there any difference in the error info when you tried predict(input) versus predict([input.tolist()])?

@Sbrikky

Sbrikky commented May 19, 2020

When I use predict([input.tolist()]) it works and I get a prediction back. No 502.

@chuyang-deng
Contributor

Hi @Xixiong-Guo, it looks like you are using csv_serializer; note here (https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/predictor.py#L325) that the serializer will serialize your input row by row, delimited by ",", if you are using a Python list: https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/predictor.py#L363
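
For illustration, this is roughly what the serializer produces (a sketch based on the SDK 1.x behavior; exact string formatting may vary slightly by version):

from sagemaker.predictor import csv_serializer

# A flat Python list is serialized as a single CSV row:
csv_serializer([1.0, 2.0, 3.0])           # -> '1.0,2.0,3.0'

# A list of lists is serialized as one CSV row per inner list, separated by newlines:
csv_serializer([[1.0, 2.0], [3.0, 4.0]])  # -> '1.0,2.0\n3.0,4.0'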

@MohGhaziAlZeyadi

Hi all,
I am having the same problem with the 502 error:

ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (502) from model with message "<title>502 Bad Gateway</title> 502 Bad Gateway nginx/1.16.1"

@alsulke

alsulke commented Jul 7, 2020

For me this ended up being an issue with the directory structure of the saved model.
As per the latest version, SageMaker expects the model to be extracted directly under model_name/version/..

So your tar structure should be:
model.tar.gz\export\1\saved_model.pb
model.tar.gz\export\1\variables\

@aws locked and limited conversation to collaborators May 20, 2021

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →
