ConnectionClosedError #799
Thanks for using SageMaker, @Robbie-Palmer! SageMaker's InvokeEndpoint API has a 5MB limit on the size of incoming requests. I think you are hitting that limit with your 324x324x3 input. There is no corresponding limit on the response size, which is why you are able to receive a 13MB JSON response.

Why is the request more than 5MB? The default predictor serializes your numpy array to JSON, which turns every float into a long text string and inflates the payload to several times the size of the raw binary data. The SDK has some other serializer/deserializer options, but they aren't automatically supported by the TensorFlow container, so using them will require a couple of steps. First, change your predictor code:
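Something along these lines should work (a minimal sketch assuming the sagemaker 1.x SDK; the exact import paths and attribute names may differ in your version):

```python
# Hedged sketch: switch an existing TensorFlowPredictor from the default JSON
# serialization to the SDK's numpy serializer/deserializer (sagemaker 1.x API).
from sagemaker.predictor import npy_serializer, numpy_deserializer
from sagemaker.tensorflow.model import TensorFlowPredictor

predictor = TensorFlowPredictor(endpoint_name, sagemaker_session)

# Send requests as raw .npy bytes instead of JSON ...
predictor.serializer = npy_serializer
predictor.content_type = 'application/x-npy'
# ... and ask the endpoint for a numpy response back.
predictor.deserializer = numpy_deserializer
predictor.accept = 'application/x-npy'

result = predictor.predict(image_array)  # image_array: your input ndarray
```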
Second, you will need to change your endpoint code to support the numpy format. Specifically, you need to add numpy deserialization to your input_fn in the entry point script.
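A hedged sketch of that server-side change for the legacy SageMaker TensorFlow container (the handler name and return value follow that container's input_fn convention; check the docs for your container version):

```python
# entry_point.py -- hedged sketch of numpy deserialization on the server side.
from io import BytesIO

import numpy as np

CONTENT_TYPE_NPY = 'application/x-npy'


def input_fn(serialized_input, content_type):
    """Deserialize an application/x-npy request body into a numpy array."""
    if content_type == CONTENT_TYPE_NPY:
        # The container passes the request body in as raw bytes.
        return np.load(BytesIO(serialized_input))
    raise ValueError('Unsupported content type: {}'.format(content_type))
```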
Hi @jesterhazy, thank you so much for your help and your really quick response. I deployed the model like this:

```python
from sagemaker.tensorflow.model import TensorFlowModel

model = TensorFlowModel(model_data=f's3://{bucket}/model/{model_name}.tar.gz',
                        role=role,
                        entry_point='entry_point.py',
                        name=model_name)
predictor = model.deploy(initial_instance_count=1,
                         instance_type='ml.m4.xlarge',
                         endpoint_name=endpoint_name)
```

Following your reply, I populated the entry point script with an implementation for the numpy deserialization:

```python
# I'm doing this
def __call__(self, stream, content_type=CONTENT_TYPE_NPY):
    return np.load(BytesIO(stream))

# instead of this
def __call__(self, stream, content_type=CONTENT_TYPE_NPY):
    try:
        return np.load(BytesIO(stream.read()))
    finally:
        stream.close()
```

Weirdly, my predictor receives the result back from the web server as a StreamingBody, so it needs different deserialization code from the web server's. I realised that the default for the TensorFlowModel is to use Python 2, and since the encoding of numpy arrays is affected by the Python version, this could be the root of the issue, so I tried redeploying the model as below:

```python
from sagemaker.tensorflow.model import TensorFlowModel

model = TensorFlowModel(model_data=f's3://{bucket}/model/{model_name}.tar.gz',
                        role=role,
                        entry_point='entry_point.py',
                        py_version='py3',
                        name=model_name)
predictor = model.deploy(initial_instance_count=1,
                         instance_type='ml.m4.xlarge',
                         endpoint_name=endpoint_name)
```

But I got an error saying there is no Python 3 image available for TensorFlow version 1.11.0, which, from looking at the docs, should be supported. I've been looking for documentation on how this all works together, but I've been struggling to find information. Any hints to point me in the right direction would really be appreciated! So what I'm wondering about is:

Thanks!
There are a few things going on here.
For the client side, you should be able to use the serializer and deserializer options that are already in our SDK, but your code would need to look like this:
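For example, the same numpy serializers as above can be passed to a generic predictor at construction time (a sketch assuming the sagemaker 1.x RealTimePredictor API):

```python
# Hedged sketch: a predictor wired up for application/x-npy in both directions
# (sagemaker 1.x API assumed).
from sagemaker.predictor import (RealTimePredictor, npy_serializer,
                                 numpy_deserializer)

predictor = RealTimePredictor(endpoint_name,
                              sagemaker_session=sagemaker_session,
                              serializer=npy_serializer,
                              deserializer=numpy_deserializer,
                              content_type='application/x-npy',
                              accept='application/x-npy')

result = predictor.predict(image_array)  # returns a numpy array
```

Passing the serializer and deserializer at construction time keeps the content-type and accept headers consistent with the payload format.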
On the server side, your code might look like this:
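A hedged sketch of the corresponding output side for the legacy TensorFlow container (the output_fn name follows that container's convention; if your container hands you a TensorFlow Serving PredictResponse proto rather than an array, you will need to unpack it first, as the next comment shows):

```python
# entry_point.py -- hedged sketch of serializing the prediction as .npy,
# assuming `prediction` is already array-like at this point.
from io import BytesIO

import numpy as np

CONTENT_TYPE_NPY = 'application/x-npy'


def output_fn(prediction, accepts):
    """Serialize the model output to raw .npy bytes when requested."""
    if accepts == CONTENT_TYPE_NPY:
        buffer = BytesIO()
        np.save(buffer, np.asarray(prediction))
        return buffer.getvalue()
    raise ValueError('Unsupported accept type: {}'.format(accepts))
```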
Hi @jesterhazy, ah, thank you for the info!

For point 1: Is it really the case that this old way of deploying models is being deprecated and won't support Python 3?

For point 2, my output_fn now looks like this:

```python
def output_fn(prediction, accepts):
    output = prediction.outputs['score']
    shape = [dim.size for dim in output.tensor_shape.dim]
    prediction_array = np.reshape(output.float_val, shape)
    buffer = BytesIO()
    np.save(buffer, prediction_array)
    return buffer.getvalue()
```

This has a huge performance benefit: an example array of size [1, 280, 280, 3] took 4.6 seconds for a response with JSON, but with numpy serialization it takes only 0.4 seconds. I've also been able to increase the request size from around [1, 320, 320, 3] up to just below [1, 420, 420, 3].

I can see that you have already addressed this by increasing the message size from 4MB to 2GB.
Hi @Robbie-Palmer, Regarding your questions:
Thanks
What would the server-side code (e.g. entry_point='inference.py') be for a TensorFlow Serving model?
@whatdhack you can find documentation about pre/post-processing with TFS at https://github.com/aws/sagemaker-tensorflow-serving-container#prepost-processing
It's not clear to me from the above documentation how I would add a numpy_deserializer on the server side, corresponding to the npy_serializer on the client side (i.e. the counterpart on the SageMaker server of the client-side code), in the inference.py file that is supposedly part of the archive the endpoint uses.
Here's an example of deserializing input from the docs:

```python
def input_handler(data, context):
    """ Pre-process request input before it is sent to TensorFlow Serving REST API

    Args:
        data (obj): the request data, in format of dict or string
        context (Context): an object containing request and configuration details

    Returns:
        (dict): a JSON-serializable dict that contains request body and headers
    """
    if context.request_content_type == 'application/json':
        # pass through json (assumes it's correctly formed)
        d = data.read().decode('utf-8')
        return d if len(d) else ''

    if context.request_content_type == 'text/csv':
        # very simple csv handler
        return json.dumps({
            'instances': [float(x) for x in data.read().decode('utf-8').split(',')]
        })

    raise ValueError('{{"error": "unsupported content type {}"}}'.format(
        context.request_content_type or "unknown"))
```

For numpy, you'd want to check for "application/x-npy" as the content type and then use something like the sketch below.
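A hedged sketch of that numpy branch, assuming the client sends raw .npy bytes with content type application/x-npy and that the handler should pass TensorFlow Serving's REST API a JSON "instances" payload (mirroring the CSV example above):

```python
# Hedged sketch of an application/x-npy branch for a TFS input_handler.
import json
from io import BytesIO

import numpy as np


def input_handler(data, context):
    if context.request_content_type == 'application/x-npy':
        # Read the raw .npy request body and rebuild the array.
        array = np.load(BytesIO(data.read()))
        # Wrap it in the JSON structure the TFS REST API expects.
        return json.dumps({'instances': array.tolist()})

    raise ValueError('{{"error": "unsupported content type {}"}}'.format(
        context.request_content_type or 'unknown'))
```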
Thanks. An example for application/x-npy is what I am looking for.
Hi @whatdhack, for me I just needed to do:
System Information
Describe the problem
I have deployed an image segmentation model (built locally) to SageMaker and created an endpoint for it.
It takes in a 3-channel image and returns a 6-channel segmentation mask of the same dimensions.
When I use the SageMaker Python SDK to send my model data of size [1, 300, 300, 3], it takes ~4.7 seconds to respond.
If I increase the size of the input image to [1, 320, 320, 3], it consistently succeeds, taking ~5.0 seconds.
But if I increase the size of the input image to [1, 324, 324, 3], it consistently fails with the below error.
I at first assumed this was a timeout, but my understanding is that models have 60 seconds to respond, so that shouldn't be the problem.
I've been looking into whether it is because of limits on the size of the data transfer.
The InvokeEndpoint documentation says the body accepts binary data with a max length of 5,242,880 bytes.
The numpy array I am sending is 1,166,400 bytes.
The numpy array the model produces when I run the 320x320 patch through it locally is 2,332,800 bytes.
It is hard to see the size of successful responses since I'm using the high-level SageMaker SDK, and measuring the size of the Python objects massively overestimates it due to the Python object wrappers.
When I save the successful response from the 300x300 patch to a JSON file, it is ~13MB, which far exceeds the 5.2MB limit and confuses me further, but it seems unlikely to be a data-size problem.
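An illustrative sketch of how a request whose raw array is only around a megabyte can still exceed 5MB once JSON-encoded (approximate figures; the exact sizes depend on dtype and JSON float formatting):

```python
# Rough size comparison between JSON and .npy serialization of an input batch.
import json
from io import BytesIO

import numpy as np

batch = np.random.rand(1, 324, 324, 3).astype(np.float32)

json_payload = json.dumps(batch.tolist()).encode('utf-8')

buffer = BytesIO()
np.save(buffer, batch)
npy_payload = buffer.getvalue()

print('JSON bytes: {:,}'.format(len(json_payload)))  # several MB of text
print('.npy bytes: {:,}'.format(len(npy_payload)))   # ~1.3 MB of raw float32
```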
Any help as to why this might be occurring would be brilliant.
Minimal repro / logs
Nothing appears in the CloudWatch logs other than successful pings
I've listed what appear to be the key parts of the error message below:
The connection is aborted, then a number of retries are attempted, and it ends with a ConnectionClosedError.
Model set up with:

```python
predictor = sagemaker.tensorflow.model.TensorFlowPredictor(endpoint_name, sagemaker_session)
```
Below command succeeds (slicing to predict on an interesting section of the image)
Below command fails