
ConnectionClosedError #799


Closed
Robbie-Palmer opened this issue May 15, 2019 · 11 comments

@Robbie-Palmer

System Information

  • Framework (e.g. TensorFlow) / Algorithm (e.g. KMeans): Tensorflow / U-Net/Segnet
  • Framework Version: 1.10.0
  • Python Version: 3.6
  • CPU or GPU: CPU
  • Python SDK Version: 1.19.0
  • Are you using a custom image: No

Describe the problem

I have deployed an image segmentation model (built locally) to SageMaker and created an endpoint for it.
It takes in a 3-channel image and returns a 6-channel segmented mask of the same dimensions.

When I use the SageMaker Python SDK to send my model data of size [1, 300, 300, 3], it takes ~4.7 seconds to respond.
If I increase the size of the input image to [1, 320, 320, 3], it consistently succeeds, taking ~5.0 seconds.
But if I increase the size of the input image to [1, 324, 324, 3], it consistently fails with the below error.

ConnectionClosedError: Connection was closed before we received a valid response from endpoint URL: "https://runtime.sagemaker.eu-west-1.amazonaws.com/endpoints/{endpoint-name}/invocations".

At first I assumed this was a timeout, but my understanding is that models have 60 seconds to respond, so that should not be the problem.

I've been looking into whether it is caused by limits on the size of the data transfer.
The InvokeEndpoint documentation says the body accepts binary data with a max length of 5,242,880 bytes.
The numpy array I am sending is 1,166,400 bytes.
The numpy array the model produces when I run the 320x320 patch through it locally is 2,332,800 bytes.

It is hard to see the size of successful responses since I'm using the high-level SageMaker SDK, and measuring the size of the Python objects massively overestimates it due to the Python object wrappers.
When I save the successful response from the 300x300 patch to a JSON file, it is ~13MB in size, which far exceeds the 5.2MB limit and confuses me further, but it seems unlikely to be a data problem.

Any help as to why this might be occurring would be brilliant.

Minimal repro / logs

Nothing appears in the CloudWatch logs other than successful pings

I've listed what appear to be the key parts of the error message below.
The connection is aborted, a number of retries are attempted, and it ends with a ConnectionClosedError.

~/anaconda3/envs/tensorflow_p36/lib/python3.6/ssl.py in write(self, data)
    641         """
--> 642         return self._sslobj.write(data)
    643 

ProtocolError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))

During handling of the above exception, another exception occurred:

ConnectionClosedError                     Traceback (most recent call last)
~/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/botocore/httpsession.py in send(self, request)
    287                 error=e,
    288                 request=request,
--> 289                 endpoint_url=request.url
    290             )
    291         except Exception as e:

ConnectionClosedError: Connection was closed before we received a valid response from endpoint URL: "https://runtime.sagemaker.eu-west-1.amazonaws.com/endpoints/{endpoint-name}/invocations".

Model setup with predictor=sagemaker.tensorflow.model.TensorFlowPredictor(endpoint_name, sagemaker_session)

The below command succeeds (slicing to predict on an interesting section of the image):

  • sagemaker_response = predictor.predict(img[:, 300:620, 180:500, :])

The below command fails:

  • sagemaker_response = predictor.predict(normalized_img[:, 300:624, 180:504, :])
@jesterhazy
Contributor

Thanks for using SageMaker, @Robbie-Palmer!

SageMaker's InvokeEndpoint API has a 5MB limit on the size of incoming requests. I think you are hitting that limit with your 324x324x3 input. There is no corresponding limit on the response size, which is why you are able to receive a 13MB JSON response.

Why is the request more than 5MB? The sagemaker.tensorflow.model.TensorFlowPredictor object needs to convert your input into data that can be sent and received over HTTP. By default, it uses JSON for this, but JSON is not very efficient for numeric data. Consider that a float32 number is 4 bytes in binary, but might be a string like "0.123456789" -- 11 bytes.
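To see the inflation concretely, here is a rough sketch (just an illustration, not SDK code) comparing the raw size of a float32 array with its JSON-encoded size:

import json
import numpy as np

# illustrative array roughly matching the failing [1, 324, 324, 3] request
arr = np.random.rand(1, 324, 324, 3).astype(np.float32)

print(arr.nbytes)                     # raw binary size: 4 bytes per float32 element
print(len(json.dumps(arr.tolist())))  # JSON-encoded size: several times larger, past the 5MB limit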

The SDK has some other serializer/deserializer options, but they aren't automatically supported by the TensorFlow container, so using them will require a couple of steps. First, change your predictor code:

from sagemaker.predictor import RealTimePredictor, npy_serializer, numpy_deserializer
predictor = RealTimePredictor(endpoint_name, sagemaker_session, 
                              npy_serializer, numpy_deserializer)

# then use it the same way you are now 
response = predictor.predict(img[:, 300:620, 180:500, :])

Second, you will need to change your endpoint code to support the numpy format. Specifically, you need to add numpy deserialization to your input_fn and numpy serialization to your output_fn (the inverse of the client side). You can use the implementations of the client-side npy_serializer and numpy_deserializer functions to see how that would be done.
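In essence, the npy round trip those helpers perform is just np.save / np.load over an in-memory buffer, roughly like this (a simplified sketch, not the exact SDK code):

import io
import numpy as np

def to_npy_bytes(array):
    # roughly what npy_serializer does: write the array to a buffer in .npy format
    buffer = io.BytesIO()
    np.save(buffer, array)
    return buffer.getvalue()

def from_npy_bytes(data):
    # roughly what numpy_deserializer does: read the .npy bytes back into an array
    return np.load(io.BytesIO(data))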

@Robbie-Palmer
Author

Hi @jesterhazy, thank you so much for your help and your really quick response.
It's good to understand what is going on! Though I haven't been able to get this working yet, as I am struggling a bit with the serialisation and deserialisation.
There seems to be different behaviour on the client and server side, which is causing difficulties.
To give an overview of what I am doing: I'm deploying my model and endpoint by uploading the model to S3, then using the TensorFlowModel class like below:

from sagemaker.tensorflow.model import TensorFlowModel
model = TensorFlowModel(model_data=f's3://{bucket}/model/{model_name}.tar.gz',
                        role=role,
                        entry_point='entry_point.py',
                        name=model_name)
predictor = model.deploy(initial_instance_count=1,
                         instance_type='ml.m4.xlarge',
                         endpoint_name=endpoint_name)

Following your reply, I populated the entry point script with implementations of input_fn and output_fn.
I used the numpy serialization logic found in sagemaker.predictor, but the deserializer expects a botocore.response.StreamingBody instance, while on the web server it receives a string of bytes.
If I change the deserialisation logic to treat the input as a byte string instead of a stream, then the model runs, which suggests this is the format in which it receives the data.

# I'm doing this
def __call__(self, stream, content_type=CONTENT_TYPE_NPY):
    return np.load(BytesIO(stream))

# instead of this
def __call__(self, stream, content_type=CONTENT_TYPE_NPY):
    try:
        return np.load(BytesIO(stream.read()))
    finally:
        stream.close()
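In principle the two cases could be handled by one deserializer that checks for a read method, something like this (just my own sketch):

from io import BytesIO
import numpy as np

def deserialize_npy(data):
    # handle both raw bytes (what the web server receives) and a readable stream (what the client receives)
    payload = data.read() if hasattr(data, 'read') else data
    return np.load(BytesIO(payload), allow_pickle=True)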

Weirdly, my predictor receives the result back from the web server as a StreamingBody, so it needs different deserialization code from the web server.
But if I try using the numpy_deserializer from sagemaker.predictor, the depickling fails with:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa7 in position 1: ordinal not in range(128) You may need to pass the encoding= option to numpy.load
The other encoding options, latin1 and bytes, also fail.
When I look at the RealTimePredictor, it isn't doing anything more complicated than sending an HTTP request using urllib3 and passing the response body into the deserialiser, so I assume that on the web server there must be some additional logic that takes the output of output_fn and wraps it in a StreamingBody instance?
I don't know how to take this object and convert it back into a numpy array.

I realised that the default for the TensorFlowModel is to use Python 2, and since the encoding of the numpy arrays is affected by the Python version, this could be the root of the issue, so I tried redeploying the model as below:

from sagemaker.tensorflow.model import TensorFlowModel
model = TensorFlowModel(model_data=f's3://{bucket}/model/{model_name}.tar.gz',
                        role=role,
                        entry_point='entry_point.py',
                        py_version='py3',
                        name=model_name)
predictor = model.deploy(initial_instance_count=1,
                         instance_type='ml.m4.xlarge',
                         endpoint_name=endpoint_name)

But I got an error saying there is no Python 3 image available for TensorFlow version 1.11.0, which, from looking at the docs, should be supported:
Failed Reason: The image '520713654638.dkr.ecr.eu-west-1.amazonaws.com/sagemaker-tensorflow:1.11-cpu-py3' does not exist.

I've been looking for docs on how this all works together, but I've been struggling to find information. Any hints to point me in the right direction would really be appreciated!
The main info I found on the entry-point script was this example, which uses these functions along with a lot more, and an example in this repo.
But I haven't managed to find anything about how the byte strings or StreamingBody play a role in the serialisation process.

So what I'm wondering about is:

  • The reason for the difference in deserialization between client and server side
  • How to load the content of the StreamingBody (is the issue possibly a Python 2 server communicating with a Python 3 client?)
  • Why deploying my model with Python 3 fails
  • Where any docs live that might help out in this space

Thanks!

@jesterhazy
Contributor

@Robbie-Palmer,

There are a few things going on here.

  1. Our TF 1.11 containers do not support python3. That's why that is failing. If you need python3 support for a TF endpoint, you need to use our (different) TensorFlow Serving container. Code and model packaging for that work quite differently. See docs here and here.

  2. Yes, the server-side and client-side implementations are different. On the client side, our SDK (and your code) need to interact with the data returned by the AWS boto3 Python SDK, while on the server side, your input_fn and output_fn need to work with inputs provided by the HTTP serving stack inside the container.

For the client side, you should be able to use the ones that are already in our SDK, but your code would need to look like this:

from sagemaker.tensorflow.model import TensorFlowModel
from sagemaker.predictor import RealTimePredictor, npy_serializer, numpy_deserializer

model = TensorFlowModel(model_data=f's3://{bucket}/model/{model_name}.tar.gz',
                        role=role,
                        entry_point='entry_point.py',
                        py_version='py3',
                        name=model_name)

model.deploy(initial_instance_count=1,
             instance_type='ml.m4.xlarge',
             endpoint_name=endpoint_name)

predictor = RealTimePredictor(endpoint_name, 
                              serializer=npy_serializer, 
                              deserializer=numpy_deserializer)

On the server side, your code might look like this:

import io
import numpy as np
def input_fn(data, content_type):
    return np.load(io.BytesIO(data), allow_pickle=True)

def output_fn(prediction, accepts):
    buffer = io.BytesIO()
    np.save(buffer, prediction.asnumpy())
    return buffer.getvalue()

@Robbie-Palmer
Author

Hi @jesterhazy

Ah, thank you for the info!

For point 1:
When deploying using the sagemaker.tensorflow.model.TensorFlowModel class, it gives the below warning that you should specify 'py3' when creating an instance:

The Python 2 tensorflow images will be soon deprecated and may not be supported for newer upcoming versions of the tensorflow images.
Please set the argument "py_version='py3'" to use the Python 3 tensorflow image.

Is it really the case that this old way of deploying models is being deprecated and won't support Python 3?

For point 2:
Thank you! This now works with the following on the server side, given that output_fn receives a tensorflow_serving.apis.predict_pb2.PredictResponse object:

def output_fn(prediction, accepts):
    output = prediction.outputs['score']
    shape = [dim.size for dim in output.tensor_shape.dim]
    prediction_array = np.reshape(output.float_val, shape)
    buffer = BytesIO()
    np.save(buffer, prediction_array)
    return buffer.getvalue()

This has a huge performance benefit, e.g. an example array of size [1, 280, 280, 3] took 4.6 seconds for a response with JSON, but with numpy serialization it takes only 0.4 seconds.

I've also been able to increase the request size from around [1, 320, 320, 3] up to just below [1, 420, 420, 3].
Though at that point I receive an error about exceeding the allowed data size in TensorFlow Serving:

ERROR in serving: <_Rendezvous of RPC that terminated with:
status = StatusCode.RESOURCE_EXHAUSTED
details = "Received message larger than max (4233682 vs. 4194304)"

I can see that you have already addressed this by increasing the message size from 4MB to 2GB.
Has this change been deployed yet?
If so, do I just need to explicitly point to the image with the right version?

@chuyang-deng
Contributor

Hi @Robbie-Palmer,

Regarding your questions:

  1. We are deprecating Python 2 because it will not be maintained after 01/01/2020.

  2. Yes, the gRPC message size limit change has been released.

Thanks

@whatdhack

What would the server-side code (e.g. entry_point='inference.py') be for a TensorFlow Serving model?

@laurenyu
Contributor

@whatdhack you can find documentation about pre/post-processing with TFS at https://github.com/aws/sagemaker-tensorflow-serving-container/#prepost-processing

@whatdhack

It's not clear to me from the above documentation how I would add the numpy_deserializer on the server side corresponding to the npy_serializer on the client side (e.g. the part on the SageMaker server corresponding to the following code) in the inference.py file, which is supposedly part of the archive that the endpoint uses.

predictor = RealTimePredictor(endpoint_name, 
                              serializer=npy_serializer, 
                              deserializer=numpy_deserializer)

@laurenyu
Contributor

here's an example of deserializing input from the docs:

import json

def input_handler(data, context):
    """ Pre-process request input before it is sent to TensorFlow Serving REST API
    Args:
        data (obj): the request data, in format of dict or string
        context (Context): an object containing request and configuration details
    Returns:
        (dict): a JSON-serializable dict that contains request body and headers
    """
    if context.request_content_type == 'application/json':
        # pass through json (assumes it's correctly formed)
        d = data.read().decode('utf-8')
        return d if len(d) else ''

    if context.request_content_type == 'text/csv':
        # very simple csv handler
        return json.dumps({
            'instances': [float(x) for x in data.read().decode('utf-8').split(',')]
        })

    raise ValueError('{{"error": "unsupported content type {}"}}'.format(
        context.request_content_type or "unknown"))

for numpy, you'd want to check for "application/x-npy" as the content type and then use something like numpy.load to read the data
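a rough, untested sketch of what that branch might look like (assuming the request body is the raw .npy bytes, and converting to the JSON format TFS expects):

import io
import json
import numpy as np

def input_handler(data, context):
    if context.request_content_type == 'application/x-npy':
        # read the raw .npy bytes and convert them to the JSON structure TFS expects
        array = np.load(io.BytesIO(data.read()), allow_pickle=True)
        return json.dumps({'instances': array.tolist()})

    raise ValueError('{{"error": "unsupported content type {}"}}'.format(
        context.request_content_type or "unknown"))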

@whatdhack

Thanks. An example for application/x-npy is what I am looking for.

@Robbie-Palmer
Author

Hi @whatdhack, for me I just needed to do:
np.load(BytesIO(data), allow_pickle=True)
