Elastic Inference: Internal Error for prediction #1370

Closed
thomas-beznik opened this issue Mar 20, 2020 · 17 comments

@thomas-beznik

Hello everyone!

Describe the bug
I have deployed a TorchScript model using Elastic Inference. I am now getting an error when running predictor.predict(data) (I am using the default predict_fn in my entry point script):

2020-03-20 14:10:08,136 [WARN ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerThread - Backend worker thread exception. java.lang.IllegalArgumentException: reasonPhrase contains one of the following prohibited characters: \r\n: Last Error: EI Error Code: [1, 2, 1] EI Error Description: Internal error EI Request ID: PT-DD5B9348-6ECA-40C0-826F-088F6266D347 -- EI Accelerator ID: eia-d0e5536df7f34bbd98b5950fbeb6a431 EI Client Version: 1.6.2
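For context, my deployment and prediction call look roughly like this (a simplified sketch; the bucket, role, and entry-point names are placeholders):

```python
import numpy as np
from sagemaker.pytorch import PyTorchModel

# simplified sketch; bucket, role, and entry-point names are placeholders
model = PyTorchModel(
    model_data="s3://my-bucket/model.tar.gz",  # traced TorchScript artifact
    role="arn:aws:iam::123456789012:role/SageMakerRole",
    entry_point="inference.py",
    framework_version="1.3.1",
)
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.large",
    accelerator_type="ml.eia2.xlarge",  # Elastic Inference accelerator
)

data = np.random.rand(1, 1, 32, 32, 32).astype("float16")
predictor.predict(data)  # fails with the error above
```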

I can give more information if needed.

Thanks for the help!
Thomas

System information

  • SageMaker Python SDK version: 1.51.3
  • Framework name (e.g., PyTorch) or algorithm (e.g., KMeans): PyTorch
  • Framework version: 1.3.1
  • Python version: 3.6.5
  • CPU or GPU: GPU
  • Custom Docker image (Y/N): N
@laurenyu
Contributor

reasonPhrase contains one of the following prohibited characters: \r\n

Usually when I see \r\n, it means a Windows-style line ending. Is reasonPhrase something defined in your code?

@ThomasBeznik

No, this isn't something I have defined in my code. For my entry point script, I used the default one defined here.

@laurenyu
Contributor

What does your prediction data look like?

@thomas-beznik
Author

thomas-beznik commented Mar 23, 2020

@laurenyu My input data is this: np.random.rand(1,1,32,32,32).astype("float16")

@dfan
Contributor

dfan commented Mar 23, 2020

I notice that you're passing an FP16 input; the issue may be with how your model was traced and saved. Can you describe where and how you saved the model? For example, whether you converted the model to half precision first and then traced, or the other way around, and whether you did this on a GPU instance with the CUDA-enabled framework or with the Elastic Inference-enabled framework.

Also, this shouldn't make a difference, but what happens if you use torch.rand(1,1,32,32,32).half() as the input instead of a numpy array?
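As a quick check (a sketch; the artifact path is a placeholder), you could also inspect the parameter dtypes of the saved TorchScript model to confirm whether it was actually converted to half precision:

```python
import torch

# load the traced artifact and inspect its parameter dtypes (path is a placeholder)
traced = torch.jit.load("model.pt", map_location="cpu")
print({p.dtype for p in traced.parameters()})  # {torch.float32} would mean the weights were never converted to FP16
```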

@dfan
Contributor

dfan commented Mar 23, 2020

I am pretty sure your model wasn't converted to half precision properly. I found this error in our service logs:

Failed infer on model 122b1ea6-25bb-46f5-bf9c-62652a03ab8f_MODEL_PT_F7236B7B_FEB2_48B7_B323__26C4ED00_DE26_4CCA_AF9A_AEE9DC544289. Reason (std::exception): Input type (Variable[CUDAHalfType]) and weight type (Variable[CUDAFloatType]) should be the same

This means you passed a half precision input but your model weights are still full precision. So if you remove the float16 conversion and do inference with just torch.rand(1,1,32,32,32), it should work with your current FP32 model.
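For illustration, the same kind of mismatch can be reproduced locally with a stand-in model (this is not your model, just a minimal example):

```python
import torch
import torch.nn as nn

model = nn.Conv3d(1, 8, kernel_size=3)  # stand-in model; weights are FP32 by default

try:
    model(torch.rand(1, 1, 32, 32, 32).half())  # FP16 input against FP32 weights
except RuntimeError as err:
    print(err)  # dtype mismatch error (exact wording varies by device and version)

out = model(torch.rand(1, 1, 32, 32, 32))  # FP32 input matches the FP32 weights and runs fine
print(out.shape)
```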

@thomas-beznik
Author

thomas-beznik commented Mar 24, 2020

Hello @dfan,

Thank you for your help.

Indeed you are right: the error comes from the use of FP16! I am not actually using half precision; I was just passing the input as FP16 in order to respect the 5 GB limit of the endpoint. I am now converting it back to FP32 in input_fn, which solves the problem!
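Roughly, my input_fn now does something like this (a sketch; I am assuming here that the payload arrives as a serialized .npy array):

```python
import io

import numpy as np
import torch


def input_fn(request_body, content_type):
    # the client sends an FP16 .npy payload to stay under the endpoint's size limit
    array = np.load(io.BytesIO(request_body), allow_pickle=True)
    # cast back to FP32 so the dtype matches the FP32 model weights
    return torch.from_numpy(array.astype(np.float32))
```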

I traced my model on a CPU instance; should I do it with the Elastic Inference enabled framework?

It would be nice to surface these error outputs to simplify debugging; I would have located my mistake easily if I had seen this error message.

Inference now works, but there's still something weird: I seem to be getting a memory error (I am guessing, as I can only see "Internal Error") when using an input of size (1,1,96,96,96). But when I run it locally and track the CUDA memory usage (using this), the inference uses at most 1.5 GB, very far from the 8 GB available on the "ml.eia2.xlarge" accelerator. What is the reason behind this? Could it be due to the way I trace my model?

Thank you!

@dfan
Contributor

dfan commented Mar 24, 2020

In SageMaker hosting you don't get console outputs; everything is logged in CloudWatch. Let me know whether you can find the error I mentioned in your logs (the log group should start with the name "/aws/sagemaker/Endpoints/..." if you didn't change the defaults).

If you are tracing your model and saving it ahead of time, you don't have to do it with the Elastic Inference-enabled framework; you can directly torch.jit.load it. You only need to trace with the Elastic Inference-enabled framework / optimized_execution() context if you are tracing your model and doing inference immediately afterward, without serializing the model first.
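In other words, a model_fn along these lines is enough when the artifact was traced and saved ahead of time (a sketch; it assumes the file inside your model archive is named model.pt):

```python
import os

import torch


def model_fn(model_dir):
    # load the TorchScript model that was traced and saved ahead of time
    model = torch.jit.load(os.path.join(model_dir, "model.pt"), map_location="cpu")
    return model.eval()
```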

Can you give me a paste of your error and also tell me which region? I need the accelerator ID and the time of the error. If the issue is indeed memory related, I suspect it is with accelerator RAM, not GPU memory.

@dfan
Contributor

dfan commented Mar 24, 2020

Also, you meant 5 MB, right? Not 5 GB. I believe the payload limit is 5 MB. You could try compressing the input rather than converting it to half precision and then back to FP32.
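On the client side that could look roughly like this (illustrative; how much it helps depends on how compressible your data is, since random noise will barely shrink, and your input_fn would then need to gzip.decompress the payload before deserializing it):

```python
import gzip
import io

import numpy as np

x = np.random.rand(1, 1, 96, 96, 96).astype(np.float32)

buffer = io.BytesIO()
np.save(buffer, x)
raw = buffer.getvalue()
compressed = gzip.compress(raw)

# payload size in MB before / after compression
print(len(raw) / 1e6, len(compressed) / 1e6)
```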

@thomas-beznik
Author

I can only see this in CloudWatch:

java.lang.IllegalArgumentException: reasonPhrase contains one of the following prohibited characters: \r\n: Last Error: EI Error Code: [1, 2, 1] EI Error Description: Internal error EI Request ID: PT-4DA5D794-F920-4281-B8A9-BD047239281F -- EI Accelerator ID: eia-5a652e78eedc4465b680c94031dcc78e EI Client Version: 1.6.2

I am not able to see the same error message as you. Or maybe I am not looking in the right place?

Okay, that's indeed what I currently use to load my model!

My region is eu-west-1, the accelerator ID is in the error above and the time of the error is 21:30:30. How much RAM does this accelerator have?

@thomas-beznik
Author

Yes, I meant 5 MB, indeed I could compress it!

@dfan
Contributor

dfan commented Mar 25, 2020

You're right. We currently classify most errors for PyTorch internally as 5xx errors and do not propagate them to the client at this time. We have work planned to clean up our error handling, which is why you aren't seeing many outputs in CloudWatch. Very sorry that this is slowing down your ability to get things set up.

It turns out the error is not related to memory: "Given groups=1, weight of size 64 21 3 3, expected input[1, 11, 129, 129] to have 21 channels, but got 11 channels instead". It sounds like either

  1. You are passing an input of the wrong dimension.
  2. Your model isn't constructed properly (PyTorch doesn't do shape validation so you won't get errors until inference time).
  3. Or your model code has different behavior for different tensor sizes, and you traced the model with a different sized input from what you're currently passing.

Is your model fully debugged? Let's eliminate cases 1 and 2 first. If you're not sure, try running inference on a standalone EC2 instance so that you get full debugging output. Elastic Inference is meant to serve models that are production ready.
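For cases 1 and 2, a quick local run against the traced artifact already gives the full Python traceback (a sketch; the path and input size are placeholders):

```python
import torch

# run the traced model locally with the exact input shape used at the endpoint
traced = torch.jit.load("model.pt", map_location="cpu")
out = traced(torch.rand(1, 1, 96, 96, 96))
print(out.shape)
```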

@thomas-beznik
Author

Good to know that you are working on better error handling; I believe it will be crucial for enabling developers to use this service.

Sorry, this is not the error I wanted to ask you about. It only happened because I was playing with the model's input and changed a parameter I shouldn't have. The actual error that I am stuck on is the following:

2020-03-25 14:54:30,444 [ERROR] epollEventLoopGroup-4-1 com.amazonaws.ml.mms.wlm.WorkerThread - Unknown exception
io.netty.handler.codec.CorruptedFrameException: Message size exceed limit: 7078016
    at com.amazonaws.ml.mms.util.codec.CodecUtils.readLength(CodecUtils.java:36)
    at com.amazonaws.ml.mms.util.codec.ModelResponseDecoder.decode(ModelResponseDecoder.java:84)
    at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:489)
    at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:428)
    at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:392)
    at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:359)
    at io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:342)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1429)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
    at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:947)
    at io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:822)
    at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404)
    at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:309)
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:884)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.lang.Thread.run(Thread.java:748)

This error occurs after the model prediction on an input of size (1,1,96,96,96).

EI Accelerator ID: eia-098b6ee84f3b44809b0ead011e21689c.

Thanks!

@dfan
Contributor

dfan commented Mar 26, 2020

I checked the server logs and your inference ran without error, so somehow your inference output isn't making it back to the client. Can you describe what your model output looks like? Is it a tensor, and if so, what are its dimensions? If the output is not a tensor, then you may have to implement your own serialization logic.
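For reference, a custom output_fn has roughly this shape (a sketch; here it just serializes a tensor as .npy bytes, but the same hook is where non-tensor outputs would be handled):

```python
import io

import numpy as np
import torch


def output_fn(prediction, accept):
    # move the prediction to CPU, convert to NumPy, and serialize it as .npy bytes
    array = prediction.detach().cpu().numpy()
    buffer = io.BytesIO()
    np.save(buffer, array)
    return buffer.getvalue()
```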

Also to clarify: are you passing a numpy array or torch tensor as the input for inference?

@thomas-beznik
Author

It is a tensor with the same dimensions as the input data, i.e. in this case (1,1,96,96,96). I am passing a numpy array to the endpoint, which transforms it into a tensor in the input_fn method.

Some clarifications: I am able to receive an output when I use an input of size (1,1,32,32,32), but it fails with the above error when I use a larger input size such as (1,1,96,96,96). The inference itself indeed works; the error arises afterwards, I believe when the output is returned from the predict_fn method.

@dfan
Contributor

dfan commented Mar 26, 2020

You are running into the 5 MB payload limit, since your total payload (from the error message) is about 7 MB. A 96x96x96 tensor (assuming float32) is about 3.5 MB. Someone from the SM hosting team will comment on potential workarounds. If this is the only remaining issue, then it seems Elastic Inference is working for you :)

https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_runtime_InvokeEndpoint.html#API_runtime_InvokeEndpoint_RequestSyntax
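For a rough sense of the sizes involved (back-of-the-envelope arithmetic):

```python
import numpy as np

elements = 1 * 1 * 96 * 96 * 96  # 884,736 values

print(elements * np.dtype(np.float32).itemsize / 1e6)  # ~3.54 MB as FP32
print(elements * np.dtype(np.float64).itemsize / 1e6)  # ~7.08 MB as FP64, close to the 7,078,016-byte
                                                       # message size in the stack trace above
```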

@thomas-beznik
Author

Great! Thank you so much for your help :)
