Elastic Inference: Internal Error for prediction #1370

Closed
thomas-beznik opened this issue Mar 20, 2020 · 17 comments

@thomas-beznik

Hello everyone!

Describe the bug
I have deployed a TorchScript model using Elastic Inference. I am now getting an error when running predictor.predict(data) (I am using the default predict_fn in my entry point script):

2020-03-20 14:10:08,136 [WARN ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerThread - Backend worker thread exception. java.lang.IllegalArgumentException: reasonPhrase contains one of the following prohibited characters: \r\n: Last Error: EI Error Code: [1, 2, 1] EI Error Description: Internal error EI Request ID: PT-DD5B9348-6ECA-40C0-826F-088F6266D347 -- EI Accelerator ID: eia-d0e5536df7f34bbd98b5950fbeb6a431 EI Client Version: 1.6.2
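For context, my deployment and prediction call look roughly like this (a simplified sketch; the bucket, role, and entry-point names are placeholders):

```python
import numpy as np
from sagemaker.pytorch import PyTorchModel

# simplified sketch; bucket, role, and entry-point names are placeholders
model = PyTorchModel(
    model_data="s3://my-bucket/model.tar.gz",  # traced TorchScript artifact
    role="arn:aws:iam::123456789012:role/SageMakerRole",
    entry_point="inference.py",
    framework_version="1.3.1",
)
predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.large",
    accelerator_type="ml.eia2.xlarge",  # Elastic Inference accelerator
)

data = np.random.rand(1, 1, 32, 32, 32).astype("float16")
predictor.predict(data)  # fails with the error above
```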

I can give more information if needed.

Thanks for the help!
Thomas

System information

  • SageMaker Python SDK version: 1.51.3
  • Framework name (e.g., PyTorch) or algorithm (e.g., KMeans): PyTorch
  • Framework version: 1.3.1
  • Python version: 3.6.5
  • CPU or GPU: GPU
  • Custom Docker image (Y/N): N
@laurenyu
Contributor

reasonPhrase contains one of the following prohibited characters: \r\n

Usually when I see \r\n, it means a Windows-style line ending. Is reasonPhrase something defined in your code?

@ThomasBeznik

No, this isn't something I have defined in my code. For my entry point script, I used the default one defined here.

@laurenyu
Contributor

What does your prediction data look like?

@thomas-beznik
Author

thomas-beznik commented Mar 23, 2020

@laurenyu My input data is this: np.random.rand(1,1,32,32,32).astype("float16")

@dfan
Contributor

dfan commented Mar 23, 2020

I notice that you're passing an FP16 input; the issue may be with how your model was traced and saved. Can you describe where and how you saved the model? For example, whether you converted the model to half precision first and then traced, or the other way around, and whether you did this on a GPU instance with the CUDA-enabled framework or with the Elastic Inference-enabled framework.

Also, this shouldn't make a difference, but what happens if you use torch.rand(1,1,32,32,32).half() as the input instead of a numpy array?
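As a quick check (a sketch; the artifact path is a placeholder), you could also inspect the parameter dtypes of the saved TorchScript model to confirm whether it was actually converted to half precision:

```python
import torch

# load the traced artifact and inspect its parameter dtypes (path is a placeholder)
traced = torch.jit.load("model.pt", map_location="cpu")
print({p.dtype for p in traced.parameters()})  # {torch.float32} would mean the weights were never converted to FP16
```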

@dfan
Contributor

dfan commented Mar 23, 2020

I am pretty sure your model wasn't converted to half precision properly. I found this error in our service logs:

Failed infer on model 122b1ea6-25bb-46f5-bf9c-62652a03ab8f_MODEL_PT_F7236B7B_FEB2_48B7_B323__26C4ED00_DE26_4CCA_AF9A_AEE9DC544289. Reason (std::exception): Input type (Variable[CUDAHalfType]) and weight type (Variable[CUDAFloatType]) should be the same

This means you passed a half precision input but your model weights are still full precision. So if you remove the float16 conversion and do inference with just torch.rand(1,1,32,32,32), it should work with your current FP32 model.
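For illustration, the same kind of mismatch can be reproduced locally with a stand-in model (this is not your model, just a minimal example):

```python
import torch
import torch.nn as nn

model = nn.Conv3d(1, 8, kernel_size=3)  # stand-in model; weights are FP32 by default

try:
    model(torch.rand(1, 1, 32, 32, 32).half())  # FP16 input against FP32 weights
except RuntimeError as err:
    print(err)  # dtype mismatch error (exact wording varies by device and version)

out = model(torch.rand(1, 1, 32, 32, 32))  # FP32 input matches the FP32 weights and runs fine
print(out.shape)
```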

@thomas-beznik
Author

thomas-beznik commented Mar 24, 2020

Hello @dfan,

Thank you for your help.

Indeed you are right: the error comes from the use of FP16! I am not actually using half precision; I was just passing the input as FP16 in order to respect the 5 GB limit of the endpoint. I am now converting it back to FP32 in input_fn, which solves the problem!
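Roughly, my input_fn now does something like this (a sketch; I am assuming here that the payload arrives as a serialized .npy array):

```python
import io

import numpy as np
import torch


def input_fn(request_body, content_type):
    # the client sends an FP16 .npy payload to stay under the endpoint's size limit
    array = np.load(io.BytesIO(request_body), allow_pickle=True)
    # cast back to FP32 so the dtype matches the FP32 model weights
    return torch.from_numpy(array.astype(np.float32))
```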

I traced my model on a CPU instance; should I do it with the Elastic Inference enabled framework?

It would be nice to surface these error outputs to simplify debugging; I would have located my mistake easily if I had seen this error message.

Inference now works, but there's still something weird: I seem to be getting a memory error (I am guessing, as I can only see "Internal Error") when using an input of size (1,1,96,96,96). But when I run it locally and track the CUDA memory usage (using this), the inference uses at most 1.5 GB, very far from the 8 GB available on the "ml.eia2.xlarge" accelerator. What is the reason behind this? Could it be due to the way I trace my model?

Thank you!

@dfan
Contributor

dfan commented Mar 24, 2020

In SageMaker hosting you don't get console outputs; everything is logged in CloudWatch. Let me know whether you can find the error I mentioned in your logs (the log group should start with the name "/aws/sagemaker/Endpoints/..." if you didn't change the defaults).

If you are tracing your model and saving it ahead of time, you don't have to do it with the Elastic Inference-enabled framework; you can directly torch.jit.load it. You only need to trace with the Elastic Inference-enabled framework / optimized_execution() context if you are tracing your model and doing inference immediately afterward, without serializing the model first.
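In other words, a model_fn along these lines is enough when the artifact was traced and saved ahead of time (a sketch; it assumes the file inside your model archive is named model.pt):

```python
import os

import torch


def model_fn(model_dir):
    # load the TorchScript model that was traced and saved ahead of time
    model = torch.jit.load(os.path.join(model_dir, "model.pt"), map_location="cpu")
    return model.eval()
```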

Can you give me a paste of your error and also tell me which region? I need the accelerator ID and the time of the error. If the issue is indeed memory related, I suspect it is with accelerator RAM, not GPU memory.

@dfan
Contributor

dfan commented Mar 24, 2020

Also, you meant 5 MB, right? Not 5 GB. I believe the payload limit is 5 MB. You could try compressing the input rather than converting it to half precision and then back to FP32.
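On the client side that could look roughly like this (illustrative; how much it helps depends on how compressible your data is, since random noise will barely shrink, and your input_fn would then need to gzip.decompress the payload before deserializing it):

```python
import gzip
import io

import numpy as np

x = np.random.rand(1, 1, 96, 96, 96).astype(np.float32)

buffer = io.BytesIO()
np.save(buffer, x)
raw = buffer.getvalue()
compressed = gzip.compress(raw)

# payload size in MB before / after compression
print(len(raw) / 1e6, len(compressed) / 1e6)
```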

@thomas-beznik
Author

I can only see this in CloudWatch:

java.lang.IllegalArgumentException: reasonPhrase contains one of the following prohibited characters: \r\n: Last Error: EI Error Code: [1, 2, 1] EI Error Description: Internal error EI Request ID: PT-4DA5D794-F920-4281-B8A9-BD047239281F -- EI Accelerator ID: eia-5a652e78eedc4465b680c94031dcc78e EI Client Version: 1.6.2

I am not able to see the same error message as you. Or maybe I am not looking in the right place?

Okay, that's indeed what I currently use to load my model!

My region is eu-west-1, the accelerator ID is in the error above and the time of the error is 21:30:30. How much RAM does this accelerator have?

@thomas-beznik
Author

Yes, I meant 5 MB, indeed I could compress it!

@dfan
Contributor

dfan commented Mar 25, 2020

You're right. We currently classify most errors for PyTorch internally as 5xx errors and do not propagate them to the client at this time. We have work planned to clean up our error handling, which is why you aren't seeing many outputs in CloudWatch. Very sorry that this is slowing down your ability to get things set up.

It turns out the error is not related to memory: "Given groups=1, weight of size 64 21 3 3, expected input[1, 11, 129, 129] to have 21 channels, but got 11 channels instead". It sounds like either

  1. You are passing an input of the wrong dimension.
  2. Your model isn't constructed properly (PyTorch doesn't do shape validation so you won't get errors until inference time).
  3. Or your model code has different behavior for different tensor sizes, and you traced the model with a different sized input from what you're currently passing.

Is your model fully debugged? Let's eliminate cases 1 and 2 first. If you're not sure, try running inference on a standalone EC2 instance so that you get full debugging output. Elastic Inference is meant to serve models that are production ready.
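For cases 1 and 2, a quick local run against the traced artifact already gives the full Python traceback (a sketch; the path and input size are placeholders):

```python
import torch

# run the traced model locally with the exact input shape used at the endpoint
traced = torch.jit.load("model.pt", map_location="cpu")
out = traced(torch.rand(1, 1, 96, 96, 96))
print(out.shape)
```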

@thomas-beznik
Author

Good to know that you are working on better error handling; I believe it will be crucial for enabling developers to use this service.

Sorry, this is not the error I wanted to ask you about. It only happened because I was playing with the model's input and changed a parameter I shouldn't have. The actual error that I am stuck on is the following:

2020-03-25 14:54:30,444 [ERROR] epollEventLoopGroup-4-1 com.amazonaws.ml.mms.wlm.WorkerThread - Unknown exception
io.netty.handler.codec.CorruptedFrameException: Message size exceed limit: 7078016
    at com.amazonaws.ml.mms.util.codec.CodecUtils.readLength(CodecUtils.java:36)
    at com.amazonaws.ml.mms.util.codec.ModelResponseDecoder.decode(ModelResponseDecoder.java:84)
    at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:489)
    at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:428)
    at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:392)
    at io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:359)
    at io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:342)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1429)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
    at io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:947)
    at io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:822)
    at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404)
    at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:309)
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:884)
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.lang.Thread.run(Thread.java:748)

This error occurs after the model prediction on an input of size (1,1,96,96,96).

EI Accelerator ID: eia-098b6ee84f3b44809b0ead011e21689c.

Thanks!

@dfan
Contributor

dfan commented Mar 26, 2020

I checked the server logs and your inference ran without error, so somehow your inference output isn't making it back to the client. Can you describe what your model output looks like? Is it a tensor, and if so, what are its dimensions? If the output is not a tensor, then you may have to implement your own serialization logic.
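For reference, a custom output_fn has roughly this shape (a sketch; here it just serializes a tensor as .npy bytes, but the same hook is where non-tensor outputs would be handled):

```python
import io

import numpy as np
import torch


def output_fn(prediction, accept):
    # move the prediction to CPU, convert to NumPy, and serialize it as .npy bytes
    array = prediction.detach().cpu().numpy()
    buffer = io.BytesIO()
    np.save(buffer, array)
    return buffer.getvalue()
```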

Also to clarify: are you passing a numpy array or torch tensor as the input for inference?

@thomas-beznik
Author

It is a tensor with the same dimensions as the input data, i.e. in this case (1,1,96,96,96). I am passing a numpy array to the endpoint, which transforms it into a tensor in the input_fn method.

Some clarifications: I am able to receive an output when I use an input of size (1,1,32,32,32), but it fails with the above error when I use a larger input size such as (1,1,96,96,96). The inference itself indeed works; the error arises afterwards, I believe when the output is returned from the predict_fn method.

@dfan
Contributor

dfan commented Mar 26, 2020

You are running into the 5 MB payload limit, since your total payload (from the error message) is about 7 MB. A 96x96x96 tensor (assuming float32) is about 3.5 MB. Someone from the SM hosting team will comment on potential workarounds. If this is the only remaining issue, then it seems Elastic Inference is working for you :)

https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_runtime_InvokeEndpoint.html#API_runtime_InvokeEndpoint_RequestSyntax
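For a rough sense of the sizes involved (back-of-the-envelope arithmetic):

```python
import numpy as np

elements = 1 * 1 * 96 * 96 * 96  # 884,736 values

print(elements * np.dtype(np.float32).itemsize / 1e6)  # ~3.54 MB as FP32
print(elements * np.dtype(np.float64).itemsize / 1e6)  # ~7.08 MB as FP64, close to the 7,078,016-byte
                                                       # message size in the stack trace above
```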

@thomas-beznik
Author

Great! Thank you so much for your help :)
