Elastic Inference: Internal Error for prediction #1370
Comments
usually when I see
No, this isn't something I have defined in my code. For my entry point script, I used the default one defined here.
What does your prediction data look like?
@laurenyu My input data is this:
I notice that you're passing an FP16 input, so the issue may be with how your model was traced and saved. Can you describe where and how you saved the model? For example: did you convert the model to half precision first and then trace it, or the other way around? And did you do this on a GPU instance with the CUDA-enabled framework, or with the Elastic Inference-enabled framework? Also, this shouldn't make a difference, but what happens if you use
I am pretty sure your model wasn't converted to half precision properly. Found this error in our service logs:
This means you passed a half-precision input but your model weights are still full precision. So if you remove the float16 conversion and do inference with a plain FP32 input, it should work.
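In other words, something along these lines should work (an illustrative sketch only; the file name and input shape are placeholders, not taken from this thread):

```python
import torch

# Load a traced TorchScript model whose weights were left in full precision.
model = torch.jit.load("model.pt", map_location=torch.device("cpu"))
model.eval()

x = torch.rand(1, 1, 96, 96, 96)  # FP32 input, matching the FP32 weights

with torch.no_grad():
    output = model(x)

# Calling model(x.half()) instead would feed FP16 data into FP32 weights and
# trigger the kind of dtype mismatch described above.
```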
Hello @dfan, thank you for your help. Indeed you are right, the error comes from the usage of FP16! I am not actually using half precision; I was just passing the input as FP16 in order to respect the 5 GB limit of the endpoint, and I am now transforming it back to FP32 in the entry point script. I traced my model on a CPU instance; should I do it with the Elastic Inference-enabled framework instead?

It would be nice to get these error outputs, as it would simplify debugging; I would have easily located my mistake if I had seen this error message.

Inference now works, but there's still something weird: I seem to be getting a memory error (I guess, as I can only see "Internal Error") when using an input of size (1, 1, 96, 96, 96). But when I run it locally and track the CUDA memory usage (using this), the inference uses at most 1.5 GB, very far from the 8 GB available in the "ml.eia2.xlarge" accelerator. What is the reason behind this? Could it be due to the way I trace my model? Thank you!
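For reference, the conversion in my entry point looks roughly like this (a simplified sketch rather than my exact code; the serialization details are assumptions):

```python
import io

import numpy as np
import torch


def input_fn(request_body, content_type):
    # The request arrives as a serialized float16 numpy array (to keep the
    # payload small) and is cast back to float32 before being handed to the
    # model. The exact deserialization depends on the content type used.
    array = np.load(io.BytesIO(request_body), allow_pickle=False)
    return torch.from_numpy(array.astype(np.float32))
```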
In SageMaker hosting you don't get console outputs; everything is logged in CloudWatch. Let me know if you can find the error that I mentioned in your logs (the log group should start with the name "/aws/sagemaker/Endpoints/..." if you didn't change the defaults).

If you are tracing your model and saving it ahead of time, you don't have to do it with the Elastic Inference-enabled framework. You can load the saved model directly.

Can you give me a paste of your error and also tell me which region? I need the accelerator ID and the time of the error. If the issue is indeed memory related, I suspect it is with accelerator RAM, not GPU memory.
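Something along these lines is enough (a sketch, assuming the model archive contains a traced model.pt):

```python
import os

import torch


def model_fn(model_dir):
    # Load an already-traced TorchScript model on CPU. Because the model was
    # traced and saved ahead of time, no EI-enabled framework is needed at
    # this point.
    model = torch.jit.load(os.path.join(model_dir, "model.pt"),
                           map_location=torch.device("cpu"))
    return model.eval()
```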
Also, you meant 5 MB, right? Not 5 GB. I believe the payload limit is 5 MB. You could try compressing the input rather than converting it to half precision and then back to FP32.
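For example, something along these lines on the client side, paired with a matching deserializer in your input handler (an illustrative sketch; the helper names are made up):

```python
import gzip
import io

import numpy as np


def compress_array(array):
    # Keep full FP32 precision and shrink the payload with gzip instead of
    # down-casting to FP16. How much this helps depends on the data.
    buffer = io.BytesIO()
    np.save(buffer, array.astype(np.float32))
    return gzip.compress(buffer.getvalue())


def decompress_array(payload):
    # Counterpart for the serving side (e.g. in a custom input handler).
    return np.load(io.BytesIO(gzip.decompress(payload)))
```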
I can only see this in CloudWatch:
I am not able to see the same error message as you; maybe I am not looking in the right place?

Okay, that's indeed what I currently use to load my model!

My region is eu-west-1, the accelerator ID is in the error above, and the time of the error is 21:30:30. How much RAM does this accelerator have?
Yes, I meant 5 MB; indeed, I could compress it!
You're right. We currently classify most errors for PyTorch internally as a 5xx error and do not propagate them to the client at this time. We have work planned to clean up our error handling, which is why you aren't seeing much output in CloudWatch. Very sorry that this is slowing down your ability to get things set up.

It turns out the error is not related to memory: "Given groups=1, weight of size 64 21 3 3, expected input[1, 11, 129, 129] to have 21 channels, but got 11 channels instead". It sounds like either
Is your model fully debugged? Let's eliminate cases 1 and 2 first. If you're not sure, try running inference on a standalone EC2 instance so that you get full debugging output. Elastic Inference is meant to serve models that are production ready.
Good to know that you are working on better error handling; I believe it will be crucial for enabling developers to use this service.

Sorry, this is not the error I wanted to ask you about; it only appeared because I was playing with the input of the model and changed a parameter I shouldn't have. The actual error that I am stuck on is the following:
This error occurs after the model prediction on an input of size (1, 1, 96, 96, 96). EI Accelerator ID: eia-098b6ee84f3b44809b0ead011e21689c. Thanks!
I checked the server logs and your inference ran without error, so somehow your inference output isn't making it back to the client. Can you describe what your model output looks like? Is it a tensor, and if so, what are the dimensions? If the output is not a tensor, then you may have to implement your own serialization logic. Also, to clarify: are you passing a numpy array or a torch tensor as the input for inference?
It is a tensor with the same dimensions as the input data, i.e. in this case (1, 1, 96, 96, 96). I am passing a numpy array to the endpoint, which transforms it into a tensor in the entry point script.

Some clarifications: I am able to receive an output when I use an input of size (1, 1, 32, 32, 32), but it fails with the above error when I use a larger input such as (1, 1, 96, 96, 96). The inference indeed works; the error arises afterwards, I believe when the output is returned.
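Roughly, the step that returns the output looks something like this (an illustrative sketch, not the actual default implementation):

```python
import io

import numpy as np


def output_fn(prediction, accept):
    # Move the (1, 1, 96, 96, 96) output tensor to CPU, convert it to a
    # numpy array, and serialize it for the response body.
    buffer = io.BytesIO()
    np.save(buffer, prediction.cpu().numpy())
    return buffer.getvalue()
```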
You are running into the payload limit of 5 MB, since your total payload (from the error message) is about 7 MB. A 96x96x96 tensor (assuming float32) is about 3.5 MB. Someone from the SageMaker hosting team will comment about potential workarounds. If this is the only remaining issue, then it seems Elastic Inference is working for you :)
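The back-of-the-envelope arithmetic, for reference (illustrative sketch):

```python
import numpy as np

elements = 1 * 1 * 96 * 96 * 96                        # 884,736 elements
tensor_bytes = elements * np.dtype(np.float32).itemsize
print(tensor_bytes / 1e6)                              # ~3.5 MB for one tensor

# Serialization overhead on top of that is consistent with the ~7 MB total
# reported in the error, well past the 5 MB limit.
```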
Great! Thank you so much for your help :)
Hello everyone!
Describe the bug
I have deployed a TorchScript model using Elastic Inference. I am now getting an error when running
predictor.predict(data)
(I am using the default predict_fn in my entry point script):

2020-03-20 14:10:08,136 [WARN ] W-9000-model com.amazonaws.ml.mms.wlm.WorkerThread - Backend worker thread exception. java.lang.IllegalArgumentException: reasonPhrase contains one of the following prohibited characters: \r\n: Last Error: EI Error Code: [1, 2, 1] EI Error Description: Internal error EI Request ID: PT-DD5B9348-6ECA-40C0-826F-088F6266D347 -- EI Accelerator ID: eia-d0e5536df7f34bbd98b5950fbeb6a431 EI Client Version: 1.6.2
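For reference, my deployment looks roughly like this (a simplified sketch; the S3 path, role, entry point, framework version, and instance types are placeholders, not my exact configuration):

```python
from sagemaker.pytorch import PyTorchModel

model = PyTorchModel(
    model_data="s3://my-bucket/model.tar.gz",   # traced TorchScript model
    role="MySageMakerRole",
    entry_point="inference.py",                 # uses the default handlers
    framework_version="1.3.1",
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.large",
    accelerator_type="ml.eia2.xlarge",          # Elastic Inference accelerator
)

result = predictor.predict(data)                # data: the input numpy array;
                                                # this call raises the error above
```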
I can give more information if needed.
Thanks for the help!
Thomas
System information