Deploying Pytorch models with elastic inference #1360

Closed
thomas-beznik opened this issue Mar 16, 2020 · 9 comments

Comments

@thomas-beznik

thomas-beznik commented Mar 16, 2020

Hello,

I am trying to deploy a PyTorch model on SageMaker using Elastic Inference, and I am having trouble finding the information I need in the documentation.

On this page, https://sagemaker.readthedocs.io/en/stable/using_pytorch.html#deploy-pytorch-models, it says that "if you are using PyTorch Elastic Inference, you do not have to provide a model_fn since the PyTorch serving container has a default one for you". Do we have to use this default model_fn, or can we use our own? And do we have to use a TorchScript model or not?

It would also be great to have a full example of how to deploy a PyTorch model trained outside of AWS.

Thanks!

@chuyang-deng
Contributor

Hi @thomas-beznik, thanks for using SageMaker. We are currently working on the PyTorch EI documentation, and more detailed examples are coming soon!

In the meantime, to answer your questions:

  1. You do not "have to" use the default functions; you can use your own inference script.
  2. In your custom inference script, to trigger the accelerator you have to use a TorchScript model. You will need to save your model with torch.jit.save instead of saving it as a state dictionary; also, in the predict_fn of your implementation, run the model inside torch.jit.optimized_execution (see the sketch below).
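For reference, a minimal sketch of such a custom inference script, assuming the model archive contains a TorchScript file saved as model.pt (the file name and the tensor conversion are illustrative assumptions, not the container's defaults):

```python
# inference.py -- sketch of a custom entry point for the PyTorch serving container
import os

import torch


def model_fn(model_dir):
    # Load the TorchScript model that was saved with torch.jit.save.
    # "model.pt" is an assumed file name inside model.tar.gz.
    model = torch.jit.load(os.path.join(model_dir, "model.pt"))
    model.eval()
    return model


def predict_fn(input_data, model):
    # Run the model inside torch.jit.optimized_execution; the EI-enabled
    # PyTorch build uses this to hand the graph to the attached accelerator
    # (check the AWS docs for any extra arguments your build expects).
    with torch.no_grad():
        with torch.jit.optimized_execution(True):
            return model(torch.as_tensor(input_data))
```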

@thomas-beznik
Author

Hello,

Thank you for your help! I am now running into another issue. I have already trained my model outside of AWS and have a TorchScript version of it. I would like to deploy it to an instance with Elastic Inference, so I am using a PyTorchModel. When I deploy it, I get the following error:

ValueError: pytorch-serving is not supported with Amazon Elastic Inference. Currently only Python-based TensorFlow and MXNet are supported.

But in the tutorial, they are able to deploy with Elastic Inference; they do it using the PyTorch estimator class. The problem is that my model is already trained, so I can't use this class. My questions are the following:

  • Is there a way to deploy a PyTorchModel with the accelerator?
  • Or is there a way to convert my model from a PyTorchModel to a PyTorch estimator?

Thank you for the help!

@chuyang-deng
Contributor

Hi @thomas-beznik, are you using the latest version of the Python SDK? Our current error message looks like this:

"{} is not supported with Amazon Elastic Inference. Currently only "

Please make sure you are using version 1.51.0 or above to use PyTorch EIA.
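For reference, attaching an accelerator to an already-trained model goes through PyTorchModel plus the accelerator_type argument of deploy. A rough sketch, in which the S3 path, role ARN, entry-point script, and version/instance names are placeholders, not values from this issue:

```python
from sagemaker.pytorch import PyTorchModel

# All names below are placeholders -- substitute your own bucket, role, and script.
pytorch_model = PyTorchModel(
    model_data="s3://my-bucket/model.tar.gz",  # archive containing the TorchScript model
    role="arn:aws:iam::123456789012:role/MySageMakerRole",
    entry_point="inference.py",
    framework_version="1.3.1",                 # an EI-compatible PyTorch version
)

predictor = pytorch_model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.large",
    accelerator_type="ml.eia2.medium",         # this is what attaches Elastic Inference
)
```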

@thomas-beznik
Author

thomas-beznik commented Mar 19, 2020

Thank you @ChuyangDeng for the help so far; I was able to deploy my model!

But the journey is not over yet... I am now running into problems when trying to perform inference on the deployed model: when running predictor.predict(input) (where input is a numpy array), I get the following error:

ConnectionClosedError: Connection was closed before we received a valid response from endpoint URL: "https://runtime.sagemaker.eu-west-1.amazonaws.com/endpoints/pytorch-inference-.../invocations

But when I look at the logs of my endpoint, I can't see any error there...

  • Do you know what could be the cause of this error?
  • Are there tutorials, or do you have any advice, for debugging this sort of situation and getting a better view of what is happening inside the endpoint? I have used logging.info inside the entry-point script of the endpoint, but I can't see anything written to the log file.

Thank you very much for your help!

Best,
Thomas

@thomas-beznik
Author

thomas-beznik commented Mar 19, 2020

Ah, it seems to be a size problem: when using a numpy array of size (1, 1, 96, 96, 96) I get the above error, but when I use an array of size (1, 1, 10, 10, 10) that error goes away and I get a different one:

java.lang.IllegalArgumentException: reasonPhrase contains one of the following prohibited characters: \r\n: Cannot initialize CUDA without ATen_cuda library. PyTorch splits its backend into two shared libraries: a CPU library and a CUDA library; this error has occurred because you are trying to use some CUDA functionality, but the CUDA library has not been loaded by the dynamic linker for some reason. The CUDA library MUST be loaded, EVEN IF you don't directly use any symbols from the CUDA library! One common culprit is a lack of -Wl,--no-as-needed in your link arguments; many dynamic linkers will delete dynamic library dependencies if you don't depend on any of their symbols. You can check if this has occurred by using ldd on your binary to see if there is a dependency on *_cuda.so library. (initCUDA at /opt/conda/conda-bld/pytorch_1573049306851/work/aten/src/ATen/detail/CUDAHooksInterface.h:63)

I believe this error occurs in the model_fn method, when the model is loaded with torch.jit.load.

Thanks!
Thomas

@laurenyu
Contributor

CUDA requires a GPU, but EI works only with CPU instances. I've forwarded this to the team that owns PyTorch + EI to see if they have any insight. Thanks for your patience!

@dfan
Contributor

dfan commented Mar 20, 2020

It looks like your model was saved while it was on a CUDA device, so you'll need to provide an implementation of model_fn that loads it to CPU: torch.jit.load(model, map_location=torch.device('cpu')). This may be something we should clarify in the docs; I'll consult with the team.

The model is first loaded on the host instance, which has a CPU-only version of the framework. The model is then sent over the network to the accelerator server, which has a GPU context enabled, and your model tensors are moved to CUDA at that time. So your model inference happens with CUDA, but your model needs to be loaded to CPU initially.
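A minimal sketch of a model_fn along those lines (the model file name is again an assumed placeholder):

```python
import os

import torch


def model_fn(model_dir):
    # Deserialize the TorchScript archive onto the CPU even though it was
    # saved from a CUDA context; the EI runtime moves tensors to the
    # accelerator later.
    model = torch.jit.load(
        os.path.join(model_dir, "model.pt"),
        map_location=torch.device("cpu"),
    )
    model.eval()
    return model
```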

@ThomasBeznik

Yes, indeed, this solved it! But I am now getting a new error when running the prediction. Any help on that would be great!

@laurenyu
Contributor

Glad to hear we're making progress. Going to close this issue and continue the conversation in #1370.
