Image Uri generation for Hugging Face is broken #2700
Comments
@philschmid Thanks for addressing the issue. I am working on it.
I tested @ahsan-z-khan's branch with pip install git+https://github.com/ahsan-z-khan/sagemaker-python-sdk.git@HF-image-uri --upgrade, and it used the correct image.
The fix was released in version 2.66.1.
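Because the fix reportedly shipped in 2.66.1, one quick check is whether the installed `sagemaker` package is at least that release. This is a minimal sketch: the helper names are hypothetical, it assumes plain dotted version strings, and it simply ignores pre-release suffixes such as "rc1" or ".dev0".

```python
from __future__ import annotations

import re
from importlib import metadata
from typing import Optional

# Release that, according to this thread, contains the Hugging Face image URI fix.
FIXED_IN = (2, 66, 1)


def parse_version(version: str) -> tuple[int, ...]:
    """Turn "2.66.1" into (2, 66, 1); trailing non-numeric suffixes are ignored."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def has_hf_image_uri_fix(version: str) -> bool:
    # Tuple comparison: (2, 66) < (2, 66, 1) < (2, 70, 0).
    return parse_version(version) >= FIXED_IN


def installed_sagemaker_version() -> Optional[str]:
    """Return the installed sagemaker version, or None if it is not installed."""
    try:
        return metadata.version("sagemaker")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    installed = installed_sagemaker_version()
    if installed is None:
        print("sagemaker is not installed")
    elif has_hf_image_uri_fix(installed):
        print(f"sagemaker {installed} should include the fix")
    else:
        print(f"sagemaker {installed} predates 2.66.1; consider: pip install --upgrade sagemaker")
```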
I trained a network (a basic DistilBERT text classifier) on my own laptop:
I had to downgrade to these versions because that is what the Jupyter notebook outputs in the console (maximum versions shown below).
Error (it seems to reproduce again):
@philschmid @ahsan-z-khan, do you have any idea how to tackle this? Thank you in advance. UPDATE:
However, I find it quite cumbersome to work out the library versions for the Docker images, since the error message doesn't point to a compatible combination of TF + PT + PY, only to the latest version available in each category. Is there a particular "table" listing combinations of these libraries that are known to work?
Since I found this issue when Googling, I'll leave the link to all of the deep learning container images here. You can find the correct combinations of Python, Transformers, and PyTorch versions there.
I'm having a similar issue. I was trying to use settings from the original documentation at https://docs.aws.amazon.com/sagemaker/latest/dg-ecr-paths/ecr-us-east-1.html, but even the default example from that page fails. It seems to me that the URI system for Hugging Face is broken, or at least the documentation is outdated.
I also ran into this issue today. The solution is to look up the correct image URI on the official https://github.com/aws/deep-learning-containers/blob/master/available_images.md#huggingface-inference-containers page; the AWS image URI list is outdated.
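When copying a URI by hand from the available images list, it can help to split it into its parts and confirm that the region and tag are the ones you expect before passing it explicitly (the SDK's model classes, including `HuggingFaceModel`, accept an `image_uri` argument). The sketch below assumes the `<account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>` shape of the URIs quoted in this issue; `parse_ecr_uri` and `EcrImage` are hypothetical names, not part of the SDK.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Assumed URI shape, matching the URIs quoted in this issue:
#   <12-digit account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>
_ECR_URI = re.compile(
    r"^(?P<account>\d{12})\.dkr\.ecr\.(?P<region>[a-z0-9-]+)\.amazonaws\.com/"
    r"(?P<repository>[a-z0-9._/-]+):(?P<tag>[A-Za-z0-9._-]+)$"
)


@dataclass(frozen=True)
class EcrImage:
    account: str
    region: str
    repository: str
    tag: str


def parse_ecr_uri(uri: str) -> EcrImage:
    """Split an ECR image URI into account, region, repository, and tag."""
    match = _ECR_URI.match(uri.strip())
    if match is None:
        raise ValueError(f"not an ECR image URI: {uri!r}")
    return EcrImage(**match.groupdict())


if __name__ == "__main__":
    image = parse_ecr_uri(
        "763104351884.dkr.ecr.us-east-1.amazonaws.com/"
        "huggingface-pytorch-inference:1.8.1-transformers4.10.2-gpu-py36-cu110-ubuntu18.04"
    )
    print(image.region, image.repository, image.tag)
```

The region check matters because the registry host is region-specific: a URI copied from the us-east-1 list will not match an endpoint in another region.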
Describe the bug
I am getting the following error:
UnexpectedStatusException: Error hosting endpoint huggingface-pytorch-inference-2021-10-13-08-17-14-036: Failed. Reason: The image '763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-inference:1.8.1-transformers4.10.2-gpu-py36-cu110-ubuntu18.04' does not exist..
when using the following code:
Looking at the release here: https://github.com/aws/deep-learning-containers/releases/tag/v1.0-hf-4.10.2-pt-1.8.1-py36
I can see that the difference is cu110 vs. cu111: the generated URI uses cu110, while the released image uses cu111.
I looked at the image_uri_config, and it correctly contains cu111:
sagemaker-python-sdk/src/sagemaker/image_uri_config/huggingface.json, line 569 (commit 82f1ba7)
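The GPU tags quoted in this issue follow a fixed layout, so a one-character change in the CUDA component is enough to point at an image that does not exist. The sketch below rebuilds the failing URI from its parts and then the variant with cu111; the layout is inferred from the tags in this issue, not taken from the SDK's code, and `hf_inference_tag` and `ecr_image_uri` are hypothetical helpers.

```python
from __future__ import annotations

# Layout inferred from the GPU tags quoted in this issue:
#   <pytorch>-transformers<transformers>-<processor>-<python>-<cuda>-<os>
def hf_inference_tag(
    pytorch: str,
    transformers: str,
    python: str,
    cuda: str,
    os_version: str,
    processor: str = "gpu",
) -> str:
    return f"{pytorch}-transformers{transformers}-{processor}-{python}-{cuda}-{os_version}"


def ecr_image_uri(account: str, region: str, repository: str, tag: str) -> str:
    return f"{account}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


# Values copied from the error message in this issue.
ACCOUNT = "763104351884"
REGION = "us-east-1"
REPOSITORY = "huggingface-pytorch-inference"

# Tag from the error (cu110) and the cu111 variant the issue says was released.
generated_tag = hf_inference_tag("1.8.1", "4.10.2", "py36", "cu110", "ubuntu18.04")
released_tag = hf_inference_tag("1.8.1", "4.10.2", "py36", "cu111", "ubuntu18.04")

if __name__ == "__main__":
    print(ecr_image_uri(ACCOUNT, REGION, REPOSITORY, generated_tag))
    print(ecr_image_uri(ACCOUNT, REGION, REPOSITORY, released_tag))
```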
The same problem occurs with PyTorch 1.9 and Transformers 4.11, where the Ubuntu version is also wrong. For that combination the SDK generates the following tag:
1.9-transformers4.11-gpu-py38-cu110-ubuntu18.04
but the correct tag must include
cu111-ubuntu20.04
sagemaker-python-sdk/src/sagemaker/image_uri_config/huggingface.json, line 775 (commit 82f1ba7)
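Comparing a generated tag with the expected one field by field makes it obvious which components are wrong. In this sketch the "expected" tag is assembled by swapping in the cu111-ubuntu20.04 suffix the issue says the tag must include; it has not been checked against the registry. The field layout and the helper names `tag_fields` and `diff_tags` are assumptions for illustration.

```python
from __future__ import annotations

# Assumed tag layout: <pytorch>-transformers<transformers>-<processor>-<python>-<cuda>-<os>
FIELDS = ("pytorch", "transformers", "processor", "python", "cuda", "os")


def tag_fields(tag: str) -> dict[str, str]:
    """Split a tag into named fields, stripping the "transformers" prefix."""
    values = tag.split("-")
    if len(values) != len(FIELDS) or not values[1].startswith("transformers"):
        raise ValueError(f"unexpected tag layout: {tag!r}")
    values[1] = values[1][len("transformers"):]
    return dict(zip(FIELDS, values))


def diff_tags(generated: str, expected: str) -> dict[str, tuple[str, str]]:
    """Return {field: (generated value, expected value)} for fields that differ."""
    a, b = tag_fields(generated), tag_fields(expected)
    return {name: (a[name], b[name]) for name in FIELDS if a[name] != b[name]}


generated = "1.9-transformers4.11-gpu-py38-cu110-ubuntu18.04"
expected = "1.9-transformers4.11-gpu-py38-cu111-ubuntu20.04"

if __name__ == "__main__":
    for field, (got, want) in diff_tags(generated, expected).items():
        print(f"{field}: generated {got}, expected {want}")
```

Run as a script, this prints two lines, one for the `cuda` field and one for the `os` field, matching the two components the issue reports as wrong.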
cc @ahsan-z-khan