Issue with torchvision::nms using custom Pytorch and TorchVision #181
I can confirm that after building the image myself, I still get the error.
By process of elimination, pairing the AWS build of one package (torch or torchvision) with the PyPI build of the other, the issue appears when you combine the AWS torch with the PyPI torchvision. So my guess is that it's something related to the AWS torch build.
We currently have a proposed fix for this, and we will have this fixed in the next release of PyTorch containers.
I tried the latest Docker image (hash
The last release was a patch release for a different issue; the fix for this issue has not yet been merged in.
Got it. Thanks for the reply. Indeed, that patch fixed another issue I had, so I'm thankful for that :).
@harshp8l Is there an ETA for when this will get pushed? I'm stuck without the ability to train our models, and this is crippling me and my company.
@mmhealey1 if you need an immediate solution, created a custom
Quick solution to help mitigate this (from within a container):
Let me know if you have any success with this.
We are prioritizing this fix for our upcoming release for PyTorch 1.5.
@harshp8l Has this been fixed? I'm getting this issue as well.
This should be fixed now. Which version of torch are you using? It was addressed in later versions.
@harshp8l I am using version 1.4.0. I tried setting the framework version to 1.5.1, and I get
I assumed there was a backwards compatibility issue between the most recent version (1.5.1) and the model that I'm using, which was initially built using PyTorch 1.1.0.
PyTorch has noted some backwards compatibility issues with 1.5.1; are you able to use 1.5.0?
I will run with 1.5.0 and see if it works.
1.5.0 has the same issue, but for some reason, when I use the 1.4.0 SageMaker PyTorch Docker container and then reinstall PyTorch 1.1.0, it works fine. It's hacky and unclean, but it's the only thing that I've been able to get to work.
@Vedaad-Shakib It has something to do with their custom build of PyTorch/torchvision, so reinstalling just replaces them with the general distribution. I assume they have some custom optimizations in their package.
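For anyone landing here, the reinstall workaround described above can be sketched roughly as follows. This is only an illustration under assumptions, not the maintainers' fix: the helper names `reinstall_commands` and `run` are hypothetical, the version pins are examples (torchvision 0.5.0 is the release paired with torch 1.4.0), and it assumes the preinstalled packages can be removed with pip inside the container.

```python
import subprocess
import sys


def reinstall_commands(torch_version, torchvision_version):
    """Build the pip commands that swap the preinstalled packages for PyPI wheels.

    Hypothetical helper: it only builds the argument lists and runs nothing.
    """
    pip = [sys.executable, "-m", "pip"]
    uninstall = pip + ["uninstall", "-y", "torch", "torchvision"]
    install = pip + [
        "install",
        f"torch=={torch_version}",
        f"torchvision=={torchvision_version}",
    ]
    return [uninstall, install]


def run(commands):
    """Run each command in order and stop at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)


# Print the commands for review; the version pins here are examples only.
for command in reinstall_commands("1.4.0", "0.5.0"):
    print(" ".join(command))

# To actually apply the workaround inside the container, one could call:
# run(reinstall_commands("1.4.0", "0.5.0"))
```

Printing the commands first makes it easy to check the pins before touching the environment. Note that this replaces the custom builds, so any optimizations they carry would be lost.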
What exactly is the issue you are running into here? Are you able to provide steps to reproduce? If it is the same error you mentioned above, can you provide the stack trace and run with:
At first glance, this seems to be an issue with using in-place operations ... I am noticing the torchvision op for nms being loaded on my end:
I can confirm that
This issue is extensively discussed and summarized in this PyTorch issue. For future reference, here's the quote:
I've been trying to run some training jobs using the torch `pytorch-training:1.4.0-cpu-py3` image and have been running into this `RuntimeError: No such operator torchvision::nms` error. From what I can tell, it works if you uninstall the custom `torch` and `torchvision` packages and install the ones from PyPI. Comparing the two, it looks like `torch` is not loading the `torchvision` library.
https://github.com/aws/sagemaker-pytorch-container/blob/e87ca0714862ccdba4b380944db3d828cb8c7871/docker/1.4.0/py3/Dockerfile.cpu#L101
After pip uninstall and install
I've been trying to manually build that image locally and am having some issues related to #141, but that is a separate issue I'm working through.
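A small diagnostic along these lines may help compare the two environments. It is a sketch, not part of the original report: `installed_version` and `nms_operator_available` are hypothetical helper names, and it assumes the failure surfaces as a `RuntimeError` when `torchvision.ops.nms` is called, as in the error message above.

```python
import importlib
from importlib import metadata


def installed_version(package):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def nms_operator_available():
    """Try a tiny NMS call.

    Returns True if the call succeeds, False if it raises RuntimeError
    (for example "No such operator torchvision::nms"), and None if
    torch or torchvision cannot be imported at all.
    """
    try:
        torch = importlib.import_module("torch")
        ops = importlib.import_module("torchvision.ops")
    except ImportError:
        return None
    # Two overlapping boxes in (x1, y1, x2, y2) format with their scores.
    boxes = torch.tensor([[0.0, 0.0, 10.0, 10.0], [1.0, 1.0, 11.0, 11.0]])
    scores = torch.tensor([0.9, 0.8])
    try:
        ops.nms(boxes, scores, 0.5)
    except RuntimeError:
        return False
    return True


print("torch:", installed_version("torch"))
print("torchvision:", installed_version("torchvision"))
print("nms operator available:", nms_operator_available())
```

Running it once in the stock image and once after the pip reinstall should show whether the operator becomes available with the PyPI packages.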