
Specifying image_uri in PyTorchModel gives TypeError when running deploy #2202


Closed

lc-billyfung opened this issue Mar 10, 2021 · 13 comments

@lc-billyfung

Describe the bug
When creating a PyTorchModel with a specified image_uri and deploying it to an endpoint, the model object has the attribute self.framework_version=None. The check in _is_mms_version then fails because it runs a regex search on None instead of a string or bytes-like object.
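
In essence, the failing check reduces to the following (a minimal sketch based on the traceback below; only the packaging library is needed to see the error):

import packaging.version

# self.framework_version is None when only image_uri is given, so the
# version regex inside packaging receives None instead of a string:
packaging.version.Version(None)  # TypeError: expected string or bytes-like object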

To reproduce

from sagemaker.pytorch import PyTorchModel
from sagemaker.utils import name_from_base

model = PyTorchModel(model_data=model_artifact,
                     name=name_from_base('model'),
                     role=role,
                     entry_point="torchserve-predictor.py",
                     image_uri="763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference:1.7.1-cpu-py36-ubuntu18.04",
                     )

predictor = model.deploy(initial_instance_count=1, instance_type='ml.m5.xlarge', endpoint_name=endpoint_name)

Expected behavior
I expect the behavior to be the same as when providing framework_version and py_version when creating a PyTorchModel.
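
For comparison, a sketch of the variant I mean, with framework_version and py_version matching the system information below and the other arguments as above:

model = PyTorchModel(model_data=model_artifact,
                     name=name_from_base('model'),
                     role=role,
                     entry_point="torchserve-predictor.py",
                     framework_version="1.7.1",
                     py_version="py36",
                     )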

Screenshots or logs

~/.pyenv/versions/lib/python3.6/site-packages/sagemaker/model.py in deploy(self, initial_instance_count, instance_type, serializer, deserializer, accelerator_type, endpoint_name, tags, kms_key, wait, data_capture_config, **kwargs)
    740                 self._base_name = "-".join((self._base_name, compiled_model_suffix))
    741 
--> 742         self._create_sagemaker_model(instance_type, accelerator_type, tags)
    743         production_variant = sagemaker.production_variant(
    744             self.name, instance_type, initial_instance_count, accelerator_type=accelerator_type

~/.pyenv/versions/lib/python3.6/site-packages/sagemaker/model.py in _create_sagemaker_model(self, instance_type, accelerator_type, tags)
    306                 /api/latest/reference/services/sagemaker.html#SageMaker.Client.add_tags
    307         """
--> 308         container_def = self.prepare_container_def(instance_type, accelerator_type=accelerator_type)
    309 
    310         self._ensure_base_name_if_needed(container_def["Image"])

~/.pyenv/versions/lib/python3.6/site-packages/sagemaker/pytorch/model.py in prepare_container_def(self, instance_type, accelerator_type)
    237 
    238         deploy_key_prefix = model_code_key_prefix(self.key_prefix, self.name, deploy_image)
--> 239         self._upload_code(deploy_key_prefix, repack=self._is_mms_version())
    240         deploy_env = dict(self.env)
    241         deploy_env.update(self._framework_env_vars())

~/.pyenv/versions/lib/python3.6/site-packages/sagemaker/pytorch/model.py in _is_mms_version(self)
    282         """
    283         lowest_mms_version = packaging.version.Version(self._LOWEST_MMS_VERSION)
--> 284         framework_version = packaging.version.Version(self.framework_version)
    285         return framework_version >= lowest_mms_version

~/.pyenv/versions/lib/python3.6/site-packages/packaging/version.py in __init__(self, version)
    294 
    295         # Validate the version and parse it into pieces
--> 296         match = self._regex.search(version)
    297         if not match:
    298             raise InvalidVersion("Invalid version: '{0}'".format(version))

TypeError: expected string or bytes-like object

System information
A description of your system. Please provide:

  • SageMaker Python SDK version: 2.29.1
  • Framework name (eg. PyTorch) or algorithm (eg. KMeans): Pytorch
  • Framework version: 1.7.1
  • Python version: 3.6.12
  • CPU or GPU: CPU
  • Custom Docker image (Y/N): N

Thanks

@purplexed

I was able to replicate the bug with the following system information:

SageMaker Python SDK version: 2.41.0
Framework name (eg. PyTorch) or algorithm (eg. KMeans): Pytorch
Framework version: 1.7.1
Python version: 3.6.12
CPU or GPU: CPU
Custom Docker image (Y/N): N

Also reported it to AWS support on the 20th of May.

@johann-petrak

Affects me as well; the workaround seems to be to just provide a dummy version, but an annoying bug all the same.

@zorrofox

Same for me!

@oborchers

Same for me. For the HuggingFace predictor it actually works, but it doesn't use the image I built and falls back to the default one instead...

@oborchers

Update: after figuring out how to work with the repository for SageMaker images (https://github.com/aws/deep-learning-containers), I was able to fix my problems, which were solely about the HuggingFaceModel not being able to load or run custom images.

@AliNGatGeeks

dummy version, but an annoying bug all the same

Hi, do you have an example of your workaround?

@johann-petrak

Hi, do you have an example of your workaround?

As far as I remember, I just added the parameter framework_version="1.8.1".
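
Applied to the call from the issue description, that would look roughly like this (image_uri and the other arguments unchanged; "1.8.1" is just a dummy value to get past the version check):

model = PyTorchModel(model_data=model_artifact,
                     name=name_from_base('model'),
                     role=role,
                     entry_point="torchserve-predictor.py",
                     image_uri="763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-inference:1.7.1-cpu-py36-ubuntu18.04",
                     framework_version="1.8.1",  # dummy version, only to satisfy _is_mms_version()
                     )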

I can't believe that this issue is still open. The way AWS issues get ignored by Amazon developers is rather disappointing.

@AliNGatGeeks

As far as I remember, I just added the parameter framework_version="1.8.1".

Thanks, this seems to work for me as well.
I hope they fix it soon 😄

@oborchers

I hope they fix it soon 😄

Me looking at my inbox and laughing frenetically: No.

@Michael-Bar

Does your framework_version="1.8.1" solution definitely call the image from image_uri rather than fetching a different image via the framework_version arg?

@eunseoada

eunseoada commented Nov 3, 2023

@Michael-Bar
I have the same question. Did you solve this problem?

Does your framework_version="1.8.1" solution definitely call the image from image_uri rather than fetching a different image via the framework_version arg?

@jjerphan
Collaborator

Hi all,

#3188 has partially addressed the problem.

Still, some ambiguity remains about how a Model is specified when py_version, framework_version, and image_uri are all passed.

@martinRenou
Collaborator

Closing as fixed by #3188
