Skip to content

Cannot set custom image for PyTorch model from estimator #1344

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
mattmcclean opened this issue Mar 10, 2020 · 2 comments
Closed

Cannot set custom image for PyTorch model from estimator #1344

mattmcclean opened this issue Mar 10, 2020 · 2 comments

Comments

@mattmcclean
Copy link
Contributor

mattmcclean commented Mar 10, 2020

Describe the bug
There seems to be a bug in the code when attempting to set a custom image for the PyTorchModel object. The problem is in the PyTorch estimator object that sets the image for the PyTorchModel for inference to be the same as the training image name. This is problematic as there is a different image for inference vs training now. The logic for the create_model() method should check if the parameter image is passed in and use this vs

To reproduce
A clear, step-by-step set of instructions to reproduce the bug.

from sagemaker.pytorch import PyTorch 

hyperparameters = {'epochs': 8}

estimator = PyTorch(source_dir='container/oxford-pets',
                    entry_point='oxford-pets.py',
                    role=role,
                    train_instance_count=1,
                    train_instance_type='local_gpu',
                    framework_version='1.3.1',
                    hyperparameters=hyperparameters,
                    image_name='fastai2-oxford-pets-sm-example-training')

estimator.fit('file://' + str(path))

predictor = model.deploy(1, 'local', image='fastai2-oxford-pets-sm-example-inference')

The following code with produce an error as it will attempt to launch a local Docker container with the image name fastai2-oxford-pets-sm-example-training instead of fastai2-oxford-pets-sm-example-inference

Inspecting the PyTorch model object the value for the image is set incorrectly to fastai2-oxford-pets-sm-example-training instead of fastai2-oxford-pets-sm-example-inference.

The problematic line of code is found here. It should check if the param image is passed in before setting the image param on the model instead of assigning from the var image_name.

The only way around this is to create the model from the estimator and override the param image. An example is shown below:

model = estimator.create_model(role=role, 
                               entry_point='oxford-pets.py', 
                               source_dir='container/oxford-pets', 
                               image='fastai2-oxford-pets-sm-example-inference')

model.image = 'fastai2-oxford-pets-sm-example-inference'

predictor = model.deploy(1, 'local')

Expected behavior
A clear and concise description of what you expected to happen.

The SDK should launch a container from the image named fastai2-oxford-pets-sm-example-inference.

Screenshots or logs
If applicable, add screenshots or logs to help explain your problem.

System information
A description of your system. Please provide:

  • SageMaker Python SDK version: 1.50.9.post0
  • Framework name (eg. PyTorch) or algorithm (eg. KMeans): PyTorch
  • Framework version: 1.3.1
  • Python version: 3.6
  • CPU or GPU: Both
  • Custom Docker image (Y/N): Y

Additional context
Add any other context about the problem here.

@laurenyu
Copy link
Contributor

thanks for submitting your PR!

just want to note here that as a workaround in the meantime, you can do:

# after fit() is complete
estimator.image_name = 'serving-image:latest'
estimator.deploy(...)

@laurenyu
Copy link
Contributor

#1347 was released with v1.51.4. thanks again for the PR!

benieric added a commit that referenced this issue Nov 29, 2023
Co-authored-by: Raymond Liu <[email protected]>
Co-authored-by: Ruilian Gao <[email protected]>
Co-authored-by: John Barboza <[email protected]>
Co-authored-by: Gary Wang <[email protected]>
Co-authored-by: Malav Shastri <[email protected]>
Co-authored-by: Keshav Chandak <[email protected]>
Co-authored-by: Zuoyuan Huang <[email protected]>
Co-authored-by: Ao Guo <[email protected]>
Co-authored-by: Mufaddal Rohawala <[email protected]>
Co-authored-by: Mike Schneider <[email protected]>
Co-authored-by: Bhupendra Singh <[email protected]>
Co-authored-by: ci <ci>
Co-authored-by: Malav Shastri <[email protected]>
Co-authored-by: evakravi <[email protected]>
Co-authored-by: Keshav Chandak <[email protected]>
Co-authored-by: Alexander Pivovarov <[email protected]>
Co-authored-by: qidewenwhen <[email protected]>
Co-authored-by: mariumof <[email protected]>
Co-authored-by: matherit <[email protected]>
Co-authored-by: amzn-choeric <[email protected]>
Co-authored-by: Ao Guo <[email protected]>
Co-authored-by: Sally Seok <[email protected]>
Co-authored-by: Erick Benitez-Ramos <[email protected]>
Co-authored-by: Qingzi-Lan <[email protected]>
Co-authored-by: Sally Seok <[email protected]>
Co-authored-by: Manu Seth <[email protected]>
Co-authored-by: Miyoung <[email protected]>
Co-authored-by: Sarah Castillo <[email protected]>
Co-authored-by: EC2 Default User <[email protected]>
Co-authored-by: EC2 Default User <[email protected]>
Co-authored-by: EC2 Default User <[email protected]>
Co-authored-by: Xin Wang <[email protected]>
Co-authored-by: stacicho <[email protected]>
Co-authored-by: martinRenou <[email protected]>
Co-authored-by: jiapinw <[email protected]>
Co-authored-by: Akash Goel <[email protected]>
Co-authored-by: Joseph Zhang <[email protected]>
Co-authored-by: Harsha Reddy <[email protected]>
Co-authored-by: Haixin Wang <[email protected]>
Co-authored-by: Kalyani Nikure <[email protected]>
Co-authored-by: Xin Wang <[email protected]>
Co-authored-by: Gili Nachum <[email protected]>
Co-authored-by: Jose Pena <[email protected]>
Co-authored-by: cansun <[email protected]>
Co-authored-by: AWS-pratab <[email protected]>
Co-authored-by: shenlongtang <[email protected]>
Co-authored-by: Zach Kimberg <[email protected]>
Co-authored-by: chrivtho-github <[email protected]>
Co-authored-by: Justin <[email protected]>
Co-authored-by: Duc Trung Le <[email protected]>
Co-authored-by: HappyAmazonian <[email protected]>
Co-authored-by: cj-zhang <[email protected]>
Co-authored-by: Matthew <[email protected]>
Co-authored-by: Zach Kimberg <[email protected]>
Co-authored-by: Rohith Nadimpally <[email protected]>
Co-authored-by: rohithn1 <[email protected]>
Co-authored-by: Victor Zhu <[email protected]>
Co-authored-by: Gary Wang <[email protected]>
Co-authored-by: SSRraymond <[email protected]>
Co-authored-by: jbarz1 <[email protected]>
Co-authored-by: Mohan Gandhi <[email protected]>
Co-authored-by: Mohan Gandhi <[email protected]>
Co-authored-by: Barboza <[email protected]>
Co-authored-by: ruiliann666 <[email protected]>
Co-authored-by: Rohan Gujarathi <[email protected]>
Co-authored-by: svia3 <[email protected]>
Co-authored-by: Zhankui Lu <[email protected]>
Co-authored-by: Dewen Qi <[email protected]>
Co-authored-by: Edward Sun <[email protected]>
Co-authored-by: Stephen Via <[email protected]>
Co-authored-by: Namrata Madan <[email protected]>
Co-authored-by: Stacia Choe <[email protected]>
Co-authored-by: Edward Sun <[email protected]>
Co-authored-by: Edward Sun <[email protected]>
Co-authored-by: Rohan Gujarathi <[email protected]>
Co-authored-by: JohnaAtAWS <[email protected]>
Co-authored-by: Vera Yu <[email protected]>
Co-authored-by: bhaoz <[email protected]>
Co-authored-by: Qing Lan <[email protected]>
Co-authored-by: Namrata Madan <[email protected]>
Co-authored-by: Sirut Buasai <[email protected]>
Co-authored-by: wayneyao <[email protected]>
Co-authored-by: Jacky Lee <[email protected]>
Co-authored-by: haNa-meister <[email protected]>
Co-authored-by: Shailav <[email protected]>
Fix unit tests (#1018)
Fix happy hf test (#1026)
fix logic setup (#1034)
fixes (#1045)
Fix flake error in init (#1050)
fix (#1053)
fix: skip tensorflow local mode notebook test (#4060)
fix: tags for jumpstart model package models (#4061)
fix: pipeline variable kms key (#4065)
fix: jumpstart cache using sagemaker session s3 client (#4051)
fix: gated models unsupported region (#4069)
fix: pipeline upsert failed to pass parallelism_config to update (#4066)
fix: temporarily skip kmeans notebook (#4092)
fixes (#1051)
Fix missing absolute import error (#1057)
Fix flake8 error in unit test (#1058)
fixes (#1056)
Fix flake8 error in integ test (#1060)
Fix black format error in test_pickle_dependencies (#1062)
Fix docstyle error under serve (#1065)
Fix docstyle error in builder failure (#1066)
fix black and flake8 formatting (#1069)
Fix format error (#1070)
Fix integ test (#1074)
fix: HuggingFaceProcessor parameterized instance_type when image_uri is absent (#4072)
fix: log message when sdk defaults not applied (#4104)
fix: handle bad jumpstart default session (#4109)
Fix the version information, whl and flake8 (#1085)
Fix JSON serializer error (#1088)
Fix unit test (#1091)
fix format (#1103)
Fix local mode predictor (#1107)
Fix DJLPredictor (#1108)
Fix modelbuilder unit tests (#1118)
fixes (#1136)
fixes (#1165)
fixes (#1166)
fix: auto ml integ tests and add flaky test markers (#4136)
fix model data for JumpStartModel (#4135)
fix: transform step  unit test (#4151)
fix: Update pipeline.py and selective_execution_config.py with small fixes (#1099)
fix: Fixed bug in _create_training_details (#4141)
fix: use correct line endings and s3 uris on windows (#4118)
fix: js tagging s3 prefix (#4167)
fix: Update Ec2 instance type to g5.4xlarge in test_huggingface_torch_distributed.py (#4181)
fix: import error in unsupported js regions (#4188)
fix: update local mode schema (#4185)
fix: fix flaky Inference Recommender integration tests (#4156)
fix: clone distribution in validate_distribution (#4205)
Fix hyperlinks in feature_processor.scheduler parameter descriptions (#4208)
Fix master merge formatting (#1186)
Fix master unit tests (#1203)
Fix djl unit tests (#1204)
Fix merge conflicts (#1217)
fix: fix URL links (#4217)
fix: bump urllib3 version (#4223)
fix: relax upper bound on urllib in local mode requirements (#4219)
fixes (#1224)
fix formatting (#1233)
fix byoc unit tests (#1235)
fix byoc unit tests (#1236)
Fixed Modelpackage's deploy calling  model's deploy (#1155)
fix: jumpstart unit-test (#1265)
fixes (#963)
Fix TorchTensorSer/Deser (#969)
fix (#971)
fix local container mode (#972)
Fix auto detect (#979)
Fix routing fn (#981)
fix local container serialization (#989)
fix custom serialiazation with local container. Also remove a  lot of unused code (#994)
Fix custom serialization for local container mode (#1000)
fix pytorch version (#1001)
Fix unit test (#990)
fix: Multiple bug fixes including removing unsupported feature. (#1105)
Fix some problems with pipeline compilation (#1125)
fix: Refactor JsonGet s3 URI and add serialize_output_to_json flag (#1164)
fix: invoke_function circular import (#1262)
fix: pylint (#1264)
fix: Add logging for docker build failures (#1267)
Fix session bug when provided in ModelBuilder (#1288)
fixes (#1313)
fix: Gated content bucket env var override (#1280)
fix: Change the library used in pytorch test causing cloudpickle version conflict (#1287)
fix: HMAC signing for ModelBuilder Triton python backend (#1282)
fix: do not delete temp folder generated by sdist (#1291)
fix: Do not require model_server if provided image_uri is a 1p image. (#1303)
fix: check image type vs instance type (#1307)
fix: unit test (#1315)
fix: Fixed model builder's register unable to deploy (#1323)
fix: missing `self._framework` in `InferenceSpec` path (#1325)
fix: enable xgboost integ test in our own pipeline (#1326)
fix: skip py310 (#1328)
fix: Update autodetect dlc logic (#1329)
Fix secret key in the Model object (#1334)
fix: improve error message (#1333)
Fix unit testing (#1340)
fix: Typing and formatting (#1341)
fix: WaiterError on failed pipeline execution. results() (#1337)
Fix tox identified errors (#1344)
Fix issue when the user runs in Python 3.11 (#1345)
fixes (#1346)
fix: use copy instead of move in bootstrap script (#1339)
Resolve keynote3 conflicts (#1351)
Resolve keynote3 conflicts v2 (#1353)
Fix conflicts (#1354)
Fix conflicts v3 (#1355)
fix: get whl from local to run integ tests (#1357)
fix: enable triton pt tests (#1358)
fix: integ test (#1362)
Fix Python 3.11 issue with dataclass decorator (#1345)
fix: remote function include_local_workdir default value (#1342)
fix: error message (#1373)
fixes (#1372)
fix: Remvoe PickleSerializer (#1378)
benieric added a commit that referenced this issue Nov 29, 2023
Co-authored-by: Raymond Liu <[email protected]>
Co-authored-by: Ruilian Gao <[email protected]>
Co-authored-by: John Barboza <[email protected]>
Co-authored-by: Gary Wang <[email protected]>
Co-authored-by: Malav Shastri <[email protected]>
Co-authored-by: Keshav Chandak <[email protected]>
Co-authored-by: Zuoyuan Huang <[email protected]>
Co-authored-by: Ao Guo <[email protected]>
Co-authored-by: Mufaddal Rohawala <[email protected]>
Co-authored-by: Mike Schneider <[email protected]>
Co-authored-by: Bhupendra Singh <[email protected]>
Co-authored-by: ci <ci>
Co-authored-by: Malav Shastri <[email protected]>
Co-authored-by: evakravi <[email protected]>
Co-authored-by: Keshav Chandak <[email protected]>
Co-authored-by: Alexander Pivovarov <[email protected]>
Co-authored-by: qidewenwhen <[email protected]>
Co-authored-by: mariumof <[email protected]>
Co-authored-by: matherit <[email protected]>
Co-authored-by: amzn-choeric <[email protected]>
Co-authored-by: Ao Guo <[email protected]>
Co-authored-by: Sally Seok <[email protected]>
Co-authored-by: Erick Benitez-Ramos <[email protected]>
Co-authored-by: Qingzi-Lan <[email protected]>
Co-authored-by: Sally Seok <[email protected]>
Co-authored-by: Manu Seth <[email protected]>
Co-authored-by: Miyoung <[email protected]>
Co-authored-by: Sarah Castillo <[email protected]>
Co-authored-by: EC2 Default User <[email protected]>
Co-authored-by: EC2 Default User <[email protected]>
Co-authored-by: EC2 Default User <[email protected]>
Co-authored-by: Xin Wang <[email protected]>
Co-authored-by: stacicho <[email protected]>
Co-authored-by: martinRenou <[email protected]>
Co-authored-by: jiapinw <[email protected]>
Co-authored-by: Akash Goel <[email protected]>
Co-authored-by: Joseph Zhang <[email protected]>
Co-authored-by: Harsha Reddy <[email protected]>
Co-authored-by: Haixin Wang <[email protected]>
Co-authored-by: Kalyani Nikure <[email protected]>
Co-authored-by: Xin Wang <[email protected]>
Co-authored-by: Gili Nachum <[email protected]>
Co-authored-by: Jose Pena <[email protected]>
Co-authored-by: cansun <[email protected]>
Co-authored-by: AWS-pratab <[email protected]>
Co-authored-by: shenlongtang <[email protected]>
Co-authored-by: Zach Kimberg <[email protected]>
Co-authored-by: chrivtho-github <[email protected]>
Co-authored-by: Justin <[email protected]>
Co-authored-by: Duc Trung Le <[email protected]>
Co-authored-by: HappyAmazonian <[email protected]>
Co-authored-by: cj-zhang <[email protected]>
Co-authored-by: Matthew <[email protected]>
Co-authored-by: Zach Kimberg <[email protected]>
Co-authored-by: Rohith Nadimpally <[email protected]>
Co-authored-by: rohithn1 <[email protected]>
Co-authored-by: Victor Zhu <[email protected]>
Co-authored-by: Gary Wang <[email protected]>
Co-authored-by: SSRraymond <[email protected]>
Co-authored-by: jbarz1 <[email protected]>
Co-authored-by: Mohan Gandhi <[email protected]>
Co-authored-by: Mohan Gandhi <[email protected]>
Co-authored-by: Barboza <[email protected]>
Co-authored-by: ruiliann666 <[email protected]>
Co-authored-by: Rohan Gujarathi <[email protected]>
Co-authored-by: svia3 <[email protected]>
Co-authored-by: Zhankui Lu <[email protected]>
Co-authored-by: Dewen Qi <[email protected]>
Co-authored-by: Edward Sun <[email protected]>
Co-authored-by: Stephen Via <[email protected]>
Co-authored-by: Namrata Madan <[email protected]>
Co-authored-by: Stacia Choe <[email protected]>
Co-authored-by: Edward Sun <[email protected]>
Co-authored-by: Edward Sun <[email protected]>
Co-authored-by: Rohan Gujarathi <[email protected]>
Co-authored-by: JohnaAtAWS <[email protected]>
Co-authored-by: Vera Yu <[email protected]>
Co-authored-by: bhaoz <[email protected]>
Co-authored-by: Qing Lan <[email protected]>
Co-authored-by: Namrata Madan <[email protected]>
Co-authored-by: Sirut Buasai <[email protected]>
Co-authored-by: wayneyao <[email protected]>
Co-authored-by: Jacky Lee <[email protected]>
Co-authored-by: haNa-meister <[email protected]>
Co-authored-by: Shailav <[email protected]>
Fix unit tests (#1018)
Fix happy hf test (#1026)
fix logic setup (#1034)
fixes (#1045)
Fix flake error in init (#1050)
fix (#1053)
fix: skip tensorflow local mode notebook test (#4060)
fix: tags for jumpstart model package models (#4061)
fix: pipeline variable kms key (#4065)
fix: jumpstart cache using sagemaker session s3 client (#4051)
fix: gated models unsupported region (#4069)
fix: pipeline upsert failed to pass parallelism_config to update (#4066)
fix: temporarily skip kmeans notebook (#4092)
fixes (#1051)
Fix missing absolute import error (#1057)
Fix flake8 error in unit test (#1058)
fixes (#1056)
Fix flake8 error in integ test (#1060)
Fix black format error in test_pickle_dependencies (#1062)
Fix docstyle error under serve (#1065)
Fix docstyle error in builder failure (#1066)
fix black and flake8 formatting (#1069)
Fix format error (#1070)
Fix integ test (#1074)
fix: HuggingFaceProcessor parameterized instance_type when image_uri is absent (#4072)
fix: log message when sdk defaults not applied (#4104)
fix: handle bad jumpstart default session (#4109)
Fix the version information, whl and flake8 (#1085)
Fix JSON serializer error (#1088)
Fix unit test (#1091)
fix format (#1103)
Fix local mode predictor (#1107)
Fix DJLPredictor (#1108)
Fix modelbuilder unit tests (#1118)
fixes (#1136)
fixes (#1165)
fixes (#1166)
fix: auto ml integ tests and add flaky test markers (#4136)
fix model data for JumpStartModel (#4135)
fix: transform step  unit test (#4151)
fix: Update pipeline.py and selective_execution_config.py with small fixes (#1099)
fix: Fixed bug in _create_training_details (#4141)
fix: use correct line endings and s3 uris on windows (#4118)
fix: js tagging s3 prefix (#4167)
fix: Update Ec2 instance type to g5.4xlarge in test_huggingface_torch_distributed.py (#4181)
fix: import error in unsupported js regions (#4188)
fix: update local mode schema (#4185)
fix: fix flaky Inference Recommender integration tests (#4156)
fix: clone distribution in validate_distribution (#4205)
Fix hyperlinks in feature_processor.scheduler parameter descriptions (#4208)
Fix master merge formatting (#1186)
Fix master unit tests (#1203)
Fix djl unit tests (#1204)
Fix merge conflicts (#1217)
fix: fix URL links (#4217)
fix: bump urllib3 version (#4223)
fix: relax upper bound on urllib in local mode requirements (#4219)
fixes (#1224)
fix formatting (#1233)
fix byoc unit tests (#1235)
fix byoc unit tests (#1236)
Fixed Modelpackage's deploy calling  model's deploy (#1155)
fix: jumpstart unit-test (#1265)
fixes (#963)
Fix TorchTensorSer/Deser (#969)
fix (#971)
fix local container mode (#972)
Fix auto detect (#979)
Fix routing fn (#981)
fix local container serialization (#989)
fix custom serialiazation with local container. Also remove a  lot of unused code (#994)
Fix custom serialization for local container mode (#1000)
fix pytorch version (#1001)
Fix unit test (#990)
fix: Multiple bug fixes including removing unsupported feature. (#1105)
Fix some problems with pipeline compilation (#1125)
fix: Refactor JsonGet s3 URI and add serialize_output_to_json flag (#1164)
fix: invoke_function circular import (#1262)
fix: pylint (#1264)
fix: Add logging for docker build failures (#1267)
Fix session bug when provided in ModelBuilder (#1288)
fixes (#1313)
fix: Gated content bucket env var override (#1280)
fix: Change the library used in pytorch test causing cloudpickle version conflict (#1287)
fix: HMAC signing for ModelBuilder Triton python backend (#1282)
fix: do not delete temp folder generated by sdist (#1291)
fix: Do not require model_server if provided image_uri is a 1p image. (#1303)
fix: check image type vs instance type (#1307)
fix: unit test (#1315)
fix: Fixed model builder's register unable to deploy (#1323)
fix: missing `self._framework` in `InferenceSpec` path (#1325)
fix: enable xgboost integ test in our own pipeline (#1326)
fix: skip py310 (#1328)
fix: Update autodetect dlc logic (#1329)
Fix secret key in the Model object (#1334)
fix: improve error message (#1333)
Fix unit testing (#1340)
fix: Typing and formatting (#1341)
fix: WaiterError on failed pipeline execution. results() (#1337)
Fix tox identified errors (#1344)
Fix issue when the user runs in Python 3.11 (#1345)
fixes (#1346)
fix: use copy instead of move in bootstrap script (#1339)
Resolve keynote3 conflicts (#1351)
Resolve keynote3 conflicts v2 (#1353)
Fix conflicts (#1354)
Fix conflicts v3 (#1355)
fix: get whl from local to run integ tests (#1357)
fix: enable triton pt tests (#1358)
fix: integ test (#1362)
Fix Python 3.11 issue with dataclass decorator (#1345)
fix: remote function include_local_workdir default value (#1342)
fix: error message (#1373)
fixes (#1372)
fix: Remvoe PickleSerializer (#1378)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants