PyTorch 1.6.0 Inference packaging skips dependencies, other model artefacts #1909


Closed
setu4993 opened this issue Sep 19, 2020 · 13 comments

@setu4993

Describe the bug
Model artefacts packaged in model.tar.gz are skipped when the model object is converted into a TorchServe model. Similarly, dependencies included in the package are also dropped.

To reproduce
Add any extra file to model.tar.gz that is not model.pth, and it won't show up in the container. Similarly, any of the extra dependencies specified during the initialization of a PyTorchModel object are dropped.
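A minimal sketch of the setup for this reproduction, using Python's stdlib tarfile to build a model.tar.gz that holds model.pth plus an extra artefact (file names here are placeholders for illustration):

```python
import os
import tarfile
import tempfile

# Build a model.tar.gz containing model.pth plus an extra companion artefact.
workdir = tempfile.mkdtemp()
members = ("model.pth", "another_model.pkl")
for name in members:
    with open(os.path.join(workdir, name), "wb") as f:
        f.write(b"dummy contents")

archive = os.path.join(workdir, "model.tar.gz")
with tarfile.open(archive, "w:gz") as tar:
    for name in members:
        tar.add(os.path.join(workdir, name), arcname=name)

# Before upload, both files are present in the tarball.
with tarfile.open(archive, "r:gz") as tar:
    print(sorted(tar.getnames()))
```

Uploading this archive and deploying via PyTorchModel, only model.pth survives into the container.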

Expected behavior
The model package should include all of the extra files and dependencies, as described in the API documentation.

Screenshots or logs
Ran os.walk over the model_dir and realized the artefacts I was expecting were missing. The log below shows the files in model_dir; an extra .pkl object I had included was not present.

2020-09-18 23:50:40,059 [WARN ] W-9000-model_1-stderr org.pytorch.serve.wlm.WorkerLifeCycle - 2020-09-18 23:50:40.058 | INFO | inference:model_fn:59 - ['inference.py', 'handler_service.py', 'model.pth']
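The os.walk check mentioned above can be sketched as a small diagnostic helper; `log_model_dir` is a hypothetical name, intended to be dropped into model_fn to see exactly which artefacts survived repackaging:

```python
import logging
import os

logger = logging.getLogger(__name__)


def log_model_dir(model_dir):
    """Log everything under model_dir and return the relative file paths.

    Diagnostic sketch: call this from model_fn with the model_dir SageMaker
    passes in, and compare the output against the original model.tar.gz.
    """
    paths = []
    for root, dirs, files in os.walk(model_dir):
        logger.info("%s -> dirs=%s files=%s", root, sorted(dirs), sorted(files))
        for name in files:
            paths.append(os.path.relpath(os.path.join(root, name), model_dir))
    return sorted(paths)
```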

Similarly, the directory log is missing the extra dependencies specified when creating the PyTorchModel object, both under model_dir/lib (#1832, another bug) and under model_dir (as specified in the API documentation).

2020-09-18 23:50:40,058 [WARN ] W-9000-model_1-stderr org.pytorch.serve.wlm.WorkerLifeCycle - 2020-09-18 23:50:40.058 | INFO | inference:model_fn:58 - ['pycache', 'MAR-INF']

System information
A description of your system. Please provide:

  • SageMaker Python SDK version: 2.9.1
  • Framework name (eg. PyTorch) or algorithm (eg. KMeans): PyTorch
  • Framework version: 1.6.0
  • Python version: 3.6
  • CPU or GPU: GPU
  • Custom Docker image (Y/N): N

Additional context

The problem arises from this line in the TorchServe packaging process, which is a result of aws/sagemaker-pytorch-inference-toolkit#79. During repackaging, it drops every object except the inference script.

Clearly, that is a regression and unexpected.

@chuyang-deng
Contributor

Hi @setu4993,

Could you provide the Python SDK code and source_dir structure so we can root-cause the issue?

@setu4993
Author

setu4993 commented Sep 21, 2020

A short example to reproduce the issue (X and Y are other directories on the file system):

sagemaker_model = PyTorchModel(
    model_data="model.tar.gz",
    source_dir="/root/",
    entry_point="inference.py",
    dependencies=[
        "X",
        "Y",
    ],
    framework_version="1.6.0",
    py_version="py3",
)

After the re-packaging step occurs and a model.tar.gz is created on S3 at the code_location:

├── code
│   ├── README.md
│   ├── inference.py
│   ├── lib
│   │   ├── X
│   │   │   ├── __init__.py
│   │   │   ├── ...
│   │   └── Y
│   │       ├── README.md
│   │       ├── __init__.py
│   │       ├── ...
│   └── requirements.txt
├── model.pth
└── another_model.pkl

But, at runtime, the only files in the model_dir of the container are inference.py, model.pth, and handler_service.py (see logs in the opening comment).

@setu4993
Author

The problem here is multi-fold:

  1. Why does an extra repackaging step occur that creates a difference between what is made available on S3 at code_location and what ends up in the final inference service? This makes it harder to understand (and diagnose) what's happening in the container, since the model tarball produced by SageMaker drifts away from what's expected (and it was repackaged by the SDK!).
  2. Why is everything being dropped from the container?

@setu4993
Author

@ChuyangDeng : Any update on this?

@jonsnowseven

@setu4993, I have experienced similar issues. From the source code, I believe the following line might be the one causing what you are experiencing (and me as well):

https://github.com/aws/sagemaker-pytorch-inference-toolkit/blob/9a6869ea3af1ebf9292da0a8d752b0c3389ecdec/src/sagemaker_pytorch_serving_container/torchserve.py#L125

This makes inference.py the only (custom) .py file in the model directory. I believe this container should support adding extra/custom artifacts (e.g., model companion objects, model architecture definitions).
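The failure mode described above can be illustrated with a minimal sketch (this is not the toolkit's actual code, just an illustration of the effect): repackaging that copies only the entry-point script drops every sibling artefact from the original code/ directory.

```python
import os
import shutil
import tempfile

# Hypothetical source directory with an entry point plus sibling files.
src = tempfile.mkdtemp()
for name in ("inference.py", "requirements.txt", "helper.py"):
    open(os.path.join(src, name), "w").close()

# Repackaging that copies only the entry point: everything else is lost.
export_dir = tempfile.mkdtemp()
entry_point = "inference.py"
shutil.copy2(os.path.join(src, entry_point), export_dir)

print(sorted(os.listdir(export_dir)))  # only ['inference.py'] survives
```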

Furthermore, the SageMaker Python SDK example with PyTorch (with the code here) does not work with this serving container, because the model has a companion object.

@setu4993
Author

Thanks @jonsnowseven. +1, I think I found the same line earlier :).

In my case, it all works with 1.5.0, so I'm sticking to it and upgrading to 1.6 via requirements.txt.

@ldong87

ldong87 commented Feb 5, 2021

Almost 4 months after the last comment, this bug still persists.

@jonsnowseven

Yes @ldong87. Any news regarding this?

@declark1

Same issue here. Have been trying to work around this for 2 days now with no luck.

@setu4993
Author

Wanted to share that @amaharek found a potential solution and posted it on the supplementary issue on the sagemaker-pytorch-inference-toolkit repo here: aws/sagemaker-pytorch-inference-toolkit#85 (comment)

@agurtovoy

@dectl One (ugly) workaround is to copy/load the missing model files from /opt/ml/model/, which is apparently the documented decompressed model location: https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-inference-code.html#your-algorithms-inference-code-load-artifacts
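A sketch of that workaround, assuming the companion artefact was packaged at the root of model.tar.gz (the function name and file name here are placeholders, not part of the SageMaker API):

```python
import os
import pickle

# SageMaker decompresses the original model.tar.gz here, so companion
# artefacts dropped from model_dir by repackaging can still be read from it.
SAGEMAKER_MODEL_ROOT = "/opt/ml/model"


def load_companion_object(filename="another_model.pkl", root=SAGEMAKER_MODEL_ROOT):
    """Load a pickled companion artefact directly from the decompressed
    model location, bypassing the (incomplete) repackaged model_dir."""
    path = os.path.join(root, filename)
    with open(path, "rb") as f:
        return pickle.load(f)
```

Inside model_fn you would call this instead of (or in addition to) reading from the model_dir argument; the `root` parameter exists so the helper can be tested outside a container.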

@lxning

lxning commented Jun 23, 2021

This issue was fixed in TorchServe 0.3.1 and toolkit v2.0.5. These fixes are available in the latest SageMaker DLC.

@setu4993
Author

Thank you, @lxning! Can confirm that it works and the packages now contain all artifacts as expected.
