PyTorch 1.6.0 Inference packaging skips dependencies, other model artefacts #1909


Closed
setu4993 opened this issue Sep 19, 2020 · 13 comments

@setu4993

Describe the bug
Model artefacts packaged in model.tar.gz are skipped when the model object is converted into a TorchServe model. Similarly, dependencies included in the package are also dropped.

To reproduce
Add any extra file to model.tar.gz that is not model.pth, and it won't show up in the container. Similarly, any of the extra dependencies specified during the initialization of a PyTorchModel object are dropped.
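A minimal sketch of the setup for this reproduction, using Python's stdlib tarfile to build a model.tar.gz that holds model.pth plus an extra artefact (file names here are placeholders for illustration):

```python
import os
import tarfile
import tempfile

# Build a model.tar.gz containing model.pth plus an extra companion artefact.
workdir = tempfile.mkdtemp()
members = ("model.pth", "another_model.pkl")
for name in members:
    with open(os.path.join(workdir, name), "wb") as f:
        f.write(b"dummy contents")

archive = os.path.join(workdir, "model.tar.gz")
with tarfile.open(archive, "w:gz") as tar:
    for name in members:
        tar.add(os.path.join(workdir, name), arcname=name)

# Before upload, both files are present in the tarball.
with tarfile.open(archive, "r:gz") as tar:
    print(sorted(tar.getnames()))
```

Uploading this archive and deploying via PyTorchModel, only model.pth survives into the container.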

Expected behavior
The model package should include all of the extra files and dependencies, as described in the API documentation.

Screenshots or logs
Ran os.walk over the model_dir and realized the artefacts I was expecting were missing. The log below shows the files in model_dir; an extra .pkl object I had included was not present.

2020-09-18 23:50:40,059 [WARN ] W-9000-model_1-stderr org.pytorch.serve.wlm.WorkerLifeCycle - 2020-09-18 23:50:40.058 | INFO | inference:model_fn:59 - ['inference.py', 'handler_service.py', 'model.pth']
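The os.walk check mentioned above can be sketched as a small diagnostic helper; `log_model_dir` is a hypothetical name, intended to be dropped into model_fn to see exactly which artefacts survived repackaging:

```python
import logging
import os

logger = logging.getLogger(__name__)


def log_model_dir(model_dir):
    """Log everything under model_dir and return the relative file paths.

    Diagnostic sketch: call this from model_fn with the model_dir SageMaker
    passes in, and compare the output against the original model.tar.gz.
    """
    paths = []
    for root, dirs, files in os.walk(model_dir):
        logger.info("%s -> dirs=%s files=%s", root, sorted(dirs), sorted(files))
        for name in files:
            paths.append(os.path.relpath(os.path.join(root, name), model_dir))
    return sorted(paths)
```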

Similarly, the directory log is missing the extra dependencies specified when creating the PyTorchModel object, both under model_dir/lib (#1832, another bug) and under model_dir (as specified in the API documentation).

2020-09-18 23:50:40,058 [WARN ] W-9000-model_1-stderr org.pytorch.serve.wlm.WorkerLifeCycle - 2020-09-18 23:50:40.058 | INFO | inference:model_fn:58 - ['pycache', 'MAR-INF']

System information
A description of your system. Please provide:

  • SageMaker Python SDK version: 2.9.1
  • Framework name (eg. PyTorch) or algorithm (eg. KMeans): PyTorch
  • Framework version: 1.6.0
  • Python version: 3.6
  • CPU or GPU: GPU
  • Custom Docker image (Y/N): N

Additional context

The problem arises from this line in the TorchServe packaging process, which is a result of aws/sagemaker-pytorch-inference-toolkit#79. During repackaging, it drops every object except the inference script.

Clearly, that is a regression and unexpected.

@chuyang-deng
Contributor

Hi @setu4993,

Could you provide the Python SDK code and source_dir structure so we can root-cause the issue?

@setu4993
Author

setu4993 commented Sep 21, 2020

A short example to reproduce the issue (X and Y are other directories on the file system):

sagemaker_model = PyTorchModel(
    model_data="model.tar.gz",
    source_dir="/root/",
    entry_point="inference.py",
    dependencies=[
        "X",
        "Y",
    ],
    framework_version="1.6.0",
    py_version="py3",
)

After the re-packaging step occurs and a model.tar.gz is created on S3 at the code_location:

├── code
│   ├── README.md
│   ├── inference.py
│   ├── lib
│   │   ├── X
│   │   │   ├── __init__.py
│   │   │   ├── ...
│   │   └── Y
│   │       ├── README.md
│   │       ├── __init__.py
│   │       ├── ...
│   └── requirements.txt
├── model.pth
└── another_model.pkl

But, at runtime, the only files in the model_dir of the container are inference.py, model.pth, and handler_service.py (see logs in the opening comment).

@setu4993
Author

The problem here is multi-fold:

  1. Why does an extra repackaging step occur that creates a difference between what is made available on S3 at code_location and what ends up in the final inference service? This makes it harder to understand (and diagnose) what's happening in the container, since the model tarball produced by SageMaker drifts away from what's expected (and it was repackaged by the SDK!).
  2. Why is everything being dropped from the container?

@setu4993
Author

@ChuyangDeng : Any update on this?

@jonsnowseven

@setu4993, I have experienced similar issues. From the source code, I believe the following line might be the one causing what you are experiencing (and me as well):

https://github.com/aws/sagemaker-pytorch-inference-toolkit/blob/9a6869ea3af1ebf9292da0a8d752b0c3389ecdec/src/sagemaker_pytorch_serving_container/torchserve.py#L125

This makes inference.py the only (custom) .py file in the model directory. I believe this container should support adding extra/custom artifacts (e.g., model companion objects, model architecture definitions).
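The failure mode described above can be illustrated with a minimal sketch (this is not the toolkit's actual code, just an illustration of the effect): repackaging that copies only the entry-point script drops every sibling artefact from the original code/ directory.

```python
import os
import shutil
import tempfile

# Hypothetical source directory with an entry point plus sibling files.
src = tempfile.mkdtemp()
for name in ("inference.py", "requirements.txt", "helper.py"):
    open(os.path.join(src, name), "w").close()

# Repackaging that copies only the entry point: everything else is lost.
export_dir = tempfile.mkdtemp()
entry_point = "inference.py"
shutil.copy2(os.path.join(src, entry_point), export_dir)

print(sorted(os.listdir(export_dir)))  # only ['inference.py'] survives
```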

Furthermore, the SageMaker Python SDK example with PyTorch (with the code here) does not work with this serving container, because the model has a companion object.

@setu4993
Author

Thanks @jonsnowseven. +1, I think I found the same line earlier :).

In my case, it all works with 1.5.0, so I'm sticking to it and upgrading to 1.6 via requirements.txt.

@ldong87

ldong87 commented Feb 5, 2021

Almost 4 months after the last comment, this bug still persists.

@jonsnowseven

Yes @ldong87. Any news regarding this?

@declark1

Same issue here. Have been trying to work around this for 2 days now with no luck.

@setu4993
Author

Wanted to share that @amaharek found a potential solution and posted it on the supplementary issue on the sagemaker-pytorch-inference-toolkit repo here: aws/sagemaker-pytorch-inference-toolkit#85 (comment)

@agurtovoy

@dectl One (ugly) workaround is to copy/load the missing model files from /opt/ml/model/, which is apparently the documented decompressed model location: https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-inference-code.html#your-algorithms-inference-code-load-artifacts
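A sketch of that workaround, assuming the companion artefact was packaged at the root of model.tar.gz (the function name and file name here are placeholders, not part of the SageMaker API):

```python
import os
import pickle

# SageMaker decompresses the original model.tar.gz here, so companion
# artefacts dropped from model_dir by repackaging can still be read from it.
SAGEMAKER_MODEL_ROOT = "/opt/ml/model"


def load_companion_object(filename="another_model.pkl", root=SAGEMAKER_MODEL_ROOT):
    """Load a pickled companion artefact directly from the decompressed
    model location, bypassing the (incomplete) repackaged model_dir."""
    path = os.path.join(root, filename)
    with open(path, "rb") as f:
        return pickle.load(f)
```

Inside model_fn you would call this instead of (or in addition to) reading from the model_dir argument; the `root` parameter exists so the helper can be tested outside a container.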

@lxning

lxning commented Jun 23, 2021

This issue was fixed in TorchServe 0.3.1 and toolkit v2.0.5. These fixes are available in the latest SageMaker DLC.

@setu4993
Author

Thank you, @lxning! Can confirm that it works and the packages now contain all artifacts as expected.
