Commit e747b03

Improve error logging and documentation for issue 4007 (#5153)
* Improve error logging and documentation for issue 4007
* Add hyperlink to RTDs
1 parent 9ba4faa commit e747b03

2 files changed, +74 -10 lines changed

doc/frameworks/pytorch/using_pytorch.rst (+46)

@@ -1048,6 +1048,43 @@ see `For versions 1.1 and lower <#for-versions-1.1-and-lower>`_.
 
 Where ``requirements.txt`` is an optional file that specifies dependencies on third-party libraries.
 
+Important Packaging Instructions
+--------------------------------
+
+When creating your model artifact (``model.tar.gz``), follow these steps to avoid common deployment issues:
+
+1. Navigate to the directory containing your model files:
+
+   .. code:: bash
+
+      cd my_model
+
+2. Create the tar archive from within this directory:
+
+   .. code:: bash
+
+      tar czvf ../model.tar.gz *
+
+**Common Mistakes to Avoid:**
+
+* Do NOT create the archive from the parent directory using ``tar czvf model.tar.gz my_model/``.
+  This creates an extra directory level that will cause deployment errors.
+* Ensure ``inference.py`` is directly under the ``code/`` directory in your archive.
+* Verify your archive structure using:
+
+  .. code:: bash
+
+     tar tvf model.tar.gz
+
+  You should see output similar to:
+
+  ::
+
+     model.pth
+     code/
+     code/inference.py
+     code/requirements.txt
+
 Create a ``PyTorchModel`` object
 --------------------------------
 
@@ -1066,6 +1103,15 @@ Now call the :class:`sagemaker.pytorch.model.PyTorchModel` constructor to create
 
 Now you can call the ``predict()`` method to get predictions from your deployed model.
 
+Troubleshooting
+---------------
+
+If you encounter a ``FileNotFoundError`` for ``inference.py``, check:
+
+1. That your model artifact is packaged correctly following the instructions above
+2. The structure of your ``model.tar.gz`` file matches the expected layout
+3. You're creating the archive from within the model directory, not from its parent
+
 ***********************************************
 Attach an estimator to an existing training job
 ***********************************************
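
The packaging and verification steps added to the documentation above can also be scripted. The following is a minimal sketch using only Python's standard `tarfile` module, not part of this commit or of the SageMaker SDK. The helper names `package_model`, `has_expected_layout`, and `build_demo` are hypothetical, and the demo files are placeholders. Note one difference from `tar czvf ../model.tar.gz *`: the shell glob skips dotfiles, while `os.listdir` includes them.

```python
import os
import tarfile
import tempfile


def package_model(model_dir, archive_path):
    """Archive the contents of model_dir so entries sit at the archive root."""
    with tarfile.open(archive_path, "w:gz") as tar:
        for name in sorted(os.listdir(model_dir)):
            # arcname drops the parent directory, similar to running
            # `cd my_model && tar czvf ../model.tar.gz *`.
            tar.add(os.path.join(model_dir, name), arcname=name)


def has_expected_layout(archive_path, script="inference.py"):
    """Return True if code/<script> sits directly under the archive root."""
    with tarfile.open(archive_path, "r:gz") as tar:
        names = {os.path.normpath(n) for n in tar.getnames()}
    return os.path.join("code", script) in names


def build_demo():
    """Create a throwaway model directory; return (good_archive, bad_archive)."""
    root = tempfile.mkdtemp()
    model_dir = os.path.join(root, "my_model")
    os.makedirs(os.path.join(model_dir, "code"))
    for rel in ("model.pth", os.path.join("code", "inference.py")):
        with open(os.path.join(model_dir, rel), "w") as f:
            f.write("placeholder")

    good = os.path.join(root, "model.tar.gz")
    package_model(model_dir, good)

    # The documented mistake: archiving the directory itself from its parent,
    # which adds an extra my_model/ level to every entry.
    bad = os.path.join(root, "bad.tar.gz")
    with tarfile.open(bad, "w:gz") as tar:
        tar.add(model_dir, arcname="my_model")
    return good, bad


if __name__ == "__main__":
    good, bad = build_demo()
    print("good archive ok:", has_expected_layout(good))
    print("bad archive ok:", has_expected_layout(bad))
```

Running a check like `has_expected_layout` before uploading the artifact may catch the extra-directory problem earlier than a failed deployment would.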

src/sagemaker/utils.py (+28 -10)

@@ -13,10 +13,12 @@
 """Placeholder docstring"""
 from __future__ import absolute_import
 
+import abc
 import contextlib
 import copy
 import errno
 import inspect
+import json
 import logging
 import os
 import random
@@ -25,31 +27,30 @@
 import tarfile
 import tempfile
 import time
-from functools import lru_cache
-from typing import Union, Any, List, Optional, Dict
-import json
-import abc
 import uuid
 from datetime import datetime
-from os.path import abspath, realpath, dirname, normpath, join as joinpath
-
+from functools import lru_cache
 from importlib import import_module
+from os.path import abspath, dirname
+from os.path import join as joinpath
+from os.path import normpath, realpath
+from typing import Any, Dict, List, Optional, Union
 
 import boto3
 import botocore
 from botocore.utils import merge_dicts
-from six.moves.urllib import parse
 from six import viewitems
+from six.moves.urllib import parse
 
 from sagemaker import deprecations
 from sagemaker.config import validate_sagemaker_config
 from sagemaker.config.config_utils import (
-    _log_sagemaker_config_single_substitution,
     _log_sagemaker_config_merge,
+    _log_sagemaker_config_single_substitution,
 )
 from sagemaker.enums import RoutingStrategy
 from sagemaker.session_settings import SessionSettings
-from sagemaker.workflow import is_pipeline_variable, is_pipeline_parameter_string
+from sagemaker.workflow import is_pipeline_parameter_string, is_pipeline_variable
 from sagemaker.workflow.entities import PipelineVariable
 
 ALTERNATE_DOMAINS = {
@@ -624,7 +625,24 @@ def _create_or_update_code_dir(
             if os.path.exists(os.path.join(code_dir, inference_script)):
                 pass
             else:
-                raise
+                raise FileNotFoundError(
+                    f"Could not find '{inference_script}'. Common solutions:\n"
+                    "1. Make sure inference.py exists in the code/ directory\n"
+                    "2. Package your model correctly:\n"
+                    " - ✅ DO: Navigate to the directory containing model files and run:\n"
+                    " cd /path/to/model_files\n"
+                    " tar czvf ../model.tar.gz *\n"
+                    " - ❌ DON'T: Create from parent directory:\n"
+                    " tar czvf model.tar.gz model/\n"
+                    "\nExpected structure in model.tar.gz:\n"
+                    " ├── model.pth (or your model file)\n"
+                    " └── code/\n"
+                    " ├── inference.py\n"
+                    " └── requirements.txt\n"
+                    "\nFor more details, see the documentation:\n"
+                    + "https://sagemaker.readthedocs.io/en/stable/"
+                    + "frameworks/pytorch/using_pytorch.html#bring-your-own-model"
+                )
 
     for dependency in dependencies:
         lib_dir = os.path.join(code_dir, "lib")
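
To illustrate the behavior change in the hunk above (a bare `raise` replaced by a `FileNotFoundError` carrying guidance), here is a simplified, self-contained sketch. It is not the SDK's code: `copy_inference_script` and `demo` are hypothetical names, the message is shortened, and the sketch assumes the surrounding code copies the script with `shutil.copy2` inside a `try`/`except FileNotFoundError`, which the hunk itself does not show.

```python
import os
import shutil
import tempfile


def copy_inference_script(inference_script, code_dir):
    """Hypothetical stand-in for the copy step; not the SDK's actual function."""
    os.makedirs(code_dir, exist_ok=True)
    try:
        shutil.copy2(inference_script, code_dir)
    except FileNotFoundError:
        # Tolerate a script that is already present inside code_dir.
        if os.path.exists(os.path.join(code_dir, inference_script)):
            return
        # Otherwise raise an error that tells the user what to check,
        # instead of re-raising the original bare exception.
        raise FileNotFoundError(
            f"Could not find '{inference_script}'. "
            "Make sure it exists in the code/ directory of model.tar.gz."
        ) from None


def demo(script_name="missing_inference_script.py"):
    """Return the error message produced when the script cannot be found."""
    code_dir = os.path.join(tempfile.mkdtemp(), "code")
    try:
        copy_inference_script(script_name, code_dir)
    except FileNotFoundError as err:
        return str(err)
    return None


if __name__ == "__main__":
    print(demo())
```

The design point is that the exception type stays the same, so existing `except FileNotFoundError` handlers should keep working, while the message now names the missing file and the likely packaging cause.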
