Improve error logging and documentation for issue 4007 #5153

Merged 8 commits on May 6, 2025
46 changes: 46 additions & 0 deletions doc/frameworks/pytorch/using_pytorch.rst
@@ -1048,6 +1048,43 @@ see `For versions 1.1 and lower <#for-versions-1.1-and-lower>`_.

Where ``requirements.txt`` is an optional file that specifies dependencies on third-party libraries.

Important Packaging Instructions
--------------------------------

When creating your model artifact (``model.tar.gz``), follow these steps to avoid common deployment issues:

1. Navigate to the directory containing your model files:

.. code:: bash

cd my_model

2. Create the tar archive from within this directory:

.. code:: bash

tar czvf ../model.tar.gz *
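The two steps above can also be scripted. A minimal sketch using Python's standard ``tarfile`` module (the function name and paths are illustrative, not part of the SageMaker SDK):

```python
import os
import tarfile

def package_model(model_dir, output_path):
    """Create output_path with the *contents* of model_dir at the archive
    root, mirroring `cd model_dir && tar czvf ../model.tar.gz *`."""
    with tarfile.open(output_path, "w:gz") as tar:
        for name in sorted(os.listdir(model_dir)):
            # arcname keeps each entry relative to the archive root,
            # so no extra top-level directory is introduced.
            tar.add(os.path.join(model_dir, name), arcname=name)
```

Passing ``arcname`` is the programmatic equivalent of running ``tar`` from inside the model directory rather than from its parent.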

**Common Mistakes to Avoid:**

* Do NOT create the archive from the parent directory using ``tar czvf model.tar.gz my_model/``.
This creates an extra directory level that will cause deployment errors.
* Ensure ``inference.py`` is directly under the ``code/`` directory in your archive.
* Verify your archive structure using:

.. code:: bash

tar tvf model.tar.gz

You should see output similar to:

::

model.pth
code/
code/inference.py
code/requirements.txt
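The same listing can be checked programmatically; a small sketch (the helper name is illustrative) that looks for ``code/inference.py`` at the expected location:

```python
import tarfile

def has_expected_layout(archive_path):
    """Return True if code/inference.py sits directly under a top-level
    code/ directory, matching the layout shown above."""
    with tarfile.open(archive_path, "r:gz") as tar:
        # Normalize optional "./" prefixes that some tar tools emit.
        names = {n[2:] if n.startswith("./") else n for n in tar.getnames()}
    return "code/inference.py" in names
```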

Create a ``PyTorchModel`` object
--------------------------------

@@ -1066,6 +1103,15 @@ Now call the :class:`sagemaker.pytorch.model.PyTorchModel` constructor to create

Now you can call the ``predict()`` method to get predictions from your deployed model.

Troubleshooting
---------------

If you encounter a ``FileNotFoundError`` for ``inference.py``, check:

1. That your model artifact is packaged correctly following the instructions above
2. The structure of your ``model.tar.gz`` file matches the expected layout
3. You're creating the archive from within the model directory, not from its parent
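The third point can be checked with a quick heuristic: if every member of the archive shares a single top-level directory, the tarball was almost certainly created from the parent directory. A sketch (the helper name is illustrative):

```python
import tarfile

def wrapped_in_extra_dir(archive_path):
    """Heuristic: True when all members share one top-level entry,
    the symptom of running tar from the parent directory."""
    with tarfile.open(archive_path, "r:gz") as tar:
        tops = {name.split("/", 1)[0] for name in tar.getnames()}
    # A correctly packaged archive has model.pth and code/ side by side,
    # giving more than one top-level entry.
    return len(tops) == 1
```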

***********************************************
Attach an estimator to an existing training job
***********************************************
38 changes: 28 additions & 10 deletions src/sagemaker/utils.py
@@ -13,10 +13,12 @@
"""Placeholder docstring"""
from __future__ import absolute_import

import abc
import contextlib
import copy
import errno
import inspect
import json
import logging
import os
import random
@@ -25,31 +27,30 @@
import tarfile
import tempfile
import time
from functools import lru_cache
from typing import Union, Any, List, Optional, Dict
import json
import abc
import uuid
from datetime import datetime
from os.path import abspath, realpath, dirname, normpath, join as joinpath

from functools import lru_cache
from importlib import import_module
from os.path import abspath, dirname
from os.path import join as joinpath
from os.path import normpath, realpath
from typing import Any, Dict, List, Optional, Union

import boto3
import botocore
from botocore.utils import merge_dicts
from six.moves.urllib import parse
from six import viewitems
from six.moves.urllib import parse

from sagemaker import deprecations
from sagemaker.config import validate_sagemaker_config
from sagemaker.config.config_utils import (
_log_sagemaker_config_single_substitution,
_log_sagemaker_config_merge,
_log_sagemaker_config_single_substitution,
)
from sagemaker.enums import RoutingStrategy
from sagemaker.session_settings import SessionSettings
from sagemaker.workflow import is_pipeline_variable, is_pipeline_parameter_string
from sagemaker.workflow import is_pipeline_parameter_string, is_pipeline_variable
from sagemaker.workflow.entities import PipelineVariable

ALTERNATE_DOMAINS = {
@@ -624,7 +625,24 @@
if os.path.exists(os.path.join(code_dir, inference_script)):
pass
else:
raise FileNotFoundError(
f"Could not find '{inference_script}'. Common solutions:\n"
"1. Make sure inference.py exists in the code/ directory\n"
"2. Package your model correctly:\n"
" - ✅ DO: Navigate to the directory containing model files and run:\n"
" cd /path/to/model_files\n"
" tar czvf ../model.tar.gz *\n"
" - ❌ DON'T: Create from parent directory:\n"
" tar czvf model.tar.gz model/\n"
"\nExpected structure in model.tar.gz:\n"
" ├── model.pth (or your model file)\n"
" └── code/\n"
" ├── inference.py\n"
" └── requirements.txt\n"
"\nFor more details, see the documentation:\n"
"https://sagemaker.readthedocs.io/en/stable/"
"frameworks/pytorch/using_pytorch.html#bring-your-own-model"
)

for dependency in dependencies:
lib_dir = os.path.join(code_dir, "lib")