Skip to content

Model repack always uploads data to S3 bucket regardless of local mode settings #3031

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
alar0330 opened this issue Mar 27, 2022 · 4 comments
Labels
component: hosting Relates to the SageMaker Hosting Platform type: bug

Comments

@alar0330
Copy link

alar0330 commented Mar 27, 2022

Describe the bug
sagemaker.model.Model uploads data to S3 when repacking, regardless and despite of local mode settings.

Why is this important?
1/ It seems to contradict the intention of using local_code setting to disallow S3 uploads of code-, model-, and other artifacts to S3.
2/ Imagine using Local Mode for dev purposes and having a moderate-sized model (say ResNet-50, with 170mb in serialization). Without proper implementation of local_code=True scenario, every launch in LOCAL MODE will trigger an (unnecessary) upload to S3 due to repack()-call and then an immediate download to the locally-run container due to SM_SUBMIT_DIRECTORY = s3://.... This round trip can causes minutes of delay. Instead of this, the local (repacked) directory with model- and code-artifacts should be mounted directly to the container as /opt/ml/model.

To reproduce

    sagemaker_session = LocalSession()
    sagemaker_session.config = {'local': {'local_code': True}}

    role = 'IAM_ROLE'
    # model_dir can also be `file://` to read model locally
    model_dir = 's3://aws-ml-blog/artifacts/pytorch-script-mode-local-model-inference/model.tar.gz'
    image_uri = "763104351884.dkr.ecr.eu-central-1.amazonaws.com/pytorch-inference:1.8-cpu-py3"

    model = PyTorchModel(
        role=role,
        model_data=model_dir,
        framework_version='1.8',
        entry_point='inference.py', # <--- provide any entry_point to trigger repacking of model data 
        sagemaker_session=sagemaker_session,
        image_uri=image_uri)

      predictor = model.deploy(
        initial_instance_count=1,
        instance_type='local')

Expected behavior
Expected behaviour is that with these local mode settings, the model and inference scripts are repacked and stored locally (as file://...) and not uploaded to S3, as is currently implemented for cases when no repacking is needed.

Advices to fix
sagemaker.model.Model implements _upload_code(), which (in case if repack is needed) defines the following repacked model data path

repacked_model_data = "s3://" + "/".join([bucket, key_prefix, "model.tar.gz"])

irrespective of the LOCAL MODE settings (contradicting the logic for when no repack is needed). Perhaps, what we need to do is introduce a check for LOCAL MODE that will conditionally define the repacked_model_dataeither having s3:// (non-LOCAL) or file:// (LOCAL) paths prefixes. This will (probably) be enough to fix this, as the utils.repack_model(repacked_model_uri=repacked_model_data, ...) checks the prefix of the provided repacked_model_uri to decide whether to upload data to S3, or store locally.

System information
A description of your system. Please provide:

  • SageMaker Python SDK version: 2.80.0
  • Framework name (eg. PyTorch) or algorithm (eg. KMeans): PyTorch
  • Framework version: >1.2
  • Python version: 3.8.12
  • CPU or GPU:
  • Custom Docker image (Y/N):

Additional context
Add any other context about the problem here.

@staubhp
Copy link
Contributor

staubhp commented Mar 29, 2022

This is more a question of behavior than a bug; by specifying an S3 URI for model_data you're already "breaking out" of local mode, so the SDK continues along that line and uploads it back to S3. If you first downloaded the tarball and then passed its file:// location to model_data, it would work as you're expecting.

One case against changing this behavior is it means the SDK would be transparently changing the parameters you're passing to the model. Users will see an S3 location in model_data and might assume that now contains the repacked model, when in reality the SDK changed that to a local location during deployment.

I'll leave it up to the team as to whether they want to change this behavior or not

@staubhp staubhp added type: feature request component: hosting Relates to the SageMaker Hosting Platform and removed type: bug labels Mar 29, 2022
@alar0330
Copy link
Author

Thanks, @staubhp, for a quick response!

I have used model_data = s3:// in example for reproducibility reasons. Please note that even with model_data = file:// and Local Mode settings the repack + upload + download will be triggered.

@staubhp
Copy link
Contributor

staubhp commented Mar 29, 2022

Interesting. That would be a bug then; I'd expect all repacking to happen locally when you start with a file://

@mufaddal-rohawala
Copy link
Member

Fix released. Should be fixed in release >= v2.89.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component: hosting Relates to the SageMaker Hosting Platform type: bug
Projects
None yet
Development

No branches or pull requests

3 participants