workflow._RepackModelStep can fail if source_dir contains requirements.txt #3291

Closed
akreuzer opened this issue Aug 11, 2022 · 2 comments
Labels: component: pipelines (Relates to the SageMaker Pipeline Platform), type: bug


@akreuzer

Describe the bug
_RepackModelStep is part of sagemaker.workflow.model_step.ModelStep.
Its purpose is to add a source_dir to a plain model.tar.gz.
As far as I understand, this is done with a training step that does no real training and only repacks the model; see

repacker = SKLearn(
    framework_version=FRAMEWORK_VERSION,
    instance_type=INSTANCE_TYPE,
    entry_point=REPACK_SCRIPT,

If the source_dir contains a requirements.txt, this step can fail. The reason is that the repack training step above installs the requirements listed there, even though they may not be compatible with the environment the repack step runs in. See the attached logs.

To reproduce
An example of such a source_dir is the JumpStart LightGBM inference source_dir, available at s3://jumpstart-cache-prod-us-east-2/source-directory-tarballs/lightgbm/inference/regression/v1.1.0/sourcedir.tar.gz
Its requirements.txt looks like this:

/opt/ml/model/code/lib/lightgbm/tenacity-8.0.1-py3-none-any.whl
/opt/ml/model/code/lib/lightgbm/plotly-5.1.0-py2.py3-none-any.whl
/opt/ml/model/code/lib/lightgbm/graphviz-0.17-py3-none-any.whl
...
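
One way to check whether a given source_dir tarball might trigger this failure is to look inside it for a requirements.txt before attaching it to a model. The sketch below is illustrative only: the helper names (read_requirements, local_path_requirements, make_tarball) are hypothetical and not part of the SageMaker SDK, and treating absolute wheel paths as the suspicious entries is an assumption based on the listing above, not something the logs confirm.

```python
import io
import posixpath
import tarfile
from typing import Dict, List


def read_requirements(tar_bytes: bytes) -> List[str]:
    """Return the non-empty, non-comment lines of a top-level
    requirements.txt in the tarball, or [] if there is none."""
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:*") as tar:
        for member in tar.getmembers():
            # Normalize names such as "./requirements.txt".
            name = posixpath.normpath(member.name)
            if member.isfile() and name == "requirements.txt":
                text = tar.extractfile(member).read().decode("utf-8")
                stripped = (line.strip() for line in text.splitlines())
                return [ln for ln in stripped if ln and not ln.startswith("#")]
    return []


def local_path_requirements(lines: List[str]) -> List[str]:
    """Return requirement lines that are absolute filesystem paths
    (for example wheels bundled under /opt/ml/model/code/lib/...)."""
    return [ln for ln in lines if ln.startswith("/")]


def make_tarball(files: Dict[str, str]) -> bytes:
    """Build a small in-memory .tar.gz for demonstration (hypothetical helper)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, content in files.items():
            data = content.encode("utf-8")
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


if __name__ == "__main__":
    demo = make_tarball({
        "inference.py": "# placeholder\n",
        "requirements.txt": (
            "/opt/ml/model/code/lib/lightgbm/tenacity-8.0.1-py3-none-any.whl\n"
            "/opt/ml/model/code/lib/lightgbm/plotly-5.1.0-py2.py3-none-any.whl\n"
        ),
    })
    for line in local_path_requirements(read_requirements(demo)):
        print("local wheel path:", line)
```

For a real tarball, the bytes would first have to be downloaded from S3; that part is left out here because it depends on the account and session setup.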

Adding this source_dir to a model using the following code results in pipeline failure.

from sagemaker.model import Model
from sagemaker.workflow.model_step import ModelStep

# deploy_image_uri, deploy_source_uri_cache, step_train, pipeline_session and
# role are assumed to be defined elsewhere in the pipeline script.
model = Model(
    image_uri=deploy_image_uri,
    source_dir=deploy_source_uri_cache,
    entry_point="inference.py",
    model_data=step_train.properties.ModelArtifacts.S3ModelArtifacts,
    sagemaker_session=pipeline_session,
    role=role,
)

step_model = ModelStep(
    "RegisterModel",
    step_args=model.register(
        content_types=["text/csv"],
        response_types=["application/json"],
        inference_instances=["ml.t2.medium"],
        transform_instances=["ml.m5.xlarge"],
        model_package_group_name="lgbm-test",
    ),
)

(Note that in the code above, deploy_source_uri_cache points to a copy of s3://jumpstart-cache-prod-us-east-2/source-directory-tarballs/lightgbm/inference/regression/v1.1.0/sourcedir.tar.gz, because _RepackModelStep modifies the tarball.)

Expected behavior
_RepackModelStep should work regardless of the content of source_dir.

Screenshots or logs
[Screenshot attached: Screen Shot 2022-08-11 at 7 10 16 AM]

System information

  • SageMaker Python SDK version: 2.103.0
  • Framework name (eg. PyTorch) or algorithm (eg. KMeans): JumpStart LightGBM
  • Framework version: N/A
  • Python version: 3.8
  • CPU or GPU: CPU
  • Custom Docker image (Y/N): N

rohangujarathi added the "component: pipelines" (Relates to the SageMaker Pipeline Platform) label on Nov 12, 2022
@qidewenwhen
Member

Hi @akreuzer, thanks for using SageMaker!

This issue appears to be a duplicate of #3143, which has been resolved.
Could you please upgrade your SageMaker Python SDK and check whether it works for you?
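
To confirm that an environment is running a newer SDK than the one in the report, the installed version can be compared against 2.103.0. The sketch below is a minimal, assumption-laden check: the thread does not say which release contains the fix for #3143, so it only tests "newer than the reported version", and the function names are hypothetical.

```python
from importlib import metadata
from typing import Optional, Tuple

REPORTED_VERSION = "2.103.0"  # SDK version given in the bug report


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '2.103.0' into (2, 103, 0), stopping at the first
    non-numeric component (such as a 'dev0' suffix)."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def is_newer_than_reported(installed: str) -> bool:
    """Numeric comparison, so that '2.99.0' is not treated as newer than '2.103.0'."""
    return parse_version(installed) > parse_version(REPORTED_VERSION)


def installed_sagemaker_version() -> Optional[str]:
    """Return the installed sagemaker version, or None if it is not installed."""
    try:
        return metadata.version("sagemaker")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    current = installed_sagemaker_version()
    if current is None:
        print("sagemaker is not installed in this environment")
    elif is_newer_than_reported(current):
        print(f"sagemaker {current} is newer than {REPORTED_VERSION}")
    else:
        print(f"sagemaker {current} is not newer than {REPORTED_VERSION}; consider upgrading")
```

Comparing integer tuples instead of raw strings avoids the lexical-ordering trap where "2.99.0" would sort after "2.103.0".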

@akreuzer
Author

akreuzer commented Dec 2, 2022

Great 👍 Works

akreuzer closed this as completed on Dec 2, 2022