Skip to content

feature: Allow custom output for RepackModelStep #2804

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 24 commits into from

Conversation

tuliocasagrande
Copy link
Contributor

@tuliocasagrande tuliocasagrande commented Dec 17, 2021

Issue #, if available:
#2803

Description of changes:
As proposed in #2803:

  1. Add new parameter repack_output_path (default None) to workflow.step_collections.RegisterModel, workflow.step_collections.EstimatorTransformer and workflow._utils._RepackModelStep.
  2. On internal _RepackModelStep() calls, handover repack_output_path=repack_output_path received from public RegisterModel() and EstimatorTransformer()
  3. Inside of _RepackModelStep, send output_path=repack_output_path and code_location=repack_output_path to the internal SKLearn() call.

Testing done:

Tested RegisterModel without repack_output_path (current behavior):

    step_register = RegisterModel(
        name="RegisterModel",
        model=pipeline_model,
        content_types=["text/csv"],
        response_types=["text/csv"],
        inference_instances=["ml.t2.medium", "ml.m5.large"],
        transform_instances=["ml.m5.large"],
        model_package_group_name=model_package_group_name,
        approval_status=model_approval_status,
        model_metrics=model_metrics,
    )

Current behaviour stayed the same on the generated pipeline:

...
"OutputDataConfig": {
    "S3OutputPath": "s3://<my-default-bucket>/"
},
...

Tested RegisterModel with repack_output_path:

    step_register = RegisterModel(
        name="RegisterModel",
        model=pipeline_model,
        content_types=["text/csv"],
        response_types=["text/csv"],
        inference_instances=["ml.t2.medium", "ml.m5.large"],
        transform_instances=["ml.m5.large"],
        model_package_group_name=model_package_group_name,
        approval_status=model_approval_status,
        model_metrics=model_metrics,
        repack_output_path=f"s3://{default_bucket}/{base_job_prefix}/repack",
    )

New output on RepackModel step:

...
"OutputDataConfig": {
    "S3OutputPath": "s3://<my-default-bucket>/<my-base-job-prefix>/repack"
},
...

Added unit test tests/unit/sagemaker/workflow/test_utils.py::test_repack_model_step_with_output_path.

Merge Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.

General

  • I have read the CONTRIBUTING doc
  • I certify that the changes I am introducing will be backword compatible, and I have discussed concerns about this, if any, with the Python SDK team
  • I used the commit message format described in CONTRIBUTING
  • I have passed the region in to all S3 and STS clients that I've initialized as part of this change.
  • I have updated any necessary documentation, including READMEs and API docs (if appropriate)

Tests

  • I have added tests that prove my fix is effective or that my feature works (if appropriate)
  • I have added unit and/or integration tests as appropriate to ensure backward compatibility of the changes
  • I have checked that my tests are not configured for a specific region or account (if appropriate)
  • I have used unique_name_from_base to create resource names in integ tests (if appropriate)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-unit-tests
  • Commit ID: be0b306
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-unit-tests
  • Commit ID: abc7cd5
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-unit-tests
  • Commit ID: 7087c5a
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@tuliocasagrande tuliocasagrande marked this pull request as ready for review December 17, 2021 21:02
@tuliocasagrande tuliocasagrande changed the title Repack output feature: Allow custom output for RepackModelStep Dec 17, 2021
@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-unit-tests
  • Commit ID: 4643394
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@tuliocasagrande
Copy link
Contributor Author

@staubhp I finished this proposal.
I'd love to hear any feedback when you have the chance. Thanks for reviewing!

@staubhp
Copy link
Contributor

staubhp commented Jan 3, 2022

Looks good to me, thanks for the contribution. Would you mind adding unit tests for RegisterModel & EstimatorTransformer step here just to confirm the repack_path is being wired through to _RepackModelStep?

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-unit-tests
  • Commit ID: 3a2491d
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-unit-tests
  • Commit ID: f5b4263
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-local-mode-tests
  • Commit ID: 3a2491d
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-local-mode-tests
  • Commit ID: f5b4263
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-slow-tests
  • Commit ID: f5b4263
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-slow-tests
  • Commit ID: 3a2491d
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

While writing the unit test for EstimatorTransformer with repack model
and a custom output output_path, I discovered that sending an
entry_point to EstimatorTransformer was raising an "unexpected keyword
argument 'entry_point'" on Model.__init__.

Using code from RegisterModel as a base, I removed the entry_point
and other repack variables from kwargs.

Also implemented unit tests for this case.
@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-notebook-tests
  • Commit ID: f5b4263
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-notebook-tests
  • Commit ID: 3a2491d
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@tuliocasagrande
Copy link
Contributor Author

hey @staubhp, it was a great suggestion to add unit tests for RegisterModel & EstimatorTransformer.

I eventually discovered an issue with the entry_point parameter on EstimatorTransformer being propagated until Model and generating an error. We also didn't have unit tests for EstimatorTransformer with _RepackModelStep.

So I split the new changes in two commits:

  1. Removed entry_point from kwargs after being used (similar to RegisterModel) and added unit test for this case: ee6afcf
  2. Added unit tests for RegisterModel & EstimatorTransformer with custom repack output path: faf4ad5

Please let me know what you think. Thanks for reviewing!

@codecov-commenter
Copy link

codecov-commenter commented Jan 14, 2022

Codecov Report

Merging #2804 (3a2491d) into dev (4325fcd) will decrease coverage by 1.14%.
The diff coverage is 80.48%.

❗ Current head 3a2491d differs from pull request most recent head ed9131b. Consider uploading reports for the commit ed9131b to get more accurate results

@@            Coverage Diff             @@
##              dev    #2804      +/-   ##
==========================================
- Coverage   89.82%   88.67%   -1.15%     
==========================================
  Files         196      173      -23     
  Lines       16564    15423    -1141     
==========================================
- Hits        14878    13677    -1201     
- Misses       1686     1746      +60     
Impacted Files Coverage Δ
src/sagemaker/workflow/_repack_model.py 21.73% <0.00%> (-78.27%) ⬇️
src/sagemaker/workflow/_utils.py 85.36% <ø> (-2.54%) ⬇️
src/sagemaker/workflow/step_collections.py 92.55% <ø> (-6.39%) ⬇️
src/sagemaker/dataset_definition/inputs.py 100.00% <100.00%> (ø)
src/sagemaker/workflow/steps.py 96.72% <100.00%> (-1.10%) ⬇️
src/sagemaker/workflow/utilities.py 91.66% <100.00%> (-8.34%) ⬇️
src/sagemaker/huggingface/model.py 32.78% <0.00%> (-45.91%) ⬇️
src/sagemaker/lineage/query.py 75.72% <0.00%> (-19.25%) ⬇️
... and 64 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4325fcd...ed9131b. Read the comment docs.

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-unit-tests
  • Commit ID: 0489b59
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-slow-tests
  • Commit ID: 0489b59
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-local-mode-tests
  • Commit ID: 0489b59
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

repack_model = True
entry_point = kwargs.pop("entry_point", None)
source_dir = kwargs.pop("source_dir", None)
dependencies = kwargs.pop("dependencies", None)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
dependencies = kwargs.pop("dependencies", None)
dependencies = kwargs.pop("dependencies", None)
code_location = kwargs.pop("code_location", None)

Copy link

@eugeneteoh eugeneteoh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/aws/sagemaker-python-sdk/pull/2804/files#diff-043e8c79b0b9ccd41f8164a29ccc1f9e67f38e349aafffb96cd91762c3824bc7R364 (step_collections.py L364)
I suggest changing this to image_uri=image_uri. Using training_image_uri() for inference doesn't actually work.

repack_model_step = _RepackModelStep(
name=f"{name}RepackModel",
depends_on=depends_on,
retry_policies=repack_model_step_retry_policies,
sagemaker_session=estimator.sagemaker_session,
role=estimator.sagemaker_session,
role=estimator.role,
model_data=model_data,
entry_point=entry_point,
source_dir=source_dir,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
source_dir=source_dir,
source_dir=source_dir,
code_location=code_location,

@@ -336,6 +349,8 @@ def __init__(
security_group_ids=estimator.security_group_ids,
description=description,
display_name=display_name,
repack_output_path=repack_output_path,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a repack_image_uri argument? My customer is facing an issue where the python version SKLearn (3.7) estimator is different from the training image uri. That caused dependency errors when trying to install packages that are available on (for example) Python 3.8. Another option would be equating it to estimator.training_image_uri().

Suggested change
repack_output_path=repack_output_path,
repack_output_path=repack_output_path,
image_uri=estimator.training_image_uri(),

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-unit-tests
  • Commit ID: ed9131b
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-local-mode-tests
  • Commit ID: ed9131b
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-pr
  • Commit ID: ed9131b
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-slow-tests
  • Commit ID: ed9131b
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-notebook-tests
  • Commit ID: ed9131b
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@qidewenwhen
Copy link
Member

qidewenwhen commented Aug 5, 2022

Update: this recently released commit has made the repack step output path align with model repack path. In other words, users can use the code_location argument in model class to customize the output for RepackModelStep.

Thus, we can close this PR.

@tuliocasagrande
Copy link
Contributor Author

Agree. Superseded by #3257.

@tuliocasagrande tuliocasagrande deleted the repack_output branch August 6, 2022 18:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.