feature: Decouple model.right_size() from model registry #3688

SSRraymond · 2023-03-01T00:40:13Z

Issue #, if available:

Description of changes:
Decouple moel.right_size() from model registry

Testing done:
Add unit test.
pip install . and run end 2 end tests with model not registerd.

Merge Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.

General

I have read the CONTRIBUTING doc
I certify that the changes I am introducing will be backward compatible, and I have discussed concerns about this, if any, with the Python SDK team
I used the commit message format described in CONTRIBUTING
I have passed the region in to all S3 and STS clients that I've initialized as part of this change.
I have updated any necessary documentation, including READMEs and API docs (if appropriate)

Tests

I have added tests that prove my fix is effective or that my feature works (if appropriate)
I have added unit and/or integration tests as appropriate to ensure backward compatibility of the changes
I have checked that my tests are not configured for a specific region or account (if appropriate)
I have used unique_name_from_base to create resource names in integ tests (if appropriate)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

gwang111 · 2023-03-01T02:36:36Z

src/sagemaker/inference_recommender/inference_recommender_mixin.py

@@ -149,12 +148,26 @@ def right_size(

        self._init_sagemaker_session_if_does_not_exist()

+        model_name = None
+        if isinstance(self, sagemaker.model.FrameworkModel):


what if it's an instance of model? what happens? We need to support that case too right?

gwang111 · 2023-03-01T02:40:07Z

src/sagemaker/inference_recommender/inference_recommender_mixin.py

+        if isinstance(self, sagemaker.model.FrameworkModel):
+
+            unique_tail = uuid.uuid4()
+            model_name = "SMPYTHONSDK-" + str(unique_tail)


can we make this another name? lets distinguish between the model name and also the default IR job name

nit: we might call it temp_model_name to distinguish from actual model_name

gwang111 · 2023-03-01T02:41:26Z

src/sagemaker/inference_recommender/inference_recommender_mixin.py

+            unique_tail = uuid.uuid4()
+            model_name = "SMPYTHONSDK-" + str(unique_tail)
+
+            self.sagemaker_session.create_model(


should we log the creation of model to the customer?

do we know if we are missing vpcConfig, networkIsolation and other options?

the deploy() method seems to use this which is a private member of Model. Could we use that?

sagemaker-python-sdk/src/sagemaker/model.py

Line 638 in 5ca7f28

def _create_sagemaker_model(

leverage this on deletion as well: https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/model.py#L1320

lets also be very clear on what the side effects (if any) are of calling _create_sagemaker_model and in turn setting base_name and name for a model that will be deleted

the def _create_sagemaker_model( ) will create a model with the self.name variable, but I think we should allocate and release our own resources to be used for IR job and don't use customer's resources under the hood. If customer already created a model then we don't want to delete it after they call right_size()
We can still do:

create_model_args = dict( name=temp_model_name, role=self.role, container_defs=None, primary_container=self.prepare_container_def(), vpc_config=self.vpc_config, enable_network_isolation=self.enable_network_isolation() ) self.sagemaker_session.create_model(**create_model_args)

Similar to:

sagemaker-python-sdk/src/sagemaker/model.py

Line 688 in 5ca7f28

self.sagemaker_session.create_model(**create_model_args)

Same applies for delete model, sicne we are using our temp_model_name we can't call the delete method in
https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/model.py#L1320 because it deletes the model with self.name
which is customer allocated/managed resource that we should avoid modifying

gwang111 · 2023-03-01T02:42:02Z

src/sagemaker/inference_recommender/inference_recommender_mixin.py

@@ -175,6 +188,8 @@ def right_size(
            "InferenceRecommendations"
        )

+        if model_name is not None:
+            self.sagemaker_session.delete_model(model_name)


should we log that the temporary model is being deleted to the customer?

heads up that we can not delete the model until the jobs are IN_PROGRESS state, since we have mechanism to ensure that model wasn't mutated between PENDING and IN_PROGRESS stages.
just double confirm I think here the jobs should already complete, right? @gwang111

yep, sync call until job finishes

gwang111 · 2023-03-01T03:31:40Z

src/sagemaker/inference_recommender/inference_recommender_mixin.py

@@ -149,12 +148,26 @@ def right_size(

        self._init_sagemaker_session_if_does_not_exist()

+        model_name = None


what if the customer has called model.create() or

sagemaker-python-sdk/src/sagemaker/model.py

Line 1607 in 2e44a2c

def _create_sagemaker_model(self, *args, **kwargs): # pylint: disable=unused-argument

has been called somewhere? model_name should already be set. We should consider that case too right?

also if the model has not been created, is there a way where we can just set the model name and reuse it in deploy()? Maybe then we don't need to delete the model at the end?

got it makes sense

sry mistakenly delete below comment:

I think here the model is created temporarily for invoking IR job, so it should be a temp model created by IR which is independent from customer’s created resources. Also it should be properly cleaned up after IR job done.
Customer experience on model/endpoint creation after right_size should be performed by model.deploy() which is decided by them, so might not to create model(reuse this temp model) for customer in right_size but just making recommendations

Agreed, we should allocate and release our own resources to be used for IR job and don't use customer's resources under the hood.

jinpengqi · 2023-03-01T03:06:37Z

src/sagemaker/inference_recommender/inference_recommender_mixin.py

@@ -149,12 +148,26 @@ def right_size(

        self._init_sagemaker_session_if_does_not_exist()

+        model_name = None
+        if isinstance(self, sagemaker.model.FrameworkModel):


+1
should validate the general Model instance

Sounds good, so

if isinstance(self, sagemaker.model.Model) and not isinstance(self, sagemaker.model.ModelPackage):

jinpengqi · 2023-03-01T04:49:20Z

src/sagemaker/inference_recommender/inference_recommender_mixin.py

+            model_name=model_name,
+            model_package_version_arn=getattr(self, "model_package_arn", None),


Is there a case model_name and model_package_version_arn both exists when we call model.right_size() (Maybe model is registered before and that field is set)? then based on code logic here, model_package_version_arn will take precedence which is not expected.
We should make sure only one of them will be set, not both. If both exists then should throw exception.

So whether self.name exists or not doesn't matter, because as long as it is a Model object and not a ModelPackage object, we will create our own temp_model_name and use that to create the resources and run IR job.
If customer calls model_package = model.register() and the model_package.right_size() then only the model_package_arn will be set, while self.name is None. In this case the temp_model_name is also None because the object is sagemaker.model.ModelPackage

There is one case where self.model_package_arn and self.name both exist: if customer explicitly do:

model_package.name = "some name"

We can add validation to prevent it too, thanks!

jinpengqi · 2023-03-01T04:51:58Z

src/sagemaker/inference_recommender/inference_recommender_mixin.py

+        if isinstance(self, sagemaker.model.FrameworkModel):
+
+            unique_tail = uuid.uuid4()
+            model_name = "SMPYTHONSDK-" + str(unique_tail)


nit: we might call it temp_model_name to distinguish from actual model_name

jinpengqi · 2023-03-01T04:58:41Z

src/sagemaker/inference_recommender/inference_recommender_mixin.py

@@ -175,6 +188,8 @@ def right_size(
            "InferenceRecommendations"
        )

+        if model_name is not None:
+            self.sagemaker_session.delete_model(model_name)


heads up that we can not delete the model until the jobs are IN_PROGRESS state, since we have mechanism to ensure that model wasn't mutated between PENDING and IN_PROGRESS stages.
just double confirm I think here the jobs should already complete, right? @gwang111

jinpengqi · 2023-03-01T05:03:19Z

tests/unit/sagemaker/inference_recommender/test_inference_recommender_mixin.py

@@ -175,6 +175,134 @@ def default_right_sized_model(model_package):
    )


+def test_right_size_default_with_model_name_successful(sagemaker_session, model):


I think we already performed e2e test locally, please add corresponding integ tests for model name: https://github.com/aws/sagemaker-python-sdk/blob/master/tests/integ/test_inference_recommender.py

Yeah will also add integ tests

gwang111 · 2023-03-01T05:24:46Z

General Comment: https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/inference_recommender/inference_recommender_mixin.py#L80 we may consider elaborating on the model/model_package_arn use cases here and provide a short example on the outcomes.

fyi when making any doc string changes to public facing methods, lets confirm the change via Sphinx

jbarz1 · 2023-03-01T15:22:25Z

src/sagemaker/session.py

-                "ModelPackageVersionArn": model_package_version_arn,
-            },
-        }
+        if model_package_version_arn:


why duplicate the whole block?

request = { ... } request.update( { "ModelPackageVersionArn": model_package_version_arn } if model_package_version_arn else { "ModelName": model_name } )

jbarz1 · 2023-03-01T15:32:10Z

src/sagemaker/inference_recommender/inference_recommender_mixin.py

+            unique_tail = uuid.uuid4()
+            model_name = "SMPYTHONSDK-" + str(unique_tail)
+
+            self.sagemaker_session.create_model(


do we know if we are missing vpcConfig, networkIsolation and other options?

the deploy() method seems to use this which is a private member of Model. Could we use that?

sagemaker-python-sdk/src/sagemaker/model.py

Line 638 in 5ca7f28

def _create_sagemaker_model(

SSRraymond · 2023-03-01T19:57:11Z

General Comment: https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/inference_recommender/inference_recommender_mixin.py#L80 we may consider elaborating on the model/model_package_arn use cases here and provide a short example on the outcomes.

fyi when making any doc string changes to public facing methods, lets confirm the change via Sphinx

Can you give an example? Customer doesn't have to know about the model name / model pkg thing because it all happens under the hood right?

trajanikant

/bot run all

sagemaker-bot · 2023-03-02T00:28:46Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-python-sdk-unit-tests
Commit ID: 25c678d
Result: FAILED
Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

trajanikant

/bot run all

sagemaker-bot · 2023-03-02T21:52:50Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-python-sdk-slow-tests
Commit ID: 3ed3d7a
Result: FAILED
Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

sagemaker-bot · 2023-03-02T22:25:55Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-python-sdk-pr
Commit ID: 3ed3d7a
Result: SUCCEEDED
Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

sagemaker-bot · 2023-03-02T22:35:03Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-python-sdk-unit-tests
Commit ID: 3ed3d7a
Result: SUCCEEDED
Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

sagemaker-bot · 2023-03-02T22:41:05Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-python-sdk-local-mode-tests
Commit ID: 3ed3d7a
Result: SUCCEEDED
Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

SSRraymond · 2023-03-02T22:55:02Z

Failure in slow tests is:

==================================== ERRORS ====================================
_____________ ERROR collecting tests/integ/test_airflow_config.py ______________
tests/integ/test_airflow_config.py:50: in <module>
    for _ in retries(
.tox/py310/lib/python3.10/site-packages/sagemaker/utils.py:600: in retries
    raise Exception(
E   Exception: 'airflow import ' has reached the maximum retry count of 10
------------------------------- Captured stdout --------------------------------
Received: Unable to configure formatter 'airflow'
Received: Unable to configure formatter 'airflow'
Received: Unable to configure formatter 'airflow'
Received: Unable to configure formatter 'airflow'
Received: Unable to configure formatter 'airflow'
Received: Unable to configure formatter 'airflow'
Received: Unable to configure formatter 'airflow'
Received: Unable to configure formatter 'airflow'
Received: Unable to configure formatter 'airflow'
_____________________________ ERROR collecting gw5 _____________________________
Different tests were collected between gw0 and gw5. The difference is:
--- gw0

+++ gw5

@@ -1,10 +1,10 @@

+tests/integ/test_auto_ml.py::test_auto_ml_fit
+tests/integ/test_clarify_model_monitor.py::test_run_bias_monitor
+tests/integ/test_clarify_model_monitor.py::test_run_explainability_monitor
+tests/integ/test_tf.py::test_mnist_async[2.11.0-2.11.0]
 tests/integ/test_mxnet.py::test_deploy_model_and_update_endpoint[1.9.0-py38-1.9.0-py38]
 tests/integ/test_mxnet.py::test_deploy_estimator_with_different_instance_types[1.9.0-py38]
 tests/integ/test_tuner.py::test_tuning_mxnet[1.9.0-py38]
-tests/integ/test_tf.py::test_mnist_async[2.11.0-2.11.0]
-tests/integ/test_auto_ml.py::test_auto_ml_fit
-tests/integ/test_clarify_model_monitor.py::test_run_bias_monitor
-tests/integ/test_clarify_model_monitor.py::test_run_explainability_monitor
 tests/integ/test_inference_pipeline.py::test_inference_pipeline_model_deploy_and_update_endpoint
 tests/integ/test_inference_recommender.py::test_default_right_size_and_deploy_registered_model_sklearn
 tests/integ/test_inference_recommender.py::test_default_right_size_and_deploy_unregistered_model_sklearn

tests/integ/test_airflow_config.py is failing, seems like not related to our change.

sagemaker-bot · 2023-03-02T22:55:52Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-python-sdk-notebook-tests
Commit ID: 3ed3d7a
Result: SUCCEEDED
Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

jbarz1

LGTM other than 2 comments

src/sagemaker/inference_recommender/inference_recommender_mixin.py

src/sagemaker/session.py

jbarz1

LGTM

trajanikant

/bot run all

sagemaker-bot · 2023-03-03T00:11:18Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-python-sdk-slow-tests
Commit ID: 160d5c8
Result: FAILED
Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

sagemaker-bot · 2023-03-03T00:41:28Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-python-sdk-pr
Commit ID: 160d5c8
Result: SUCCEEDED
Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

sagemaker-bot · 2023-03-03T00:52:18Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-python-sdk-unit-tests
Commit ID: 160d5c8
Result: SUCCEEDED
Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

codecov-commenter · 2023-03-03T00:52:47Z

Codecov Report

Merging #3688 (160d5c8) into master (e913520) will decrease coverage by 0.79%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #3688      +/-   ##
==========================================
- Coverage   89.74%   88.96%   -0.79%     
==========================================
  Files         968      227     -741     
  Lines       91124    22548   -68576     
==========================================
- Hits        81780    20060   -61720     
+ Misses       9344     2488    -6856

Impacted Files	Coverage Δ
...ference_recommender/inference_recommender_mixin.py	`92.40% <100.00%> (ø)`
src/sagemaker/session.py	`75.47% <100.00%> (ø)`
...-packages/sagemaker/feature_store/feature_group.py
...hon3.8/site-packages/sagemaker/pytorch/defaults.py
...ckages/sagemaker/model_monitor/monitoring_files.py
...hon3.8/site-packages/sagemaker/xgboost/defaults.py
.../sagemaker/cli/compatibility/v2/modifiers/serde.py
...python3.7/site-packages/sagemaker/lineage/query.py
...n3.7/site-packages/sagemaker/lineage/_api_types.py
.../site-packages/sagemaker/model_monitor/__init__.py
... and 1187 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

sagemaker-bot · 2023-03-03T00:55:47Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-python-sdk-local-mode-tests
Commit ID: 160d5c8
Result: SUCCEEDED
Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

sagemaker-bot · 2023-03-03T01:01:49Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-python-sdk-notebook-tests
Commit ID: 160d5c8
Result: SUCCEEDED
Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

Co-authored-by: Raymond Liu <[email protected]>

…)" This reverts commit 1ad9d9c.

Co-authored-by: Raymond Liu <[email protected]>

SSRraymond requested a review from a team as a code owner March 1, 2023 00:40

SSRraymond requested review from claytonparnell and removed request for a team March 1, 2023 00:40

gwang111 reviewed Mar 1, 2023

View reviewed changes

jinpengqi reviewed Mar 1, 2023

View reviewed changes

jbarz1 reviewed Mar 1, 2023

View reviewed changes

SSRraymond requested review from gwang111, jbarz1 and jinpengqi and removed request for claytonparnell, gwang111, jbarz1 and jinpengqi March 1, 2023 20:01

SSRraymond force-pushed the model branch from fc0d369 to 25c678d Compare March 1, 2023 22:10

SSRraymond requested review from jbarz1 and removed request for jinpengqi March 1, 2023 22:21

trajanikant reviewed Mar 2, 2023

View reviewed changes

SSRraymond mentioned this pull request Mar 2, 2023

update notebook examples with pythin sdk inference recommender decoup… aws/amazon-sagemaker-examples#3817

Merged

4 tasks

Raymond Liu added 3 commits March 2, 2023 18:46

apply formatting

f2ed8f4

more formatting

0f35930

address comments

3ed3d7a

SSRraymond force-pushed the model branch from 9e5739a to 3ed3d7a Compare March 2, 2023 18:46

SSRraymond requested a review from jbarz1 March 2, 2023 21:06

trajanikant reviewed Mar 2, 2023

View reviewed changes

jbarz1 reviewed Mar 2, 2023

View reviewed changes

src/sagemaker/inference_recommender/inference_recommender_mixin.py Outdated Show resolved Hide resolved

src/sagemaker/session.py Outdated Show resolved Hide resolved

update naming for model to follow deploy()

160d5c8

jbarz1 approved these changes Mar 2, 2023

View reviewed changes

trajanikant reviewed Mar 3, 2023

View reviewed changes

trajanikant approved these changes Mar 3, 2023

View reviewed changes

trajanikant changed the title ~~feature: decouple model.right_size() from model registry~~ feature: Decouple model.right_size() from model registry Mar 3, 2023

trajanikant merged commit 1ad9d9c into aws:master Mar 3, 2023

JoseJuan98 pushed a commit to JoseJuan98/sagemaker-python-sdk that referenced this pull request Mar 4, 2023

feature: Decouple model.right_size() from model registry (aws#3688)

a275880

Co-authored-by: Raymond Liu <[email protected]>

SSRraymond deleted the model branch March 6, 2023 16:18

claytonparnell added a commit that referenced this pull request Mar 7, 2023

Revert "feature: Decouple model.right_size() from model registry (#3688…

427f60c

…)" This reverts commit 1ad9d9c.

claytonparnell mentioned this pull request Mar 7, 2023

Revert "feature: Decouple model.right_size() from model registry" #3700

Closed

nmadan pushed a commit to nmadan/sagemaker-python-sdk that referenced this pull request Apr 18, 2023

feature: Decouple model.right_size() from model registry (aws#3688)

9abefbe

Co-authored-by: Raymond Liu <[email protected]>

		@@ -149,12 +148,26 @@ def right_size(

		self._init_sagemaker_session_if_does_not_exist()

		model_name = None

		model_name=model_name,
		model_package_version_arn=getattr(self, "model_package_arn", None),

		@@ -175,6 +175,134 @@ def default_right_sized_model(model_package):
		)


		def test_right_size_default_with_model_name_successful(sagemaker_session, model):

feature: Decouple model.right_size() from model registry #3688

feature: Decouple model.right_size() from model registry #3688

Conversation

SSRraymond commented Mar 1, 2023 • edited Loading

Merge Checklist

General

Tests

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

SSRraymond Mar 1, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

gwang111 Mar 1, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

gwang111 commented Mar 1, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

SSRraymond commented Mar 1, 2023

trajanikant left a comment

Choose a reason for hiding this comment

sagemaker-bot commented Mar 2, 2023

AWS CodeBuild CI Report

trajanikant left a comment

Choose a reason for hiding this comment

sagemaker-bot commented Mar 2, 2023

AWS CodeBuild CI Report

sagemaker-bot commented Mar 2, 2023

AWS CodeBuild CI Report

sagemaker-bot commented Mar 2, 2023

AWS CodeBuild CI Report

sagemaker-bot commented Mar 2, 2023

AWS CodeBuild CI Report

SSRraymond commented Mar 2, 2023

sagemaker-bot commented Mar 2, 2023

AWS CodeBuild CI Report

jbarz1 left a comment

Choose a reason for hiding this comment

jbarz1 left a comment

Choose a reason for hiding this comment

trajanikant left a comment

Choose a reason for hiding this comment

sagemaker-bot commented Mar 3, 2023

AWS CodeBuild CI Report

sagemaker-bot commented Mar 3, 2023

AWS CodeBuild CI Report

sagemaker-bot commented Mar 3, 2023

AWS CodeBuild CI Report

codecov-commenter commented Mar 3, 2023 • edited Loading

Codecov Report

sagemaker-bot commented Mar 3, 2023

AWS CodeBuild CI Report

sagemaker-bot commented Mar 3, 2023

AWS CodeBuild CI Report

SSRraymond commented Mar 1, 2023 •

edited

Loading

SSRraymond Mar 1, 2023 •

edited

Loading

gwang111 Mar 1, 2023 •

edited

Loading

codecov-commenter commented Mar 3, 2023 •

edited

Loading