Skip to content

feature: Decouple model.right_size() from model registry #3688

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Mar 3, 2023

Conversation

SSRraymond
Copy link
Contributor

@SSRraymond SSRraymond commented Mar 1, 2023

Issue #, if available:

Description of changes:
Decouple moel.right_size() from model registry

Testing done:
Add unit test.
pip install . and run end 2 end tests with model not registerd.

Merge Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.

General

  • I have read the CONTRIBUTING doc
  • I certify that the changes I am introducing will be backward compatible, and I have discussed concerns about this, if any, with the Python SDK team
  • I used the commit message format described in CONTRIBUTING
  • I have passed the region in to all S3 and STS clients that I've initialized as part of this change.
  • I have updated any necessary documentation, including READMEs and API docs (if appropriate)

Tests

  • I have added tests that prove my fix is effective or that my feature works (if appropriate)
  • I have added unit and/or integration tests as appropriate to ensure backward compatibility of the changes
  • I have checked that my tests are not configured for a specific region or account (if appropriate)
  • I have used unique_name_from_base to create resource names in integ tests (if appropriate)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@SSRraymond SSRraymond requested a review from a team as a code owner March 1, 2023 00:40
@SSRraymond SSRraymond requested review from claytonparnell and removed request for a team March 1, 2023 00:40
@@ -149,12 +148,26 @@ def right_size(

self._init_sagemaker_session_if_does_not_exist()

model_name = None
if isinstance(self, sagemaker.model.FrameworkModel):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if it's an instance of model? what happens? We need to support that case too right?

if isinstance(self, sagemaker.model.FrameworkModel):

unique_tail = uuid.uuid4()
model_name = "SMPYTHONSDK-" + str(unique_tail)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we make this another name? lets distinguish between the model name and also the default IR job name

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we might call it temp_model_name to distinguish from actual model_name

unique_tail = uuid.uuid4()
model_name = "SMPYTHONSDK-" + str(unique_tail)

self.sagemaker_session.create_model(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we log the creation of model to the customer?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we know if we are missing vpcConfig, networkIsolation and other options?

the deploy() method seems to use this which is a private member of Model. Could we use that?

def _create_sagemaker_model(

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets also be very clear on what the side effects (if any) are of calling _create_sagemaker_model and in turn setting base_name and name for a model that will be deleted

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the def _create_sagemaker_model( ) will create a model with the self.name variable, but I think we should allocate and release our own resources to be used for IR job and don't use customer's resources under the hood. If customer already created a model then we don't want to delete it after they call right_size()
We can still do:

create_model_args = dict(
                name=temp_model_name,
                role=self.role,
                container_defs=None,
                primary_container=self.prepare_container_def(),
                vpc_config=self.vpc_config,
                enable_network_isolation=self.enable_network_isolation()
            )

            self.sagemaker_session.create_model(**create_model_args)

Similar to:

self.sagemaker_session.create_model(**create_model_args)

Copy link
Contributor Author

@SSRraymond SSRraymond Mar 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same applies for delete model, sicne we are using our temp_model_name we can't call the delete method in
https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/model.py#L1320 because it deletes the model with self.name
which is customer allocated/managed resource that we should avoid modifying

@@ -175,6 +188,8 @@ def right_size(
"InferenceRecommendations"
)

if model_name is not None:
self.sagemaker_session.delete_model(model_name)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we log that the temporary model is being deleted to the customer?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

heads up that we can not delete the model until the jobs are IN_PROGRESS state, since we have mechanism to ensure that model wasn't mutated between PENDING and IN_PROGRESS stages.
just double confirm I think here the jobs should already complete, right? @gwang111

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, sync call until job finishes

@@ -149,12 +148,26 @@ def right_size(

self._init_sagemaker_session_if_does_not_exist()

model_name = None
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if the customer has called model.create() or

def _create_sagemaker_model(self, *args, **kwargs): # pylint: disable=unused-argument
has been called somewhere? model_name should already be set. We should consider that case too right?

Copy link
Collaborator

@gwang111 gwang111 Mar 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also if the model has not been created, is there a way where we can just set the model name and reuse it in deploy()? Maybe then we don't need to delete the model at the end?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it makes sense

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sry mistakenly delete below comment:

I think here the model is created temporarily for invoking IR job, so it should be a temp model created by IR which is independent from customer’s created resources. Also it should be properly cleaned up after IR job done.
Customer experience on model/endpoint creation after right_size should be performed by model.deploy() which is decided by them, so might not to create model(reuse this temp model) for customer in right_size but just making recommendations

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, we should allocate and release our own resources to be used for IR job and don't use customer's resources under the hood.

@@ -149,12 +148,26 @@ def right_size(

self._init_sagemaker_session_if_does_not_exist()

model_name = None
if isinstance(self, sagemaker.model.FrameworkModel):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1
should validate the general Model instance

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, so

if isinstance(self, sagemaker.model.Model) and not isinstance(self, sagemaker.model.ModelPackage):

Comment on lines 169 to 180
model_name=model_name,
model_package_version_arn=getattr(self, "model_package_arn", None),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a case model_name and model_package_version_arn both exists when we call model.right_size() (Maybe model is registered before and that field is set)? then based on code logic here, model_package_version_arn will take precedence which is not expected.
We should make sure only one of them will be set, not both. If both exists then should throw exception.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So whether self.name exists or not doesn't matter, because as long as it is a Model object and not a ModelPackage object, we will create our own temp_model_name and use that to create the resources and run IR job.
If customer calls model_package = model.register() and the model_package.right_size() then only the model_package_arn will be set, while self.name is None. In this case the temp_model_name is also None because the object is sagemaker.model.ModelPackage

There is one case where self.model_package_arn and self.name both exist: if customer explicitly do:

model_package.name = "some name"

We can add validation to prevent it too, thanks!

if isinstance(self, sagemaker.model.FrameworkModel):

unique_tail = uuid.uuid4()
model_name = "SMPYTHONSDK-" + str(unique_tail)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we might call it temp_model_name to distinguish from actual model_name

@@ -175,6 +188,8 @@ def right_size(
"InferenceRecommendations"
)

if model_name is not None:
self.sagemaker_session.delete_model(model_name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

heads up that we can not delete the model until the jobs are IN_PROGRESS state, since we have mechanism to ensure that model wasn't mutated between PENDING and IN_PROGRESS stages.
just double confirm I think here the jobs should already complete, right? @gwang111

@@ -175,6 +175,134 @@ def default_right_sized_model(model_package):
)


def test_right_size_default_with_model_name_successful(sagemaker_session, model):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we already performed e2e test locally, please add corresponding integ tests for model name: https://github.com/aws/sagemaker-python-sdk/blob/master/tests/integ/test_inference_recommender.py

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah will also add integ tests

@gwang111
Copy link
Collaborator

gwang111 commented Mar 1, 2023

General Comment: https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/inference_recommender/inference_recommender_mixin.py#L80 we may consider elaborating on the model/model_package_arn use cases here and provide a short example on the outcomes.

fyi when making any doc string changes to public facing methods, lets confirm the change via Sphinx

"ModelPackageVersionArn": model_package_version_arn,
},
}
if model_package_version_arn:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why duplicate the whole block?

request = { ... }
request.update(
  { "ModelPackageVersionArn": model_package_version_arn }
  if model_package_version_arn
  else { "ModelName": model_name }
)

unique_tail = uuid.uuid4()
model_name = "SMPYTHONSDK-" + str(unique_tail)

self.sagemaker_session.create_model(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we know if we are missing vpcConfig, networkIsolation and other options?

the deploy() method seems to use this which is a private member of Model. Could we use that?

def _create_sagemaker_model(

@SSRraymond
Copy link
Contributor Author

General Comment: https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/inference_recommender/inference_recommender_mixin.py#L80 we may consider elaborating on the model/model_package_arn use cases here and provide a short example on the outcomes.

fyi when making any doc string changes to public facing methods, lets confirm the change via Sphinx

Can you give an example? Customer doesn't have to know about the model name / model pkg thing because it all happens under the hood right?

@SSRraymond SSRraymond requested review from gwang111, jbarz1 and jinpengqi and removed request for claytonparnell, gwang111, jbarz1 and jinpengqi March 1, 2023 20:01
@SSRraymond SSRraymond requested review from jbarz1 and removed request for jinpengqi March 1, 2023 22:21
Copy link
Contributor

@trajanikant trajanikant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/bot run all

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-unit-tests
  • Commit ID: 25c678d
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

Copy link
Contributor

@trajanikant trajanikant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/bot run all

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-slow-tests
  • Commit ID: 3ed3d7a
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-pr
  • Commit ID: 3ed3d7a
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-unit-tests
  • Commit ID: 3ed3d7a
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-local-mode-tests
  • Commit ID: 3ed3d7a
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@SSRraymond
Copy link
Contributor Author

Failure in slow tests is:

==================================== ERRORS ====================================
_____________ ERROR collecting tests/integ/test_airflow_config.py ______________
tests/integ/test_airflow_config.py:50: in <module>
    for _ in retries(
.tox/py310/lib/python3.10/site-packages/sagemaker/utils.py:600: in retries
    raise Exception(
E   Exception: 'airflow import ' has reached the maximum retry count of 10
------------------------------- Captured stdout --------------------------------
Received: Unable to configure formatter 'airflow'
Received: Unable to configure formatter 'airflow'
Received: Unable to configure formatter 'airflow'
Received: Unable to configure formatter 'airflow'
Received: Unable to configure formatter 'airflow'
Received: Unable to configure formatter 'airflow'
Received: Unable to configure formatter 'airflow'
Received: Unable to configure formatter 'airflow'
Received: Unable to configure formatter 'airflow'
_____________________________ ERROR collecting gw5 _____________________________
Different tests were collected between gw0 and gw5. The difference is:
--- gw0

+++ gw5

@@ -1,10 +1,10 @@

+tests/integ/test_auto_ml.py::test_auto_ml_fit
+tests/integ/test_clarify_model_monitor.py::test_run_bias_monitor
+tests/integ/test_clarify_model_monitor.py::test_run_explainability_monitor
+tests/integ/test_tf.py::test_mnist_async[2.11.0-2.11.0]
 tests/integ/test_mxnet.py::test_deploy_model_and_update_endpoint[1.9.0-py38-1.9.0-py38]
 tests/integ/test_mxnet.py::test_deploy_estimator_with_different_instance_types[1.9.0-py38]
 tests/integ/test_tuner.py::test_tuning_mxnet[1.9.0-py38]
-tests/integ/test_tf.py::test_mnist_async[2.11.0-2.11.0]
-tests/integ/test_auto_ml.py::test_auto_ml_fit
-tests/integ/test_clarify_model_monitor.py::test_run_bias_monitor
-tests/integ/test_clarify_model_monitor.py::test_run_explainability_monitor
 tests/integ/test_inference_pipeline.py::test_inference_pipeline_model_deploy_and_update_endpoint
 tests/integ/test_inference_recommender.py::test_default_right_size_and_deploy_registered_model_sklearn
 tests/integ/test_inference_recommender.py::test_default_right_size_and_deploy_unregistered_model_sklearn

tests/integ/test_airflow_config.py is failing, seems like not related to our change.

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-notebook-tests
  • Commit ID: 3ed3d7a
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

Copy link
Contributor

@jbarz1 jbarz1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM other than 2 comments

Copy link
Contributor

@jbarz1 jbarz1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@trajanikant trajanikant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/bot run all

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-slow-tests
  • Commit ID: 160d5c8
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-pr
  • Commit ID: 160d5c8
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-unit-tests
  • Commit ID: 160d5c8
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@codecov-commenter
Copy link

codecov-commenter commented Mar 3, 2023

Codecov Report

Merging #3688 (160d5c8) into master (e913520) will decrease coverage by 0.79%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #3688      +/-   ##
==========================================
- Coverage   89.74%   88.96%   -0.79%     
==========================================
  Files         968      227     -741     
  Lines       91124    22548   -68576     
==========================================
- Hits        81780    20060   -61720     
+ Misses       9344     2488    -6856     
Impacted Files Coverage Δ
...ference_recommender/inference_recommender_mixin.py 92.40% <100.00%> (ø)
src/sagemaker/session.py 75.47% <100.00%> (ø)
...-packages/sagemaker/feature_store/feature_group.py
...hon3.8/site-packages/sagemaker/pytorch/defaults.py
...ckages/sagemaker/model_monitor/monitoring_files.py
...hon3.8/site-packages/sagemaker/xgboost/defaults.py
.../sagemaker/cli/compatibility/v2/modifiers/serde.py
...python3.7/site-packages/sagemaker/lineage/query.py
...n3.7/site-packages/sagemaker/lineage/_api_types.py
.../site-packages/sagemaker/model_monitor/__init__.py
... and 1187 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-local-mode-tests
  • Commit ID: 160d5c8
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-notebook-tests
  • Commit ID: 160d5c8
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@trajanikant trajanikant changed the title feature: decouple model.right_size() from model registry feature: Decouple model.right_size() from model registry Mar 3, 2023
@trajanikant trajanikant merged commit 1ad9d9c into aws:master Mar 3, 2023
JoseJuan98 pushed a commit to JoseJuan98/sagemaker-python-sdk that referenced this pull request Mar 4, 2023
@SSRraymond SSRraymond deleted the model branch March 6, 2023 16:18
claytonparnell added a commit that referenced this pull request Mar 7, 2023
nmadan pushed a commit to nmadan/sagemaker-python-sdk that referenced this pull request Apr 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants