-
Notifications
You must be signed in to change notification settings - Fork 1.2k
feature: Decouple model.right_size() from model registry #3688
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -149,12 +148,26 @@ def right_size( | |||
|
|||
self._init_sagemaker_session_if_does_not_exist() | |||
|
|||
model_name = None | |||
if isinstance(self, sagemaker.model.FrameworkModel): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if it's an instance of model? what happens? We need to support that case too right?
if isinstance(self, sagemaker.model.FrameworkModel): | ||
|
||
unique_tail = uuid.uuid4() | ||
model_name = "SMPYTHONSDK-" + str(unique_tail) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we make this another name? lets distinguish between the model name and also the default IR job name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: we might call it temp_model_name
to distinguish from actual model_name
unique_tail = uuid.uuid4() | ||
model_name = "SMPYTHONSDK-" + str(unique_tail) | ||
|
||
self.sagemaker_session.create_model( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we log the creation of model to the customer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we know if we are missing vpcConfig, networkIsolation and other options?
the deploy()
method seems to use this which is a private member of Model. Could we use that?
sagemaker-python-sdk/src/sagemaker/model.py
Line 638 in 5ca7f28
def _create_sagemaker_model( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
leverage this on deletion as well: https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/model.py#L1320
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lets also be very clear on what the side effects (if any) are of calling _create_sagemaker_model
and in turn setting base_name
and name
for a model that will be deleted
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the def _create_sagemaker_model( )
will create a model with the self.name
variable, but I think we should allocate and release our own resources to be used for IR job and don't use customer's resources under the hood. If customer already created a model then we don't want to delete it after they call right_size()
We can still do:
create_model_args = dict(
name=temp_model_name,
role=self.role,
container_defs=None,
primary_container=self.prepare_container_def(),
vpc_config=self.vpc_config,
enable_network_isolation=self.enable_network_isolation()
)
self.sagemaker_session.create_model(**create_model_args)
Similar to:
sagemaker-python-sdk/src/sagemaker/model.py
Line 688 in 5ca7f28
self.sagemaker_session.create_model(**create_model_args) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same applies for delete model, sicne we are using our temp_model_name
we can't call the delete method in
https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/model.py#L1320 because it deletes the model with self.name
which is customer allocated/managed resource that we should avoid modifying
@@ -175,6 +188,8 @@ def right_size( | |||
"InferenceRecommendations" | |||
) | |||
|
|||
if model_name is not None: | |||
self.sagemaker_session.delete_model(model_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we log that the temporary model is being deleted to the customer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
heads up that we can not delete the model until the jobs are IN_PROGRESS
state, since we have mechanism to ensure that model wasn't mutated between PENDING and IN_PROGRESS stages.
just double confirm I think here the jobs should already complete, right? @gwang111
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, sync call until job finishes
@@ -149,12 +148,26 @@ def right_size( | |||
|
|||
self._init_sagemaker_session_if_does_not_exist() | |||
|
|||
model_name = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if the customer has called model.create()
or
sagemaker-python-sdk/src/sagemaker/model.py
Line 1607 in 2e44a2c
def _create_sagemaker_model(self, *args, **kwargs): # pylint: disable=unused-argument |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also if the model has not been created, is there a way where we can just set the model name and reuse it in deploy()
? Maybe then we don't need to delete the model at the end?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got it makes sense
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sry mistakenly delete below comment:
I think here the model is created temporarily for invoking IR job, so it should be a temp model created by IR which is independent from customer’s created resources. Also it should be properly cleaned up after IR job done.
Customer experience on model/endpoint creation after right_size should be performed by model.deploy() which is decided by them, so might not to create model(reuse this temp model) for customer in right_size but just making recommendations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, we should allocate and release our own resources to be used for IR job and don't use customer's resources under the hood.
@@ -149,12 +148,26 @@ def right_size( | |||
|
|||
self._init_sagemaker_session_if_does_not_exist() | |||
|
|||
model_name = None | |||
if isinstance(self, sagemaker.model.FrameworkModel): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
should validate the general Model instance
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, so
if isinstance(self, sagemaker.model.Model) and not isinstance(self, sagemaker.model.ModelPackage):
model_name=model_name, | ||
model_package_version_arn=getattr(self, "model_package_arn", None), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a case model_name
and model_package_version_arn
both exists when we call model.right_size()
(Maybe model is registered before and that field is set)? then based on code logic here, model_package_version_arn
will take precedence which is not expected.
We should make sure only one of them will be set, not both. If both exists then should throw exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So whether self.name
exists or not doesn't matter, because as long as it is a Model object and not a ModelPackage object, we will create our own temp_model_name
and use that to create the resources and run IR job.
If customer calls model_package = model.register()
and the model_package.right_size()
then only the model_package_arn
will be set, while self.name
is None. In this case the temp_model_name
is also None because the object is sagemaker.model.ModelPackage
There is one case where self.model_package_arn
and self.name
both exist: if customer explicitly do:
model_package.name = "some name"
We can add validation to prevent it too, thanks!
if isinstance(self, sagemaker.model.FrameworkModel): | ||
|
||
unique_tail = uuid.uuid4() | ||
model_name = "SMPYTHONSDK-" + str(unique_tail) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: we might call it temp_model_name
to distinguish from actual model_name
@@ -175,6 +188,8 @@ def right_size( | |||
"InferenceRecommendations" | |||
) | |||
|
|||
if model_name is not None: | |||
self.sagemaker_session.delete_model(model_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
heads up that we can not delete the model until the jobs are IN_PROGRESS
state, since we have mechanism to ensure that model wasn't mutated between PENDING and IN_PROGRESS stages.
just double confirm I think here the jobs should already complete, right? @gwang111
@@ -175,6 +175,134 @@ def default_right_sized_model(model_package): | |||
) | |||
|
|||
|
|||
def test_right_size_default_with_model_name_successful(sagemaker_session, model): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we already performed e2e test locally, please add corresponding integ tests for model name: https://github.com/aws/sagemaker-python-sdk/blob/master/tests/integ/test_inference_recommender.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah will also add integ tests
General Comment: https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/inference_recommender/inference_recommender_mixin.py#L80 we may consider elaborating on the model/model_package_arn use cases here and provide a short example on the outcomes. fyi when making any doc string changes to public facing methods, lets confirm the change via Sphinx |
src/sagemaker/session.py
Outdated
"ModelPackageVersionArn": model_package_version_arn, | ||
}, | ||
} | ||
if model_package_version_arn: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why duplicate the whole block?
request = { ... }
request.update(
{ "ModelPackageVersionArn": model_package_version_arn }
if model_package_version_arn
else { "ModelName": model_name }
)
unique_tail = uuid.uuid4() | ||
model_name = "SMPYTHONSDK-" + str(unique_tail) | ||
|
||
self.sagemaker_session.create_model( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we know if we are missing vpcConfig, networkIsolation and other options?
the deploy()
method seems to use this which is a private member of Model. Could we use that?
sagemaker-python-sdk/src/sagemaker/model.py
Line 638 in 5ca7f28
def _create_sagemaker_model( |
Can you give an example? Customer doesn't have to know about the model name / model pkg thing because it all happens under the hood right? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/bot run all
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/bot run all
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
Failure in slow tests is:
|
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM other than 2 comments
src/sagemaker/inference_recommender/inference_recommender_mixin.py
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/bot run all
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
Co-authored-by: Raymond Liu <[email protected]>
…)" This reverts commit 1ad9d9c.
Co-authored-by: Raymond Liu <[email protected]>
Issue #, if available:
Description of changes:
Decouple moel.right_size() from model registry
Testing done:
Add unit test.
pip install . and run end 2 end tests with model not registerd.
Merge Checklist
Put an
x
in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.General
Tests
unique_name_from_base
to create resource names in integ tests (if appropriate)By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.