
Model persists when endpoint deploy fails at 'createEndpoint' #2194


Closed
evakravi opened this issue Mar 9, 2021 · 4 comments

Comments

@evakravi
Member

evakravi commented Mar 9, 2021

I'm a software engineer at AWS who is experiencing this bug in SageMaker Studio (JumpStart). My alias is @evakravi.

Describe the bug
If you attempt to deploy a model and succeed in (a) uploading the script/dependencies to S3, (b) creating a model, and (c) creating an endpoint config, but (d) fail to create the endpoint (which can happen if an account has no GPU instances allocated), then re-deploying the same model with parameters that would allow the endpoint creation to succeed still fails.

To reproduce
Deploy a model on a disallowed instance type, then deploy the identical model to a permitted instance type.
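A minimal reproduction sketch, assuming a generic sagemaker.model.Model rather than the JumpStart flow; the image URI, model data URI, role ARN, and resource names are hypothetical placeholders:

```python
import sagemaker
from sagemaker.model import Model

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/SageMakerRole"  # hypothetical role ARN

model = Model(
    image_uri="<inference-image-uri>",        # hypothetical placeholder
    model_data="s3://<bucket>/model.tar.gz",  # hypothetical placeholder
    role=role,
    name="repro-model",
    sagemaker_session=session,
)

# First attempt: instance type the account has no quota for.
# CreateEndpoint fails, but the model and endpoint config created
# earlier in the deploy flow are left behind.
try:
    model.deploy(
        initial_instance_count=1,
        instance_type="ml.g4dn.xlarge",
        endpoint_name="repro-endpoint",
    )
except Exception as e:
    print("first deploy failed:", e)

# Second attempt: permitted instance type, same names.
# Expected to succeed, but it also fails (the reported bug).
model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
    endpoint_name="repro-endpoint",
)
```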

Expected behavior
The deployment to an allowed instance type should be successful, but it ends up failing.

System information
A description of your system. Please provide:

  • SageMaker Python SDK version: 2.21.0
  • Python version: Python 3.7.10
  • CPU or GPU: SageMaker Studio Jupyter Server
  • Custom Docker image (Y/N): SageMaker Studio Jupyter Server

Additional context
While this issue is related to SageMaker JumpStart ModelHub, the underlying SageMaker API issue can be reproduced outside of SageMaker Studio.
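For reference, a hedged sketch of the same sequence at the boto3 level (all names, ARNs, and URIs below are hypothetical), showing which resources persist when CreateEndpoint fails:

```python
import boto3

sm = boto3.client("sagemaker")

# These two calls succeed and their resources persist even if the
# endpoint creation below fails.
sm.create_model(
    ModelName="repro-model",
    ExecutionRoleArn="arn:aws:iam::123456789012:role/SageMakerRole",  # hypothetical
    PrimaryContainer={
        "Image": "<inference-image-uri>",          # hypothetical placeholder
        "ModelDataUrl": "s3://<bucket>/model.tar.gz",
    },
)
sm.create_endpoint_config(
    EndpointConfigName="repro-endpoint-config",
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "ModelName": "repro-model",
        "InstanceType": "ml.g4dn.xlarge",  # instance type with zero quota
        "InitialInstanceCount": 1,
    }],
)

# Fails with ResourceLimitExceeded; the model and endpoint config
# created above remain in the account.
sm.create_endpoint(
    EndpointName="repro-endpoint",
    EndpointConfigName="repro-endpoint-config",
)
```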

@ChoiByungWook
Contributor

What is the exact error/stack trace you are seeing when you deploy for the second time?

From my understanding, it sounds like you're running into a similar issue to the one shown in #1470?

@ChoiByungWook
Contributor

Something went wrong
We encountered an error while preparing to deploy your endpoint. You can get more details below.
operation deploy failed: An error occurred (ResourceLimitExceeded) when calling the CreateEndpoint operation: The account-level service limit 'ml.g4dn.xlarge for endpoint usage' is 0 Instances, with current utilization of 0 Instances and a request delta of 1 Instances. Please contact AWS support to request an increase for this limit.

@ChoiByungWook
Contributor

It looks like we check to see if there is an existing endpoint_config:

if not _deployment_entity_exists(

Are you using the same name in your code for your deployments?
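To illustrate the concern, a rough sketch of how a create-if-not-exists check like this can interact with a failed first deploy. This is a simplification for discussion, not the SDK's actual code; _deployment_entity_exists below is a stand-in for the SDK helper:

```python
import botocore.exceptions


def _deployment_entity_exists(describe_fn):
    """Stand-in for the SDK helper: True if the describe call succeeds."""
    try:
        describe_fn()
        return True
    except botocore.exceptions.ClientError as e:
        if "Could not find" in e.response["Error"]["Message"]:
            return False
        raise


def deploy(name, instance_type, sagemaker_client):
    """Simplified deploy flow using one shared name for all resources."""
    if not _deployment_entity_exists(
        lambda: sagemaker_client.describe_endpoint_config(EndpointConfigName=name)
    ):
        # Only created when no config with this name exists. After a failed
        # first deploy, the leftover config (pointing at the disallowed
        # instance type) would be reused instead of this new one.
        sagemaker_client.create_endpoint_config(
            EndpointConfigName=name,
            ProductionVariants=[{
                "VariantName": "AllTraffic",
                "ModelName": name,
                "InstanceType": instance_type,
                "InitialInstanceCount": 1,
            }],
        )
    sagemaker_client.create_endpoint(EndpointName=name, EndpointConfigName=name)
```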

@marckarp
Contributor

There is not enough information provided to reproduce/classify this as a bug. Do you have a code snippet/notebook to reproduce the issue?
