Model deploy instance_type modification failed #987
Comments
Hello @Wei-1, thank you for bringing this to our attention. Let me look into this and figure out why the instance type isn't being propagated correctly. Thank you for your patience.
Can you paste how you instantiate the PCA algorithm? Do you specify a name in the constructor? I suspect this line causes an existing endpoint configuration name to be reused: https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/session.py#L1306 That line is reached when you deploy your model: https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/model.py#L427
I followed the lab tutorial:
# container, role, output_location, sess and s3_train_data are defined earlier in the tutorial
pca = sagemaker.estimator.Estimator(container,
                                    role,
                                    train_instance_count=1,
                                    train_instance_type='ml.c4.xlarge',
                                    output_path=output_location,
                                    sagemaker_session=sess)
pca.set_hyperparameters(feature_dim=50000,
                        num_components=10,
                        subtract_mean=True,
                        algorithm_mode='randomized',
                        mini_batch_size=200)
pca.fit({'train': s3_train_data})
pca_predictor = pca.deploy(initial_instance_count=1,
                           instance_type='ml.m4.xlarge')
Thank you for the clarification. Could you do me a quick favor and check your endpoint configurations in the AWS console (AWS Console -> Amazon SageMaker -> Endpoint configurations)? Alternatively, you can use the CLI (https://docs.aws.amazon.com/cli/latest/reference/sagemaker/list-endpoint-configs.html). Can you tell me whether you see a corresponding endpoint configuration with the expected "ml.t2.medium"? Based on the linked code, the session object should be propagating it correctly. Thanks!
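The CLI check can also be scripted. Below is a minimal sketch using the boto3 SageMaker client (`list_endpoint_configs` and `describe_endpoint_config`); the helper name `instance_types_by_config` is hypothetical, and actually running it assumes boto3 plus AWS credentials with SageMaker read access. It maps each endpoint configuration name to the instance types of its production variants, so a stale `ml.m4.xlarge` config would show up next to the expected `ml.t2.medium` one.

```python
def instance_types_by_config(sm_client, name_contains=None):
    """Map each endpoint config name to the instance types of its production variants.

    `sm_client` is a boto3 SageMaker client (or any object with the same two methods).
    """
    kwargs = {}
    if name_contains:
        kwargs["NameContains"] = name_contains  # server-side substring filter
    result = {}
    while True:
        page = sm_client.list_endpoint_configs(**kwargs)
        for summary in page["EndpointConfigs"]:
            name = summary["EndpointConfigName"]
            desc = sm_client.describe_endpoint_config(EndpointConfigName=name)
            # Variants without an InstanceType (e.g. serverless) map to None
            result[name] = [v.get("InstanceType") for v in desc["ProductionVariants"]]
        token = page.get("NextToken")
        if not token:  # last page reached
            return result
        kwargs["NextToken"] = token


# Usage (requires boto3 and AWS credentials; "pca" is an assumed name substring):
#   import boto3
#   configs = instance_types_by_config(boto3.client("sagemaker"), name_contains="pca")
#   for name, types in configs.items():
#       print(name, types)
```

Passing the client in, rather than creating it inside the helper, keeps the function easy to point at a different region or profile.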
I reran the whole estimator initialization process, so I can see both t2.medium and m4.xlarge.
While running the code the first time with
Gotcha, so from a notebook context this is still failing. Hmmm... I wonder if something is going on with the Python cache in the notebook environment. Thank you so much for all of this information. I think this will require some dedicated investigation. I'll bring this up with the team. Thanks!
Hi @Wei-1, can you try adding reference: https://sagemaker.readthedocs.io/en/stable/pca.html?highlight=update_endpoint#sagemaker.PCA.deploy
@laurenyu, now it is even more interesting. With
pca_predictor = pca.deploy(initial_instance_count=1,
                           instance_type='ml.t2.medium',
                           update_endpoint=True)
I get this error message:
It shows that the endpointConfig already exists. It does NOT create a new endpointConfig with
@Wei-1 hmm, can you try specifying a new endpoint name? |
pca_predictor = pca.deploy(initial_instance_count=1,
                           instance_type='ml.t2.medium',
                           endpoint_name='NewEndpointName')
@laurenyu, with a new endpoint name, the endpointConfig with
Hi @Wei-1, thanks for reporting this bug. I am adding to our roadmap changes in the behaviour of
Thanks for using SageMaker!
Cool @mvsusp! |
You are right. I believe that the issue is that
One possible solution is to always generate models with a new name, perhaps in the format name + '_' + timestamp. Any contribution will be highly appreciated. Thanks, @Wei-1!
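A minimal sketch of the suggested name-plus-timestamp scheme follows. One caveat: SageMaker model and endpoint configuration names may only contain alphanumerics and hyphens (at most 63 characters), so the sketch joins with '-' rather than '_'. The helper `unique_name` is hypothetical; the SDK's own `sagemaker.utils.name_from_base` follows a similar base-plus-timestamp pattern.

```python
import re
from datetime import datetime, timezone

# SageMaker resource names: alphanumeric start, hyphen-separated, max 63 chars
_NAME_RE = re.compile(r"^[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}$")
_MAX_LEN = 63


def unique_name(base, now=None):
    """Return base + '-' + a millisecond UTC timestamp, trimmed to fit 63 chars."""
    if now is None:
        now = datetime.now(timezone.utc)
    stamp = now.strftime("%Y-%m-%d-%H-%M-%S-") + f"{now.microsecond // 1000:03d}"
    # Trim the base (and any trailing hyphens) so the full name stays within the limit
    base = base[: _MAX_LEN - len(stamp) - 1].rstrip("-")
    name = f"{base}-{stamp}"
    if not _NAME_RE.match(name):
        raise ValueError(f"invalid SageMaker name: {name!r}")
    return name
```

Trimming the base rather than the timestamp keeps the part that makes each name unique; two deploys in the same millisecond would still collide, which is an accepted limitation of this sketch.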
I will check if I am able to solve the issue this weekend. |
@Wei-1 the update_endpoint arg is designed to handle this case. Are you still seeing an error or unexpected behavior when update_endpoint is set to True and a new endpoint name is given in the second call?
@icywang86rui, if we use a new endpoint name, the module will behave as designed. |
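A minimal sketch of that workaround as a helper: retry the deployment with the next instance type in a list whenever SageMaker reports `ResourceLimitExceeded`, using a new endpoint name on every attempt. `deploy_with_fallback` and its `deploy` callable are hypothetical, and the sketch assumes the limit error surfaces as a botocore `ClientError`, whose `response["Error"]["Code"]` holds the SageMaker error code.

```python
def deploy_with_fallback(deploy, instance_types, base_name):
    """Deploy with the first instance type that fits within the account's limits.

    `deploy(instance_type, endpoint_name)` is any callable that performs one
    deployment attempt and returns its result (e.g. a predictor).
    """
    last_error = None
    for attempt, instance_type in enumerate(instance_types):
        # A fresh endpoint name per attempt, so no configuration left by a
        # failed attempt is picked up again under the same name
        endpoint_name = f"{base_name}-{attempt}"
        try:
            return deploy(instance_type, endpoint_name)
        except Exception as err:  # botocore's ClientError exposes .response
            code = getattr(err, "response", {}).get("Error", {}).get("Code")
            if code != "ResourceLimitExceeded":
                raise  # unrelated failure: surface it unchanged
            last_error = err
    raise RuntimeError("no listed instance type had capacity") from last_error
```

With the tutorial's estimator, `deploy` could be `lambda t, n: pca.deploy(initial_instance_count=1, instance_type=t, endpoint_name=n)`. `base_name` should itself be unique per run (for example, timestamped) so the generated names don't collide with earlier endpoints.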
Thanks for starting work on this PR, @Wei-1. Let us know if you have any questions.
I changed some naming of the |
I will answer your observations in the PR. Thanks. |
Hello @ChoiByungWook, any follow up on this? |
I'm getting an error while deploying the trained model. It just fails partway through.
I've merged changes that were released as part of v2.0.0.rc1 to address this issue. |
Nice nice, should I close this issue? |
Yeah, we can close this issue. Feel free to reopen/create a new issue if further issues arise :) |
Reference:
System Information
Describe the problem
Running SageMaker example: PCA for MNIST
If a user tries to deploy a model with an instance_type that is not available,
the user can't simply change the instance_type and deploy again.
Minimal repro / logs
While running PCA for MNIST from the example project and executing the following script:
When a user doesn't have enough resources to launch a new instance of the assigned instance_type, the user gets a
ResourceLimitExceeded
error. If the user then decides to change the instance_type and execute the script again,
the user gets this error:
From this error message,
we can see that although we are trying to set a new instance_type,
the instance_type is not actually updated.
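Assuming, as the session.py line linked in the thread suggests, that the SDK skips creating an endpoint configuration whose name already exists, one workaround on affected versions is to delete the stale configuration before redeploying. A minimal sketch with the boto3 SageMaker client (`describe_endpoint_config`, `delete_endpoint_config`); the helper name is hypothetical, and `config_name` must be the name the SDK actually reuses (found, for example, in the console or with the CLI). Deleting a configuration does not delete any endpoint that already uses it.

```python
def drop_stale_endpoint_config(sm_client, config_name, wanted_instance_type):
    """Delete endpoint config `config_name` if its variants use another instance type.

    Returns True if the config was deleted, False if it was missing or already matched.
    """
    try:
        desc = sm_client.describe_endpoint_config(EndpointConfigName=config_name)
    except Exception as err:  # botocore raises ClientError for API errors
        error = getattr(err, "response", {}).get("Error", {})
        # A missing config is reported as a ValidationException "Could not find ..."
        if error.get("Code") == "ValidationException" and "Could not find" in error.get("Message", ""):
            return False
        raise
    types = {v.get("InstanceType") for v in desc["ProductionVariants"]}
    if types == {wanted_instance_type}:
        return False  # config already requests the wanted type
    sm_client.delete_endpoint_config(EndpointConfigName=config_name)
    return True


# Usage (requires boto3 and AWS credentials; the config name is a placeholder):
#   import boto3
#   drop_stale_endpoint_config(boto3.client("sagemaker"), "<reused-config-name>", "ml.t2.medium")
#   # ...then run pca.deploy(...) again with the new instance_type
```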