estimator.deploy() always uses the same directory to load models #402
Comments
@Elpiro: Did you find a solution? I am facing the same issue.
@prudhvigurram I had to use the low-level API with boto3. You can follow the documentation on using TensorFlow with SageMaker here: https://docs.aws.amazon.com/sagemaker/latest/dg/tf-example1-train.html. You just need to change the training image link from TensorFlow to PyTorch; the rest of the configuration is the same. Also, I used the low-level API from Lambda functions rather than a SageMaker notebook, so if you want to do it in a notebook I don't know how it will turn out. You don't have to worry about execution times, since the Lambda only starts the training job/endpoint on SageMaker. To know when training is done, you can set an S3 trigger on the bucket that receives the output of the training job, with another Lambda which will start the endpoint.
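For reference, a rough sketch of that workaround from a Lambda handler; the job name, training image URI, role ARN, and bucket paths below are all hypothetical placeholders:

```python
import boto3

def handler(event, context):
    sm = boto3.client('sagemaker')
    # All names, ARNs, and URIs below are placeholders -- substitute your own.
    sm.create_training_job(
        TrainingJobName='pytorch-training-2018-09-25',
        AlgorithmSpecification={
            # PyTorch training image for your region, in place of the
            # TensorFlow image used in the linked docs.
            'TrainingImage': '<account>.dkr.ecr.eu-west-1.amazonaws.com/sagemaker-pytorch:0.4.0-gpu-py3',
            'TrainingInputMode': 'File',
        },
        RoleArn='arn:aws:iam::accid:role/sagemaker-role',
        InputDataConfig=[{
            'ChannelName': 'training',
            'DataSource': {'S3DataSource': {
                'S3DataType': 'S3Prefix',
                'S3Uri': 's3://my-bucket/training-data',
                'S3DataDistributionType': 'FullyReplicated',
            }},
        }],
        OutputDataConfig={'S3OutputPath': 's3://my-bucket/output'},
        ResourceConfig={'InstanceType': 'ml.p2.xlarge',
                        'InstanceCount': 1,
                        'VolumeSizeInGB': 30},
        StoppingCondition={'MaxRuntimeInSeconds': 86400},
    )
    # A second Lambda, triggered by model.tar.gz landing in the output
    # bucket, can then call create_model / create_endpoint_config /
    # create_endpoint to stand up the endpoint.
```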
Hello @Elpiro, I apologize for the late response. For clarification purposes, my understanding is that you provided a custom S3 URI for the 'output_path' parameter, but SageMaker isn't saving the output into that S3 URI, and is instead using the folder it first created? Does that sound correct? I believe it isn't possible to customize the S3 URI of the model you are trying to deploy through the 'deploy' function. However, it should be possible to spin up a TensorFlowModel object with a custom S3 URI pointing to your specified model, and then use 'deploy'.
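A minimal sketch of that approach, assuming a hypothetical model artifact path, role ARN, and entry point:

```python
from sagemaker.tensorflow.model import TensorFlowModel

# Point the model object at the exact artifact you want served.
model = TensorFlowModel(model_data='s3://my-bucket/path/to/model.tar.gz',
                        role='arn:aws:iam::accid:role/sagemaker-role',
                        entry_point='entry_point.py')
predictor = model.deploy(initial_instance_count=1,
                         instance_type='ml.m4.xlarge')
```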
I ran into what I think is this same issue. Perhaps there is a reason things work this way, but as far as I can tell it is a bug. If there is a reason things are being done this way, I would suggest somehow alerting the user to what is happening.

The issue: if you deploy a model to a named endpoint, delete that endpoint, then train a new model and deploy it under the same endpoint name, SageMaker will not attempt to deploy your newly trained model; the endpoint serves the first model again.

Thoughts: the implication of this is that there does not appear to be a supported way of deploying a model to an endpoint name that you have used before and then deleted (update_endpoint will not work if the endpoint has been deleted, and deploy will always look in the first location). This can lead to a lot of confusion where you deploy a model with changes and then get the same old results back.

Code Example: see the sketch below.
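The original code sample did not survive the page capture; what follows is a hedged reconstruction of the scenario described above, with hypothetical entry point, role ARN, and bucket names:

```python
from sagemaker.pytorch import PyTorch

estimator = PyTorch(entry_point='train.py',   # hypothetical
                    role='arn:aws:iam::accid:role/sagemaker-role',
                    train_instance_count=1,
                    train_instance_type='ml.p2.xlarge')
estimator.fit('s3://my-bucket/training-data')

# The first deployment creates a model, an endpoint config, and an endpoint.
predictor = estimator.deploy(initial_instance_count=1,
                             instance_type='ml.m4.xlarge',
                             endpoint_name='my-endpoint')
predictor.delete_endpoint()   # deletes ONLY the endpoint

# Retrain and redeploy under the same endpoint name...
estimator.fit('s3://my-bucket/new-training-data')
estimator.deploy(initial_instance_count=1,
                 instance_type='ml.m4.xlarge',
                 endpoint_name='my-endpoint')
# ...and the endpoint serves the OLD model: the original model and
# endpoint config were never deleted and are silently reused.
```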
Hi @andrewcking, I believe there are two different use cases. Let me explain both:

Trying to deploy a model from an existing object: @Elpiro's question was how he can deploy an already-created model using the Python SDK. I believe he needs an instance of the Model class, for example:

```python
from sagemaker.mxnet.model import MXNetModel

sagemaker_model = MXNetModel(model_data='s3://path/to/model.tar.gz',
                             role='arn:aws:iam::accid:sagemaker-role',
                             entry_point='entry_point.py')
predictor = sagemaker_model.deploy(initial_instance_count=1,
                                   instance_type='ml.m4.xlarge')
```

For more information about the Model class, please see: https://github.com/aws/sagemaker-python-sdk#byo-model and https://sagemaker.readthedocs.io/en/stable/sagemaker.tensorflow.html#tensorflow-model

Trying to create an endpoint with the same name as a previous one: in SageMaker Hosting, the process of creating an endpoint requires creating a model, an endpoint configuration, and an endpoint. The SageMaker Python SDK abstracts these 3 layers for you. These are the lines where this process happens:

```python
container_def = self.prepare_container_def(instance_type)
self.name = self.name or name_from_image(container_def['Image'])
self.sagemaker_session.create_model(self.name, self.role, container_def, vpc_config=self.vpc_config)
production_variant = sagemaker.production_variant(self.name, instance_type, initial_instance_count)
self.endpoint_name = endpoint_name or self.name
self.sagemaker_session.endpoint_from_production_variants(self.endpoint_name, [production_variant], tags)
```

To be able to use the Python SDK to create an endpoint with the same name, you will have to delete the model, the endpoint config, and the endpoint. The SageMaker Python SDK's delete_endpoint() only deletes the endpoint, which is why the issue is happening. I will close this issue because it is unrelated, and will open a new issue for the bug that you found. Thanks!
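As an addendum: a full cleanup of all three resources, so the name can be reused, might look like the following boto3 sketch (the endpoint name is hypothetical, and this assumes a single production variant):

```python
import boto3

sm = boto3.client('sagemaker')
endpoint_name = 'my-endpoint'  # hypothetical

# Walk from the endpoint to its config and model, then delete all three.
config_name = sm.describe_endpoint(
    EndpointName=endpoint_name)['EndpointConfigName']
model_name = sm.describe_endpoint_config(
    EndpointConfigName=config_name)['ProductionVariants'][0]['ModelName']

sm.delete_endpoint(EndpointName=endpoint_name)
sm.delete_endpoint_config(EndpointConfigName=config_name)
sm.delete_model(ModelName=model_name)
```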
I am using PyTorch with the conda-pytorch36 environment provided in AWS, on an ml.p2.xlarge instance.
Describe the problem
SageMaker is looking for the model in the directory that was created the first time I ran the notebook. I have specified output_path in the PyTorch estimator, but when I call deploy() on it, it always looks for the model in the folder it created on the first run.
In the error log below, the URL should be something like "s3://sagemaker-eu-west-1-[my-user-id]/sagemaker-pytorch-2018-09-25-HH-MM-SS-mS/output/model.tar.gz" instead of "s3://sagemaker-eu-west-1-[my-user-id]/sagemaker-pytorch-2018-09-17-14-13-04-805/output/model.tar.gz"
Is there a way to explicitly tell the deploy() function where to look for the newly created model?
Minimal repro / logs
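The original logs were not captured here. For context, a minimal sketch of the setup described above, with hypothetical bucket, entry point, and role ARN:

```python
from sagemaker.pytorch import PyTorch

# Estimator with a custom output_path, as described above.
estimator = PyTorch(entry_point='train.py',   # hypothetical
                    role='arn:aws:iam::accid:role/sagemaker-role',
                    train_instance_count=1,
                    train_instance_type='ml.p2.xlarge',
                    output_path='s3://sagemaker-eu-west-1-my-account/custom-output')
estimator.fit('s3://my-bucket/training-data')

# Expected: the endpoint serves the model.tar.gz from this run's output.
# Observed: deploy() fetches the model from the folder created on the
# notebook's first run instead.
predictor = estimator.deploy(initial_instance_count=1,
                             instance_type='ml.m4.xlarge')
```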