Describe the bug
When deploying a packaged PyTorch model using the `PyTorchModel` class, I can successfully deploy and call the predict function. But as soon as I pass the same model to a `MultiDataModel` class, the deployment goes through, yet when I call `predictor.predict(data=data, target_model='model.tar.gz')` I get the following error:
An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (0) from model with message "Your invocation timed out while waiting for a response from container model. Review the latency metrics for each container in Amazon CloudWatch, resolve the issue, and try again.".
I'm not sure whether this is related to the 'Please provide a model_fn implementation.' error I see in CloudWatch, but the `model_fn` function is actually implemented; `MultiDataModel` somehow doesn't load it.
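For reference, `model_fn` follows the standard hook that the SageMaker PyTorch serving container looks for. A minimal sketch of such a script (the actual inference code isn't shown in this report, and the weights filename is hypothetical):

```python
# inference.py (sketch) -- packaged under code/ inside model.tar.gz.
import os

import torch


def model_fn(model_dir):
    # Load the packaged model weights from model_dir; "model.pth" is a
    # hypothetical filename for this example.
    model = torch.jit.load(os.path.join(model_dir, "model.pth"))
    model.eval()
    return model
```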
To reproduce
1. Create a sample PyTorch model, train it, and package it.
2. Deploy the model using `PyTorchModel`. (This successfully deploys the model, and calling `predictor.predict()` returns the inference results, as shown in the sketch below.)
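A minimal sketch of the two deployment paths, assuming hypothetical S3 locations, names, and instance types (`role` and `data` come from the surrounding setup; the `MultiDataModel` reuses the `PyTorchModel` for its container and inference configuration):

```python
from sagemaker.pytorch import PyTorchModel
from sagemaker.multidatamodel import MultiDataModel

model = PyTorchModel(
    model_data="s3://my-bucket/models/model.tar.gz",  # hypothetical path
    role=role,
    entry_point="inference.py",  # implements model_fn
    framework_version="1.6.0",
    py_version="py3",
)

# Path 1: single-model endpoint -- deploys and predicts fine.
predictor = model.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")
predictor.predict(data)

# Path 2: multi-model endpoint -- deploys, but predict() times out.
mme = MultiDataModel(
    name="pytorch-multi-model",                  # hypothetical name
    model_data_prefix="s3://my-bucket/models/",  # prefix containing model.tar.gz
    model=model,
)
mme_predictor = mme.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")
mme_predictor.predict(data=data, target_model="model.tar.gz")  # -> ModelError / timeout
```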
**Note:** To directly use training job `model.tar.gz` outputs as we do here, you'll need to make sure your training job produces results that:
- already include any required inference code in a `code/` subfolder, and
- (if you're using SageMaker PyTorch containers v1.6+) have been packaged to be compatible with TorchServe.

See the `enable_sm_oneclick_deploy()` and `enable_torchserve_multi_model()` functions in [src/train.py](src/train.py) for notes on this. Alternatively, you can perform the same steps after the fact to produce a new, serving-ready `model.tar.gz` from your raw training job result.
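For illustration, a rough after-the-fact repack along the lines the note describes (paths and filenames are hypothetical; the repo's `enable_torchserve_multi_model()` helper is the authoritative version):

```python
import os
import shutil
import tarfile

# Unpack the raw training output, drop the inference script into code/,
# and re-tar it in the layout the PyTorch >=1.6 (TorchServe) containers expect:
#   model.tar.gz
#   |-- model.pth
#   `-- code/
#       `-- inference.py   (defines model_fn etc.)
os.makedirs("repack/code", exist_ok=True)
with tarfile.open("model.tar.gz") as tar:
    tar.extractall("repack")
shutil.copy("src/inference.py", "repack/code/inference.py")  # hypothetical source path
with tarfile.open("model-serving.tar.gz", "w:gz") as tar:
    tar.add("repack", arcname=".")
```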
```python
from sagemaker.sklearn.estimator import SKLearn

# Pay attention to the code_location argument!
estimator = SKLearn(
    entry_point=TRAINING_FILE,          # script to use for the training job
    role=role,
    source_dir=SOURCE_DIR,              # location of the training scripts
    instance_count=1,
    instance_type=TRAIN_INSTANCE_TYPE,
    framework_version="1.2-1",          # 1.2-1 is the latest version
    output_path=s3_output_path,         # where to store model artifacts
    base_job_name=_job,
    code_location=code_location,        # where the .tar.gz of source_dir will be stored
    metric_definitions=[{"Name": "median-AE", "Regex": "AE-at-50th-percentile: ([0-9.]+).*$"}],
    hyperparameters={"n-estimators": 100, "min-samples-leaf": 3, "model-name": location},
)
```
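For completeness, a hypothetical launch of this estimator (the channel name and S3 path are illustrative, not from the original report):

```python
# Kick off the training job; model artifacts land under output_path,
# and the packaged source_dir lands under code_location.
estimator.fit({"train": f"s3://{bucket}/input/train"})
```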
There are many ways to make your code accessible, and here are two of them. :) I hope this is useful.
Expected behavior
`MultiDataModel` should deploy and work without any errors.
Screenshots or logs
This is what's included in the CloudWatch logs:
System information
A description of your system. Please provide:
Additional context
Add any other context about the problem here.