Description
Hello everyone, I'm very new to SageMaker and I'm facing a strange issue that I can't solve.
My goal: I have created a CNN that I would like to train, build and deploy in an MLOps pipeline with SageMaker.
First of all, I created a notebook instance in SageMaker, in which I created a wasteClassification.ipynb notebook and a train.py file.
The train.py file contains my neural network definition, some functions to train and save it, and several overridden functions: model_fn, predict_fn and input_fn. In wasteClassification.ipynb I was able to create a PyTorch estimator, train the model, deploy the endpoint and make predictions with the invoke_endpoint function without any issues.
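To make the layout concrete, here is a minimal sketch of the shape of those three hooks. It is not my actual code: the model is a plain-Python stand-in (a dict of weights read from a hypothetical model.json) so the snippet runs without PyTorch, and the labels and the simulate_request helper are made up for illustration. A real PyTorch script would rebuild the network and load its saved weights in model_fn instead.

```python
import json
import os


def model_fn(model_dir):
    """Load the model from the directory where the model artifact is unpacked."""
    # Stand-in (assumption): a real script would rebuild the CNN and load
    # its saved state here instead of reading a JSON file.
    with open(os.path.join(model_dir, "model.json")) as f:
        return json.load(f)


def input_fn(request_body, request_content_type):
    """Deserialize the request payload into model input."""
    if request_content_type != "application/json":
        raise ValueError(f"Unsupported content type: {request_content_type}")
    return json.loads(request_body)["features"]


def predict_fn(input_data, model):
    """Run the placeholder model on the deserialized input."""
    score = sum(w * x for w, x in zip(model["weights"], input_data)) + model["bias"]
    # Hypothetical labels, for illustration only.
    return {"label": "recyclable" if score > 0 else "other", "score": score}


def simulate_request(model_dir, body, content_type="application/json"):
    """Hypothetical local helper: chain the hooks in load, parse, predict order."""
    model = model_fn(model_dir)  # a real server loads the model once, not per request
    data = input_fn(body, content_type)
    return predict_fn(data, model)
```

Calling simulate_request locally against a directory containing the model file is a quick way to check that the hooks can be imported and chained before anything is deployed.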
After that, I decided to create a pipeline to automate training, building and deployment using the new SageMaker tooling for that.
I created a SageMaker Studio project based on the template "MLOps template for model building, training, and deployment". This template provides two CodeCommit repos: modelbuild and modeldeploy. I only modified the modelbuild repo: I put my train.py script in the folder "/pipelines/abalone/" and modified the file "pipelines/abalone/pipeline.py", in which I created a PyTorch estimator linked to my train.py script.
When the pipeline is launched, I can see in the training job logs that my model trains without any issue, and the final endpoint is created. But when I try to invoke the endpoint (invoke_endpoint), I get an error: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (500) from model with message "
Please provide a model_fn implementation."
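For reference, the failing call has roughly the shape sketched below. This is an illustration under assumptions, not my exact code: the invoke helper, the endpoint name, the payload and the JSON content type are all placeholders. Only the invoke_endpoint call on the boto3 "sagemaker-runtime" client and its EndpointName, ContentType and Body parameters are the real API.

```python
import json


def invoke(runtime_client, endpoint_name, payload, content_type="application/json"):
    """Send one request to the endpoint and return the decoded JSON response."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType=content_type,
        Body=json.dumps(payload),
    )
    # The response body is a stream; read it fully before decoding.
    return json.loads(response["Body"].read())


# Usage (needs AWS credentials; the endpoint name and payload are placeholders):
# import boto3
# runtime = boto3.client("sagemaker-runtime")
# invoke(runtime, "my-endpoint", {"features": [2.0, 1.0]})
```

The client is passed in as an argument so the helper can be exercised with a stub object, without touching a live endpoint.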
This is strange because I did provide a model_fn implementation in my train.py file...
Do you have any idea how to solve this issue?