Unable to load model parameters trained on a server to sagemaker while creating training job #293

Harathi123 · 2018-07-15T05:49:03Z

System Information
MXNet Gluon
Python 3.6
GPU
Using custom model for training and inference

Question:
Can we load model parameters that were trained elsewhere (on our own server) and then create a training job?
I have trained my model on our server, and now I want to create a training job for that model. I have adapted the model for SageMaker and am trying to create a training job that loads the previously saved parameters, but I get the following error while loading them:

‘AssertionError: Parameter embedding0_weight is missing in file /opt/ml/input/data/training/encoder_.params’

Any suggestions will be helpful!

Thanks,
Harathi

mvsusp · 2018-07-17T04:47:06Z

Hey @Harathi123,

Yes, you can load model parameters from your server without problems. You can pass them in as a channel (using File mode), as a file included with your source code, or as data that you download during training.
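For the channel option, here is a minimal sketch of locating a parameter file inside the training container. It assumes the File-mode layout that the path in the error message follows (/opt/ml/input/data/<channel name>). The helper names `channel_dir` and `find_params_file` are hypothetical and are not part of the SageMaker SDK.

```python
import os

# Root directory where File-mode channel data is placed inside the training
# container. The path in the error message follows this layout.
DEFAULT_INPUT_ROOT = "/opt/ml/input/data"


def channel_dir(channel_name, input_root=DEFAULT_INPUT_ROOT):
    """Return the local directory for a File-mode input channel."""
    return os.path.join(input_root, channel_name)


def find_params_file(directory, filename):
    """Return the full path to `filename` inside `directory`.

    Raises FileNotFoundError that lists the .params files actually present,
    which makes a file-name mismatch easy to spot.
    """
    path = os.path.join(directory, filename)
    if os.path.isfile(path):
        return path
    available = []
    if os.path.isdir(directory):
        available = sorted(
            name for name in os.listdir(directory) if name.endswith(".params")
        )
    raise FileNotFoundError(
        "%s not found; .params files present: %s" % (path, available)
    )
```

Inside the training script, something like `find_params_file(channel_dir("training"), "encoder.params")` would return the path to hand to the load call, or fail with a list of the parameter files that were actually uploaded.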

I would need to understand more about how you are loading this data. My rough reading of the error above is that the weights are not being saved or loaded properly in the training job.
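One way to check this is to compare the parameter names the model expects with the names stored in the file. The sketch below is plain Python and only does the comparison. It assumes you collect the two name lists yourself, for example from `net.collect_params().keys()` and `mx.nd.load(path).keys()` in MXNet. `diff_param_names` is a hypothetical helper, and the names in the usage note are made up for illustration.

```python
def diff_param_names(expected_names, saved_names, strip_prefix=""):
    """Compare the names a model expects with the names found in a file.

    `strip_prefix` is removed from the start of each expected name before
    comparing. This is an assumption to cover files that were saved with the
    block's own prefix stripped; leave it empty if that does not apply.

    Returns (missing, unexpected): names the model expects but the file
    lacks, and names in the file that the model does not expect.
    """
    def strip(name):
        if strip_prefix and name.startswith(strip_prefix):
            return name[len(strip_prefix):]
        return name

    expected = {strip(name) for name in expected_names}
    saved = set(saved_names)
    return sorted(expected - saved), sorted(saved - expected)
```

For example, if the model expects `encoder0_embedding0_weight` and the file holds `embedding1_weight`, calling the helper with `strip_prefix="encoder0_"` reports `embedding0_weight` as missing and `embedding1_weight` as unexpected. A result like that would point to a naming difference between the two models rather than a problem with the upload.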

I will close this ticket, since this does not appear to be a Python SDK issue.

Feel free to open an additional issue for other questions.

Best,

Márcio

Harathi123 · 2018-07-17T15:37:03Z

Hi @mvsusp,
Thanks for getting back to me. I am uploading those files to S3 along with the data, into data_dir, and trying to load them from there as follows:

encoder.load_params('%s/encoder.params' % data_dir, ctx=ctx)

Thanks,
Harathi


mvsusp self-assigned this Jul 17, 2018

mvsusp added the type: question label Jul 17, 2018

mvsusp closed this as completed Jul 17, 2018

knakad added a commit that referenced this issue Dec 4, 2019

fix: remove unused env variable for Model Monitoring (#293)

51f1dce

* change: remove unnecessary env variable for baselining jobs
* fix: stop asserting output_path env variable.
* fix: correct bug if network_config_dict is empty.
