Unable to load model parameters trained on a server to sagemaker while creating training job #293

Closed
Harathi123 opened this issue Jul 15, 2018 · 2 comments
@Harathi123

System Information
MXNet Gluon
Python 3.6
GPU
Using custom model for training and inference

Question:
Can we load model parameters that were trained somewhere else (on our own server) and use them when creating a training job?
I have trained my model on our server, and now I want to create a training job for that model. I have adapted the model for SageMaker and am trying to create a training job that loads the previously saved parameters. However, I get the following error while loading them:

‘AssertionError: Parameter embedding0_weight is missing in file /opt/ml/input/data/training/encoder_.params’

Any suggestions will be helpful!

Thanks,
Harathi

@mvsusp mvsusp self-assigned this Jul 17, 2018
@mvsusp
Contributor

mvsusp commented Jul 17, 2018

Hey @Harathi123,

Yes, you can load the model parameters from your server without problems. You can pass in these model parameters as a channel (using File mode), as a file included with your source code, or as data that is downloaded during training.
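
As a rough illustration of the channel option: in File mode, each channel's contents are copied into a local directory inside the container before training starts (the error above shows the `training` channel at `/opt/ml/input/data/training`). The sketch below checks that the expected parameter file actually arrived there and, if not, reports which files are present. The helper name `find_params_file` is hypothetical and not part of any SDK.

```python
import os


def find_params_file(channel_dir, filename):
    """Return the full path to `filename` inside `channel_dir`.

    Raises FileNotFoundError listing what the directory actually contains,
    which makes a misnamed or missing upload easy to spot.
    """
    path = os.path.join(channel_dir, filename)
    if os.path.isfile(path):
        return path
    # List the directory contents (if it exists) to help with debugging.
    available = sorted(os.listdir(channel_dir)) if os.path.isdir(channel_dir) else []
    raise FileNotFoundError(
        "%s not found in %s; files present: %s" % (filename, channel_dir, available)
    )
```

Inside the training script this could be called as `find_params_file('/opt/ml/input/data/training', 'encoder.params')`, assuming the parameter file was uploaded under that name to the `training` channel.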

I would need to understand more about how you are loading this data. My rough interpretation of the error above is that the weights are not being saved or loaded properly in the training job.
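
One way to test that interpretation is to compare the parameter names the network expects with the names stored in the file. The sketch below is only a comparison helper (`diff_param_names` is a hypothetical name). The comments show how the two name lists might be obtained, assuming the file can be read with `mx.nd.load` and the Gluon block is called `encoder`.

```python
def diff_param_names(expected, stored):
    """Compare parameter names a network expects with names found in a file.

    Returns (missing, extra): names expected but absent from the file,
    and names present in the file that the network does not expect.
    """
    expected, stored = set(expected), set(stored)
    return sorted(expected - stored), sorted(stored - expected)


# Assumed way to collect the inputs with MXNet (not executed here):
#   import mxnet as mx
#   stored = mx.nd.load(params_path).keys()
#   expected = encoder.collect_params().keys()
# Names from collect_params() may carry a block prefix that the saved
# file does not, so the prefix may need stripping before comparing.
```

If `missing` contains a name such as `embedding0_weight` while `extra` contains a similar name with a different prefix or counter, the file was probably saved from a network whose parameter names differ from the one being loaded.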

I will close this ticket, since this does not appear to be a Python SDK issue.

Feel free to open an additional issue for other questions.

Best,

Márcio

@mvsusp mvsusp closed this as completed Jul 17, 2018
@Harathi123
Author

Hi @mvsusp,
Thanks for getting back to me. I am uploading those files to S3 along with the data, so they end up in data_dir, and I am trying to load them from there as follows:

encoder.load_params('%s/encoder.params' % data_dir, ctx=ctx)

Thanks,
Harathi

knakad added a commit to knakad/sagemaker-python-sdk that referenced this issue Dec 4, 2019
* change: remove unnecessary env variable for baselining jobs
* fix: stop asserting output_path env variable.
* fix: correct bug if network_config_dict is empty.
knakad added a commit that referenced this issue Dec 4, 2019
* change: remove unnecessary env variable for baselining jobs
* fix: stop asserting output_path env variable.
* fix: correct bug if network_config_dict is empty.