Will tensorflow return the best model as a result of training #48


Closed

slevental opened this issue Jan 17, 2018 · 7 comments


@slevental

I'm trying to understand the strategy of model evaluation implemented in TF within a container; my goal is to save the most accurate model, not the most recent one. Is it possible via this API somehow?

@owen-t
Contributor

owen-t commented Jan 18, 2018

The TensorFlow container saves the most recent exported model. We also provide TensorFlow checkpoints, which contain the older models produced during training.

@slevental
Author

@owen-t so, right now I would have to manually evaluate all checkpoints, find the best one, and create a SageMaker model + endpoint based on that. That seems like something that would make sense to do in the SageMaker library; what do you think?
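
Until then, that manual pass is doable with the TF 1.x Estimator API alone. A minimal sketch, assuming an existing estimator, eval_input_fn, and serving_input_fn (illustrative names), and that your eval metrics include 'loss':

import tensorflow as tf

# Evaluate every checkpoint in the model directory and remember the best one.
# Note: only checkpoints still on disk are listed; raise keep_checkpoint_max
# in RunConfig if you need to retain more than the default 5.
best_loss, best_ckpt = float('inf'), None
state = tf.train.get_checkpoint_state(estimator.model_dir)
for ckpt in state.all_model_checkpoint_paths:
    metrics = estimator.evaluate(input_fn=eval_input_fn, checkpoint_path=ckpt)
    if metrics['loss'] < best_loss:
        best_loss, best_ckpt = metrics['loss'], ckpt

# Export a SavedModel from the winning checkpoint for serving.
estimator.export_savedmodel('export/best', serving_input_fn,
                            checkpoint_path=best_ckpt)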

@owen-t
Contributor

owen-t commented Jan 18, 2018

Agreed - allowing better control over model export in TensorFlow would be useful.

laurenyu added a commit to laurenyu/sagemaker-python-sdk that referenced this issue May 31, 2018
@chinazm

chinazm commented Jul 16, 2018

Using training hooks and evaluation hooks, you can do both best-model saving and early stopping. For example, have your model_fn return an EstimatorSpec with the hooks attached:

# Inside model_fn: wrap the EstimatorSpec produced by the underlying model
# ("spec") and attach your own hooks.
return tf.estimator.EstimatorSpec(
    mode=spec.mode,
    predictions=spec.predictions,
    loss=spec.loss,
    train_op=spec.train_op,
    eval_metric_ops=spec.eval_metric_ops,
    export_outputs=spec.export_outputs,
    training_chief_hooks=spec.training_chief_hooks,
    training_hooks=training_hooks,
    scaffold=spec.scaffold,
    evaluation_hooks=evaluation_hooks)
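
As a point of reference, TF 1.x (1.10+) also ships ready-made pieces for both halves of this. A minimal sketch, assuming an existing estimator, train_input_fn, eval_input_fn, and serving_input_fn (all illustrative names):

import tensorflow as tf

# Early stopping: stop training once eval loss stops decreasing.
early_stop = tf.contrib.estimator.stop_if_no_decrease_hook(
    estimator, metric_name='loss', max_steps_without_decrease=1000)

# Best-model saving: export a SavedModel only when eval loss improves.
best_exporter = tf.estimator.BestExporter(
    serving_input_receiver_fn=serving_input_fn, exports_to_keep=1)

train_spec = tf.estimator.TrainSpec(input_fn=train_input_fn, hooks=[early_stop])
eval_spec = tf.estimator.EvalSpec(input_fn=eval_input_fn, exporters=best_exporter)
tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)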

@khu834

khu834 commented Sep 12, 2018

> Using training hooks and evaluation hooks, you can do both best-model saving and early stopping.

Do you have more specific examples or links that better demonstrate how to utilize training and evaluation hooks? E.g., how would one construct a tf.train.SessionRunHook that can read the evaluation metrics from inside model_fn over many iterations and compare the loss?

I assume the EstimatorSpec is returned by model_fn?
Where is the 'spec' object here coming from?
Thank you.
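
For what it's worth, a minimal sketch of such a hook, assuming TF 1.x and that your graph exposes the loss under the (illustrative) tensor name 'loss:0':

import tensorflow as tf

class BestLossHook(tf.train.SessionRunHook):
    """Fetches the loss on every run and tracks the best value seen."""

    def __init__(self, loss_name='loss:0'):
        self._loss_name = loss_name  # depends on your graph's tensor names
        self.best_loss = float('inf')

    def begin(self):
        # Called before the graph is finalized; look up the loss tensor.
        self._loss = tf.get_default_graph().get_tensor_by_name(self._loss_name)

    def before_run(self, run_context):
        # Ask the session to fetch the loss alongside the regular run.
        return tf.train.SessionRunArgs(self._loss)

    def after_run(self, run_context, run_values):
        loss = run_values.results
        if loss < self.best_loss:
            self.best_loss = loss
            # React to an improvement here, e.g. save via a Saver built in begin().
        # For early stopping, count steps since the last improvement and call
        # run_context.request_stop() once patience runs out.

An instance of this would be passed via the training_hooks or evaluation_hooks arguments shown above.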

@chinazm

chinazm commented Sep 18, 2018

I would refer to this feature request:
aws/sagemaker-tensorflow-training-toolkit#75

apacker pushed a commit to apacker/sagemaker-python-sdk that referenced this issue Nov 15, 2018
@laurenyu
Contributor

Closing due to inactivity and the introduction of script mode with our TensorFlow containers. Script mode allows for greater flexibility in writing the TF training script, which should allow for using the hooks described above. For more information about script mode, see our TF README.
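
In script mode, the simplest route to best-model saving is a Keras callback. A minimal sketch with illustrative toy data; /opt/ml/model is the directory SageMaker packages into model.tar.gz after training:

import numpy as np
import tensorflow as tf

# Toy data and model; a real entry point would read the SageMaker data channels.
x_train, y_train = np.random.rand(100, 10), np.random.rand(100, 1)
x_val, y_val = np.random.rand(20, 10), np.random.rand(20, 1)
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(10,))])
model.compile(optimizer='adam', loss='mse')

# Keep only the weights with the best validation loss, and stop early
# once validation loss plateaus.
callbacks = [
    tf.keras.callbacks.ModelCheckpoint('/opt/ml/model/best.h5',
                                       monitor='val_loss', save_best_only=True),
    tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=5),
]
model.fit(x_train, y_train, validation_data=(x_val, y_val),
          epochs=100, callbacks=callbacks)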
