Will tensorflow return the best model as a result of training #48


Closed

slevental opened this issue Jan 17, 2018 · 7 comments


@slevental

I'm trying to understand the strategy of model evaluation implemented in TF within a container; my goal is to save the most accurate model, not the most recent one. Is it possible via this API somehow?

@owen-t
Contributor

owen-t commented Jan 18, 2018

The TensorFlow container saves the most recent exported model. We also provide TensorFlow checkpoints, which contain the older models produced during training.

@slevental
Author

@owen-t so, right now I would have to manually evaluate all checkpoints, find the best one, and create a SageMaker model + endpoint based on that. That seems like something that would make sense to do in the SageMaker library; what do you think?
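
Until then, that manual pass is doable with the TF 1.x Estimator API alone. A minimal sketch, assuming an existing estimator, eval_input_fn, and serving_input_fn (illustrative names), and that your eval metrics include 'loss':

import tensorflow as tf

# Evaluate every checkpoint in the model directory and remember the best one.
# Note: only checkpoints still on disk are listed; raise keep_checkpoint_max
# in RunConfig if you need to retain more than the default 5.
best_loss, best_ckpt = float('inf'), None
state = tf.train.get_checkpoint_state(estimator.model_dir)
for ckpt in state.all_model_checkpoint_paths:
    metrics = estimator.evaluate(input_fn=eval_input_fn, checkpoint_path=ckpt)
    if metrics['loss'] < best_loss:
        best_loss, best_ckpt = metrics['loss'], ckpt

# Export a SavedModel from the winning checkpoint for serving.
estimator.export_savedmodel('export/best', serving_input_fn,
                            checkpoint_path=best_ckpt)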

@owen-t
Contributor

owen-t commented Jan 18, 2018

Agreed - allowing better control over model export in TensorFlow would be useful.

laurenyu added a commit to laurenyu/sagemaker-python-sdk that referenced this issue May 31, 2018
@chinazm

chinazm commented Jul 16, 2018

Using training hooks and evaluation hooks, you can do both best-model saving and early stopping. For example, have your model_fn return an EstimatorSpec with the hooks attached:

# Inside model_fn: wrap the EstimatorSpec produced by the underlying model
# ("spec") and attach your own hooks.
return tf.estimator.EstimatorSpec(
    mode=spec.mode,
    predictions=spec.predictions,
    loss=spec.loss,
    train_op=spec.train_op,
    eval_metric_ops=spec.eval_metric_ops,
    export_outputs=spec.export_outputs,
    training_chief_hooks=spec.training_chief_hooks,
    training_hooks=training_hooks,
    scaffold=spec.scaffold,
    evaluation_hooks=evaluation_hooks)
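
As a point of reference, TF 1.x (1.10+) also ships ready-made pieces for both halves of this. A minimal sketch, assuming an existing estimator, train_input_fn, eval_input_fn, and serving_input_fn (all illustrative names):

import tensorflow as tf

# Early stopping: stop training once eval loss stops decreasing.
early_stop = tf.contrib.estimator.stop_if_no_decrease_hook(
    estimator, metric_name='loss', max_steps_without_decrease=1000)

# Best-model saving: export a SavedModel only when eval loss improves.
best_exporter = tf.estimator.BestExporter(
    serving_input_receiver_fn=serving_input_fn, exports_to_keep=1)

train_spec = tf.estimator.TrainSpec(input_fn=train_input_fn, hooks=[early_stop])
eval_spec = tf.estimator.EvalSpec(input_fn=eval_input_fn, exporters=best_exporter)
tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)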

@khu834

khu834 commented Sep 12, 2018

> Using training hooks and evaluation hooks, you can do both best-model saving and early stopping.

Do you have more specific examples or links that better demonstrate how to utilize training and evaluation hooks? E.g., how would one construct a tf.train.SessionRunHook that can read the evaluation metrics from inside model_fn over many iterations and compare the loss?

I assume the EstimatorSpec is returned by model_fn?
Where is the 'spec' object here coming from?
Thank you.
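
For what it's worth, a minimal sketch of such a hook, assuming TF 1.x and that your graph exposes the loss under the (illustrative) tensor name 'loss:0':

import tensorflow as tf

class BestLossHook(tf.train.SessionRunHook):
    """Fetches the loss on every run and tracks the best value seen."""

    def __init__(self, loss_name='loss:0'):
        self._loss_name = loss_name  # depends on your graph's tensor names
        self.best_loss = float('inf')

    def begin(self):
        # Called before the graph is finalized; look up the loss tensor.
        self._loss = tf.get_default_graph().get_tensor_by_name(self._loss_name)

    def before_run(self, run_context):
        # Ask the session to fetch the loss alongside the regular run.
        return tf.train.SessionRunArgs(self._loss)

    def after_run(self, run_context, run_values):
        loss = run_values.results
        if loss < self.best_loss:
            self.best_loss = loss
            # React to an improvement here, e.g. save via a Saver built in begin().
        # For early stopping, count steps since the last improvement and call
        # run_context.request_stop() once patience runs out.

An instance of this would be passed via the training_hooks or evaluation_hooks arguments shown above.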

@chinazm

chinazm commented Sep 18, 2018

I would refer to this feature request:
aws/sagemaker-tensorflow-training-toolkit#75

apacker pushed a commit to apacker/sagemaker-python-sdk that referenced this issue Nov 15, 2018
@laurenyu
Contributor

Closing due to inactivity and the introduction of script mode with our TensorFlow containers. Script mode allows for greater flexibility in writing the TF training script, which should allow for using the hooks described above. For more information about script mode, see our TF README.
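
In script mode, the simplest route to best-model saving is a Keras callback. A minimal sketch with illustrative toy data; /opt/ml/model is the directory SageMaker packages into model.tar.gz after training:

import numpy as np
import tensorflow as tf

# Toy data and model; a real entry point would read the SageMaker data channels.
x_train, y_train = np.random.rand(100, 10), np.random.rand(100, 1)
x_val, y_val = np.random.rand(20, 10), np.random.rand(20, 1)
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(10,))])
model.compile(optimizer='adam', loss='mse')

# Keep only the weights with the best validation loss, and stop early
# once validation loss plateaus.
callbacks = [
    tf.keras.callbacks.ModelCheckpoint('/opt/ml/model/best.h5',
                                       monitor='val_loss', save_best_only=True),
    tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=5),
]
model.fit(x_train, y_train, validation_data=(x_val, y_val),
          epochs=100, callbacks=callbacks)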
