Recently, the TensorFlow image started failing with an error in the final part of training, while saving the model artifact:
2018-11-18 21:51:28,519 ERROR - tf_container - Failed to download saved model. File does not exist in s3://sagemaker.../.../models_checkpoint (removed real path)
2018-11-18 21:51:28,519 ERROR - container_support.training - uncaught exception during training: 'Contents'
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/container_support/training.py", line 36, in start
    fw.train()
  File "/usr/local/lib/python2.7/dist-packages/tf_container/train_entry_point.py", line 177, in train
    serve.export_saved_model(checkpoint_dir, env.model_dir)
  File "/usr/local/lib/python2.7/dist-packages/tf_container/serve.py", line 54, in export_saved_model
    raise e
KeyError: 'Contents'
I do pass the checkpoint_path argument in the function call, and I even added it to the parameters. According to the logs, the checkpoint is actually being stored in a local temp folder, as is the model:
2018-11-18 21:51:22,307 WARNING - tensorflow - Using temporary folder as model directory: /tmp/tmpB7cDbM
...
2018-11-18 21:51:28,447 INFO - tensorflow - SavedModel written to: /tmp/tmpB7cDbM/export/Servo/temp-1542577888/saved_model.pb
Why did it stop storing to S3? Nothing in the code tells me why this is happening. Why is the env variable not defined?
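For context on the `KeyError: 'Contents'` itself: an S3 list-objects response leaves out the `Contents` key entirely when nothing matches the prefix, which would fit the "File does not exist" message logged just before it. Below is a minimal sketch of that behaviour, assuming the container indexes the listing response directly. I have not confirmed what `serve.py` actually does; the helper name `checkpoint_keys`, the bucket name and the object key are all hypothetical.

```python
def checkpoint_keys(list_response):
    """Return the object keys from an S3 ListObjects-style response dict."""
    # S3 omits 'Contents' when no object matches the prefix, so indexing
    # list_response['Contents'] directly raises KeyError on an empty path.
    return [obj["Key"] for obj in list_response.get("Contents", [])]


# Hypothetical responses shaped like boto3 list_objects_v2 output.
empty = {"KeyCount": 0, "Name": "my-bucket", "Prefix": "models_checkpoint/"}
populated = {
    "KeyCount": 1,
    "Name": "my-bucket",
    "Prefix": "models_checkpoint/",
    "Contents": [{"Key": "models_checkpoint/export/Servo/1/saved_model.pb"}],
}

print(checkpoint_keys(empty))
print(checkpoint_keys(populated))
```

If that assumption holds, the exception is only a symptom: the checkpoint prefix on S3 is empty because the model was written to the local temp folder instead.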