Commit 4505229

authored
doc: fix description of default model_dir for TF (#1368)
1 parent ff2ab43 commit 4505229

2 files changed: +16 -75 lines changed

doc/using_tf.rst (+6 -72)
@@ -48,7 +48,7 @@ The training script is very similar to a training script you might run outside o

 * ``SM_MODEL_DIR``: A string that represents the local path where the training job writes the model artifacts to.
   After training, artifacts in this directory are uploaded to S3 for model hosting. This is different than the ``model_dir``
-  argument passed in your training script, which is an S3 location. ``SM_MODEL_DIR`` is always set to ``/opt/ml/model``.
+  argument passed in your training script, which can be an S3 location. ``SM_MODEL_DIR`` is always set to ``/opt/ml/model``.
 * ``SM_NUM_GPUS``: An integer representing the number of GPUs available to the host.
 * ``SM_OUTPUT_DATA_DIR``: A string that represents the path to the directory to write output artifacts to.
   Output artifacts might include checkpoints, graphs, and other files to save, but do not include model artifacts.
@@ -166,7 +166,7 @@ The following args are not permitted when using Script Mode:
 Where the S3 url is a path to your training data within Amazon S3.
 The constructor keyword arguments define how SageMaker runs your training script.

-For more information about the sagemaker.tensorflow.TensorFlow estimator, see `sagemaker.tensorflow.TensorFlow Class`_.
+For more information about the sagemaker.tensorflow.TensorFlow estimator, see `SageMaker TensorFlow Classes`_.

 Call the fit Method
 ===================
@@ -909,77 +909,11 @@ processing. There are 2 ways to do this:

 For more information, see: https://github.com/aws/sagemaker-tensorflow-serving-container#prepost-processing

-*************************************
-sagemaker.tensorflow.TensorFlow Class
-*************************************
+****************************
+SageMaker TensorFlow Classes
+****************************

-The following are the most commonly used ``TensorFlow`` constructor arguments.
-
-Required:
-
-- ``entry_point (str)`` Path (absolute or relative) to the Python file which
-  should be executed as the entry point to training.
-- ``role (str)`` An AWS IAM role (either name or full ARN). The Amazon
-  SageMaker training jobs and APIs that create Amazon SageMaker
-  endpoints use this role to access training data and model artifacts.
-  After the endpoint is created, the inference code might use the IAM
-  role, if accessing AWS resource.
-- ``train_instance_count (int)`` Number of Amazon EC2 instances to use for
-  training.
-- ``train_instance_type (str)`` Type of EC2 instance to use for training, for
-  example, 'ml.c4.xlarge'.
-
-Optional:
-
-- ``source_dir (str)`` Path (absolute or relative) to a directory with any
-  other training source code dependencies including the entry point
-  file. Structure within this directory will be preserved when training
-  on SageMaker.
-- ``dependencies (list[str])`` A list of paths to directories (absolute or relative) with
-  any additional libraries that will be exported to the container (default: ``[]``).
-  The library folders will be copied to SageMaker in the same folder where the entrypoint is copied.
-  If the ``source_dir`` points to S3, code will be uploaded and the S3 location will be used
-  instead. Example:
-
-    The following call
-
-    >>> TensorFlow(entry_point='train.py', dependencies=['my/libs/common', 'virtual-env'])
-
-    results in the following inside the container:
-
-    >>> opt/ml/code
-    >>> ├── train.py
-    >>> ├── common
-    >>> └── virtual-env
-
-- ``hyperparameters (dict[str, ANY])`` Hyperparameters that will be used for training.
-  Will be made accessible as command line arguments.
-- ``train_volume_size (int)`` Size in GB of the EBS volume to use for storing
-  input data during training. Must be large enough to the store training
-  data.
-- ``train_max_run (int)`` Timeout in seconds for training, after which Amazon
-  SageMaker terminates the job regardless of its current status.
-- ``output_path (str)`` S3 location where you want the training result (model
-  artifacts and optional output files) saved. If not specified, results
-  are stored to a default bucket. If the bucket with the specific name
-  does not exist, the estimator creates the bucket during the ``fit``
-  method execution.
-- ``output_kms_key`` Optional KMS key ID to optionally encrypt training
-  output with.
-- ``base_job_name`` Name to assign for the training job that the ``fit``
-  method launches. If not specified, the estimator generates a default
-  job name, based on the training image name and current timestamp.
-- ``image_name`` An alternative docker image to use for training and
-  serving. If specified, the estimator will use this image for training and
-  hosting, instead of selecting the appropriate SageMaker official image based on
-  ``framework_version`` and ``py_version``. Refer to: `SageMaker TensorFlow Docker containers <https://github.com/aws/sagemaker-python-sdk/tree/master/src/sagemaker/tensorflow#sagemaker-tensorflow-docker-containers>`_ for details on what the official images support
-  and where to find the source code to build your custom image.
-- ``script_mode (bool)`` Whether to use Script Mode or not. Script mode is the only available training mode in Python 3,
-  setting ``py_version`` to ``py3`` automatically sets ``script_mode`` to True.
-- ``model_dir (str)`` Location where model data, checkpoint data, and TensorBoard checkpoints should be saved during training.
-  If not specified a S3 location will be generated under the training job's default bucket. And ``model_dir`` will be
-  passed in your training script as one of the command line arguments.
-- ``distributions (dict)`` Configure your distribution strategy with this argument.
+For information about the different TensorFlow-related classes in the SageMaker Python SDK, see https://sagemaker.readthedocs.io/en/stable/sagemaker.tensorflow.html.

 **************************************
 SageMaker TensorFlow Docker containers
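
The first hunk of doc/using_tf.rst distinguishes the ``SM_MODEL_DIR`` environment variable (a local path, always ``/opt/ml/model``) from the ``model_dir`` argument passed to the training script (which can be an S3 location). The following is a minimal sketch of how a training script might read both. The ``--sm-model-dir`` flag name, the ``parse_args`` helper, and the example bucket are illustrative assumptions, not something this commit defines.

```python
import argparse
import os


def parse_args(argv=None, environ=None):
    """Read the two model-directory settings described in the docs (sketch)."""
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser()
    # model_dir arrives as a command line argument and can be an S3 URI.
    parser.add_argument("--model_dir", type=str, default=None)
    # SM_MODEL_DIR is the local directory whose contents are uploaded to S3
    # after training. The fallback only matters when running outside SageMaker.
    parser.add_argument(
        "--sm-model-dir",  # hypothetical flag name chosen for this sketch
        type=str,
        default=environ.get("SM_MODEL_DIR", "/opt/ml/model"),
    )
    # Ignore any hyperparameters this sketch does not declare.
    args, _unknown = parser.parse_known_args(argv)
    return args


# Example call with a made-up S3 location and an empty environment.
example = parse_args(["--model_dir", "s3://example-bucket/example-job/model"], environ={})
print("model_dir argument:", example.model_dir)
print("local model directory:", example.sm_model_dir)
```

Keeping the two values in separate arguments makes it harder to confuse the possibly remote ``model_dir`` with the local directory that SageMaker uploads after training.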

src/sagemaker/tensorflow/estimator.py (+10 -3)
@@ -238,9 +238,16 @@ def __init__(
                 https://github.com/aws/sagemaker-python-sdk#tensorflow-sagemaker-estimators.
                 If not specified, this will default to 1.11.
             model_dir (str): S3 location where the checkpoint data and models can be exported to
-                during training (default: None). If not specified a default S3 URI will be
-                generated. It will be passed in the training script as one of the command line
-                arguments.
+                during training (default: None). It will be passed in the training script as one of
+                the command line arguments. If not specified, one is provided based on
+                your training configuration:
+
+                * *distributed training with MPI* - ``/opt/ml/model``
+                * *single-machine training or distributed training without MPI* - \
+                    ``s3://{output_path}/model``
+                * *Local Mode with local sources (file:// instead of s3://)* - \
+                    ``/opt/ml/shared/model``
+
             requirements_file (str): Path to a ``requirements.txt`` file (default: ''). The path
                 should be within and relative to ``source_dir``. Details on the format can be
                 found in the Pip User Guide:
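
The updated ``model_dir`` docstring lists three defaults that depend on the training configuration. The sketch below restates that list as a small function. It is not the SDK's implementation: the ``default_model_dir`` name, its parameters, and the order in which the MPI and Local Mode cases are checked are assumptions made for illustration. ``output_path`` is treated as the placeholder from the docstring's ``s3://{output_path}/model`` template.

```python
from typing import Optional


def default_model_dir(
    output_path: str,
    mpi_enabled: bool = False,
    local_code: bool = False,
    model_dir: Optional[str] = None,
) -> str:
    """Return the model_dir a training script would receive (illustrative only)."""
    if model_dir is not None:
        # An explicitly supplied value is passed through unchanged.
        return model_dir
    if mpi_enabled:
        # Distributed training with MPI.
        return "/opt/ml/model"
    if local_code:
        # Local Mode with local sources (file:// instead of s3://).
        return "/opt/ml/shared/model"
    # Single-machine training, or distributed training without MPI.
    return f"s3://{output_path}/model"


# Hypothetical values, used only to exercise each branch.
print(default_model_dir("example-bucket/example-job"))
print(default_model_dir("example-bucket/example-job", mpi_enabled=True))
print(default_model_dir("example-bucket/example-job", local_code=True))
```

How the real estimator resolves a job that is both MPI-enabled and running in Local Mode is not stated in the docstring, so the branch order here should not be read as documented behavior.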
