diff --git a/doc/using_tf.rst b/doc/using_tf.rst
index b025f41988..c65e616bc1 100644
--- a/doc/using_tf.rst
+++ b/doc/using_tf.rst
@@ -48,7 +48,7 @@ The training script is very similar to a training script you might run outside o
 
 * ``SM_MODEL_DIR``: A string that represents the local path where the training job writes the model artifacts to.
   After training, artifacts in this directory are uploaded to S3 for model hosting. This is different than the ``model_dir``
-  argument passed in your training script, which is an S3 location. ``SM_MODEL_DIR`` is always set to ``/opt/ml/model``.
+  argument passed in your training script, which can be an S3 location. ``SM_MODEL_DIR`` is always set to ``/opt/ml/model``.
 * ``SM_NUM_GPUS``: An integer representing the number of GPUs available to the host.
 * ``SM_OUTPUT_DATA_DIR``: A string that represents the path to the directory to write output artifacts to.
   Output artifacts might include checkpoints, graphs, and other files to save, but do not include model artifacts.
@@ -166,7 +166,7 @@ The following args are not permitted when using Script Mode:
 Where the S3 url is a path to your training data within Amazon S3.
 The constructor keyword arguments define how SageMaker runs your training script.
 
-For more information about the sagemaker.tensorflow.TensorFlow estimator, see `sagemaker.tensorflow.TensorFlow Class`_.
+For more information about the sagemaker.tensorflow.TensorFlow estimator, see `SageMaker TensorFlow Classes`_.
 
 Call the fit Method
 ===================
@@ -909,77 +909,11 @@ processing. There are 2 ways to do this:
 
 For more information, see: https://github.com/aws/sagemaker-tensorflow-serving-container#prepost-processing
 
-*************************************
-sagemaker.tensorflow.TensorFlow Class
-*************************************
+****************************
+SageMaker TensorFlow Classes
+****************************
 
-The following are the most commonly used ``TensorFlow`` constructor arguments.
-
-Required:
-
-- ``entry_point (str)`` Path (absolute or relative) to the Python file which
-  should be executed as the entry point to training.
-- ``role (str)`` An AWS IAM role (either name or full ARN). The Amazon
-  SageMaker training jobs and APIs that create Amazon SageMaker
-  endpoints use this role to access training data and model artifacts.
-  After the endpoint is created, the inference code might use the IAM
-  role, if accessing AWS resource.
-- ``train_instance_count (int)`` Number of Amazon EC2 instances to use for
-  training.
-- ``train_instance_type (str)`` Type of EC2 instance to use for training, for
-  example, 'ml.c4.xlarge'.
-
-Optional:
-
-- ``source_dir (str)`` Path (absolute or relative) to a directory with any
-  other training source code dependencies including the entry point
-  file. Structure within this directory will be preserved when training
-  on SageMaker.
-- ``dependencies (list[str])`` A list of paths to directories (absolute or relative) with
-  any additional libraries that will be exported to the container (default: ``[]``).
-  The library folders will be copied to SageMaker in the same folder where the entrypoint is copied.
-  If the ``source_dir`` points to S3, code will be uploaded and the S3 location will be used
-  instead. Example:
-
-      The following call
-
-      >>> TensorFlow(entry_point='train.py', dependencies=['my/libs/common', 'virtual-env'])
-
-      results in the following inside the container:
-
-      >>> opt/ml/code
-      >>> ├── train.py
-      >>> ├── common
-      >>> └── virtual-env
-
-- ``hyperparameters (dict[str, ANY])`` Hyperparameters that will be used for training.
-  Will be made accessible as command line arguments.
-- ``train_volume_size (int)`` Size in GB of the EBS volume to use for storing
-  input data during training. Must be large enough to the store training
-  data.
-- ``train_max_run (int)`` Timeout in seconds for training, after which Amazon
-  SageMaker terminates the job regardless of its current status.
-- ``output_path (str)`` S3 location where you want the training result (model
-  artifacts and optional output files) saved. If not specified, results
-  are stored to a default bucket. If the bucket with the specific name
-  does not exist, the estimator creates the bucket during the ``fit``
-  method execution.
-- ``output_kms_key`` Optional KMS key ID to optionally encrypt training
-  output with.
-- ``base_job_name`` Name to assign for the training job that the ``fit``
-  method launches. If not specified, the estimator generates a default
-  job name, based on the training image name and current timestamp.
-- ``image_name`` An alternative docker image to use for training and
-  serving. If specified, the estimator will use this image for training and
-  hosting, instead of selecting the appropriate SageMaker official image based on
-  ``framework_version`` and ``py_version``. Refer to: `SageMaker TensorFlow Docker containers `_ for details on what the official images support
-  and where to find the source code to build your custom image.
-- ``script_mode (bool)`` Whether to use Script Mode or not. Script mode is the only available training mode in Python 3,
-  setting ``py_version`` to ``py3`` automatically sets ``script_mode`` to True.
-- ``model_dir (str)`` Location where model data, checkpoint data, and TensorBoard checkpoints should be saved during training.
-  If not specified a S3 location will be generated under the training job's default bucket. And ``model_dir`` will be
-  passed in your training script as one of the command line arguments.
-- ``distributions (dict)`` Configure your distribution strategy with this argument.
+For information about the different TensorFlow-related classes in the SageMaker Python SDK, see https://sagemaker.readthedocs.io/en/stable/sagemaker.tensorflow.html.
 
 **************************************
 SageMaker TensorFlow Docker containers
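For reference, a minimal sketch of how the ``sagemaker.tensorflow.TensorFlow`` estimator arguments shown above are commonly combined. The IAM role ARN, S3 paths, entry point, hyperparameters, and ``framework_version`` below are illustrative placeholders, not values taken from this documentation::

    from sagemaker.tensorflow import TensorFlow

    # Script Mode estimator; per the documentation above, setting
    # py_version='py3' automatically sets script_mode to True.
    tf_estimator = TensorFlow(
        entry_point='train.py',                               # placeholder entry point
        role='arn:aws:iam::111122223333:role/SageMakerRole',  # placeholder IAM role
        train_instance_count=1,
        train_instance_type='ml.c4.xlarge',
        framework_version='1.12.0',
        py_version='py3',
        hyperparameters={'epochs': 10},  # made accessible as command line arguments
    )

    # The S3 url is a path to the training data within Amazon S3.
    tf_estimator.fit('s3://my-bucket/path/to/training/data')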
diff --git a/src/sagemaker/tensorflow/estimator.py b/src/sagemaker/tensorflow/estimator.py
index eb1e2296c1..76a0dad627 100644
--- a/src/sagemaker/tensorflow/estimator.py
+++ b/src/sagemaker/tensorflow/estimator.py
@@ -238,9 +238,16 @@ def __init__(
                 https://github.com/aws/sagemaker-python-sdk#tensorflow-sagemaker-estimators.
                 If not specified, this will default to 1.11.
             model_dir (str): S3 location where the checkpoint data and models can be exported to
-                during training (default: None). If not specified a default S3 URI will be
-                generated. It will be passed in the training script as one of the command line
-                arguments.
+                during training (default: None). It will be passed in the training script as one of
+                the command line arguments. If not specified, one is provided based on
+                your training configuration:
+
+                * *distributed training with MPI* - ``/opt/ml/model``
+                * *single-machine training or distributed training without MPI* - \
+                  ``s3://{output_path}/model``
+                * *Local Mode with local sources (file:// instead of s3://)* - \
+                  ``/opt/ml/shared/model``
+
             requirements_file (str): Path to a ``requirements.txt`` file (default: ''). The path
                 should be within and relative to ``source_dir``. Details on the format can be
                 found in the Pip User Guide:
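On the training script side, the ``model_dir`` value described in this docstring arrives as a command line argument, while ``SM_MODEL_DIR`` is an environment variable that is always set to ``/opt/ml/model``. A minimal, illustrative sketch of reading both follows; ``--model_dir`` matches the docstring above, while the ``--sm-model-dir`` argument name and the rest of the parsing are assumptions::

    import argparse
    import os

    if __name__ == '__main__':
        parser = argparse.ArgumentParser()

        # Passed by the TensorFlow estimator; depending on the training
        # configuration this can be an S3 URI or a local path.
        parser.add_argument('--model_dir', type=str)

        # SM_MODEL_DIR is always /opt/ml/model; artifacts written there are
        # uploaded to S3 for model hosting after training.
        parser.add_argument('--sm-model-dir', type=str,
                            default=os.environ.get('SM_MODEL_DIR'))

        args, _ = parser.parse_known_args()

        # ... train the model, write checkpoints and TensorBoard data under
        # args.model_dir and final model artifacts under args.sm_model_dir ...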