
Commit 8095031

Merge branch 'dev' of https://github.com/aws/sagemaker-python-sdk into smddp-1.4.0-doc
1 parent ee0757d commit 8095031

6 files changed: +80, -40 lines changed


.readthedocs.yml (+1, -1)

@@ -5,7 +5,7 @@
 version: 2

 python:
-  version: 3.6
+  version: 3.9
   install:
     - method: pip
       path: .

doc/api/training/sdp_versions/latest/smd_data_parallel_tensorflow.rst (+37, -18)
@@ -245,16 +245,25 @@ TensorFlow API

 .. function:: smdistributed.dataparallel.tensorflow.allreduce(tensor, param_index, num_params, compression=Compression.none, op=ReduceOp.AVERAGE)

-   Performs an all-reduce operation on a tensor (``tf.Tensor``).
+   Performs an ``allreduce`` operation on a tensor (``tf.Tensor``).
+
+   This is the ``smdistributed.dataparallel`` package's AllReduce API for TensorFlow,
+   used to allreduce gradient tensors. By default, ``smdistributed.dataparallel``
+   allreduce averages the gradient tensors across participating workers.
+
+   .. note::
+
+      :class:`smdistributed.dataparallel.tensorflow.allreduce()` should
+      only be used to allreduce gradient tensors.
+      For other (non-gradient) tensors, you must use
+      :class:`smdistributed.dataparallel.tensorflow.oob_allreduce()`.
+      If you use :class:`smdistributed.dataparallel.tensorflow.allreduce()`
+      for non-gradient tensors,
+      the distributed training job might stall or stop.

-   ``smdistributed.dataparallel`` AllReduce API can be used for all
-   reducing gradient tensors or any other tensors. By
-   default, ``smdistributed.dataparallel`` AllReduce averages the
-   tensors across the participating workers.
-
   **Inputs:**

-   - ``tensor (tf.Tensor)(required)``: The tensor to be all-reduced. The shape of the input must be identical across all ranks.
+   - ``tensor (tf.Tensor)(required)``: The tensor to be allreduced. The shape of the input must be identical across all ranks.
    - ``param_index (int)(required):`` 0 if you are reducing a single tensor. Index of the tensor if you are reducing a list of tensors.
    - ``num_params (int)(required):`` len(tensor).
    - ``compression (smdistributed.dataparallel.tensorflow.Compression)(optional)``: Compression algorithm used to reduce the amount of data sent and received by each worker node. Defaults to not using compression.
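To make the ``param_index`` and ``num_params`` contract concrete, here is a minimal sketch (not part of this commit) of allreducing a list of gradient tensors directly; ``grads`` is assumed to be the list returned by ``tape.gradient``:

    import smdistributed.dataparallel.tensorflow as sdp

    # Average each gradient tensor across workers. param_index is the position
    # of the tensor in the list; num_params is the length of the list.
    averaged_grads = [
        sdp.allreduce(grad, param_index=i, num_params=len(grads))
        for i, grad in enumerate(grads)
    ]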
@@ -308,9 +317,9 @@ TensorFlow API

 .. function:: smdistributed.dataparallel.tensorflow.oob_allreduce(tensor, compression=Compression.none, op=ReduceOp.AVERAGE)

-   OutOfBand (oob) AllReduce is simplified AllReduce function for use cases
+   Out-of-band (oob) AllReduce is a simplified AllReduce function for use cases
    such as calculating total loss across all the GPUs in the training.
-   oob_allreduce average the tensors, as reduction operation, across the
+   ``oob_allreduce`` averages the tensors, as the reduction operation, across the
    worker nodes.

   **Inputs:**
@@ -328,15 +337,25 @@ TensorFlow API

    - ``None``

-   .. rubric:: Notes
-
-   ``smdistributed.dataparallel.tensorflow.oob_allreduce``, in most
-   cases, is ~2x slower
-   than ``smdistributed.dataparallel.tensorflow.allreduce``  so it is not
-   recommended to be used for performing gradient reduction during the
-   training
-   process. ``smdistributed.dataparallel.tensorflow.oob_allreduce`` internally
-   uses NCCL AllReduce with ``ncclSum`` as the reduction operation.
+   .. note::
+
+      In most cases, the :class:`smdistributed.dataparallel.tensorflow.oob_allreduce()`
+      function is ~2x slower
+      than :class:`smdistributed.dataparallel.tensorflow.allreduce()`. It is not
+      recommended to use the :class:`smdistributed.dataparallel.tensorflow.oob_allreduce()`
+      function for performing gradient
+      reduction during the training process.
+      ``smdistributed.dataparallel.tensorflow.oob_allreduce`` internally
+      uses NCCL AllReduce with ``ncclSum`` as the reduction operation.
+
+   .. note::
+
+      :class:`smdistributed.dataparallel.tensorflow.oob_allreduce()` should
+      only be used to allreduce non-gradient tensors.
+      If you use :class:`smdistributed.dataparallel.tensorflow.allreduce()`
+      for non-gradient tensors,
+      the distributed training job might stall or stop.
+      To allreduce gradients, use :class:`smdistributed.dataparallel.tensorflow.allreduce()`.


 .. function:: smdistributed.dataparallel.tensorflow.overlap(tensor)
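In a TF2 training step the two collectives split cleanly: gradient tensors are allreduced (typically via ``DistributedGradientTape``, which calls ``allreduce`` internally), while a scalar such as the step loss goes through ``oob_allreduce``. A rough sketch under those assumptions; ``model``, ``loss_fn``, and ``opt`` are placeholders defined elsewhere, and ``sdp.init()`` is assumed to have been called:

    import tensorflow as tf
    import smdistributed.dataparallel.tensorflow as sdp

    @tf.function
    def training_step(images, labels):
        with tf.GradientTape() as tape:
            probs = model(images, training=True)
            loss_value = loss_fn(labels, probs)
        # Gradient tensors: wrap the tape so gradients are allreduced across workers.
        tape = sdp.DistributedGradientTape(tape)
        grads = tape.gradient(loss_value, model.trainable_variables)
        opt.apply_gradients(zip(grads, model.trainable_variables))
        # Non-gradient tensor: average the scalar loss with oob_allreduce.
        return sdp.oob_allreduce(loss_value)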

doc/conf.py (+1, -1)

@@ -10,7 +10,7 @@
 # distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF
 # ANY KIND, either express or implied. See the License for the specific
 # language governing permissions and limitations under the License.
-"""Placeholder docstring"""
+"""Configuration for generating readthedocs docstrings."""
 from __future__ import absolute_import

 import pkg_resources

src/sagemaker/huggingface/estimator.py (+7, -6)

@@ -50,14 +50,15 @@ def __init__(
         compiler_config=None,
         **kwargs,
     ):
-        """This ``Estimator`` executes a HuggingFace script in a managed execution environment.
+        """This estimator runs a Hugging Face training script in a SageMaker training environment.

-        The managed HuggingFace environment is an Amazon-built Docker container that executes
-        functions defined in the supplied ``entry_point`` Python script within a SageMaker
-        Training Job.
+        The estimator initiates the SageMaker-managed Hugging Face environment
+        by using the pre-built Hugging Face Docker container and runs
+        the Hugging Face training script that the user provides through
+        the ``entry_point`` argument.

-        Training is started by calling
-        :meth:`~sagemaker.amazon.estimator.Framework.fit` on this Estimator.
+        After configuring the estimator class, use the
+        :meth:`~sagemaker.amazon.estimator.Framework.fit()` method to start a training job.

         Args:
             py_version (str): Python version you want to use for executing your model training
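The flow the new docstring describes, as a hedged sketch (the script name, role ARN, S3 URI, and framework versions below are placeholders, not values from this commit):

    from sagemaker.huggingface import HuggingFace

    huggingface_estimator = HuggingFace(
        entry_point="train.py",          # placeholder training script
        instance_type="ml.p3.2xlarge",
        instance_count=1,
        role="arn:aws:iam::111122223333:role/SageMakerRole",  # placeholder
        transformers_version="4.6",      # placeholder versions
        pytorch_version="1.7",
        py_version="py36",
    )

    # Start the training job on the given channel (placeholder S3 URI).
    huggingface_estimator.fit({"train": "s3://my-bucket/train"})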

src/sagemaker/model.py (+6, -3)

@@ -466,7 +466,7 @@ def _upload_code(self, key_prefix: str, repack: bool = False) -> None:
         )

     def _script_mode_env_vars(self):
-        """Placeholder docstring"""
+        """Returns a mapping of environment variables for script mode execution"""
         script_name = None
         dir_name = None
         if self.uploaded_code:

@@ -478,8 +478,11 @@ def _script_mode_env_vars(self):
         elif self.entry_point is not None:
             script_name = self.entry_point
             if self.source_dir is not None:
-                dir_name = "file://" + self.source_dir
-
+                dir_name = (
+                    self.source_dir
+                    if self.source_dir.startswith("s3://")
+                    else "file://" + self.source_dir
+                )
         return {
             SCRIPT_PARAM_NAME.upper(): script_name or str(),
             DIR_PARAM_NAME.upper(): dir_name or str(),
src/sagemaker/training_compiler/config.py (+28, -11)

@@ -18,11 +18,7 @@


 class TrainingCompilerConfig(object):
-    """The configuration class for accelerating SageMaker training jobs through compilation.
-
-    SageMaker Training Compiler speeds up training by optimizing the model execution graph.
-
-    """
+    """The SageMaker Training Compiler configuration class."""

     DEBUG_PATH = "/opt/ml/output/data/compiler/"
     SUPPORTED_INSTANCE_CLASS_PREFIXES = ["p3", "g4dn", "p4"]

@@ -37,9 +33,15 @@ def __init__(
     ):
         """This class initializes a ``TrainingCompilerConfig`` instance.

-        Pass the output of it to the ``compiler_config``
+        `Amazon SageMaker Training Compiler
+        <https://docs.aws.amazon.com/sagemaker/latest/dg/training-compiler.html>`_
+        is a feature of SageMaker Training
+        and speeds up training jobs by optimizing model execution graphs.
+
+        You can compile Hugging Face models
+        by passing the object of this configuration class to the ``compiler_config``
         parameter of the :class:`~sagemaker.huggingface.HuggingFace`
-        class.
+        estimator.

         Args:
             enabled (bool): Optional. Switch to enable SageMaker Training Compiler.

@@ -48,13 +50,28 @@
                 This comes with a potential performance slowdown.
                 The default is ``False``.

-        **Example**: The following example shows the basic ``compiler_config``
-        parameter configuration, enabling compilation with default parameter values.
+        **Example**: The following code shows the basic usage of the
+        :class:`sagemaker.huggingface.TrainingCompilerConfig()` class
+        to run a HuggingFace training job with the compiler.

         .. code-block:: python

-            from sagemaker.huggingface import TrainingCompilerConfig
-            compiler_config = TrainingCompilerConfig()
+            from sagemaker.huggingface import HuggingFace, TrainingCompilerConfig
+
+            huggingface_estimator=HuggingFace(
+                ...
+                compiler_config=TrainingCompilerConfig()
+            )
+
+        .. seealso::
+
+            For more information about how to enable SageMaker Training Compiler
+            for various training settings such as using TensorFlow-based models,
+            PyTorch-based models, and distributed training,
+            see `Enable SageMaker Training Compiler
+            <https://docs.aws.amazon.com/sagemaker/latest/dg/training-compiler-enable.html>`_
+            in the `Amazon SageMaker Training Compiler developer guide
+            <https://docs.aws.amazon.com/sagemaker/latest/dg/training-compiler.html>`_.

         """