documentation: smddp doc update #2968

Merged 1 commit on Mar 4, 2022

@@ -243,16 +243,25 @@ TensorFlow API

.. function:: smdistributed.dataparallel.tensorflow.allreduce(tensor, param_index, num_params, compression=Compression.none, op=ReduceOp.AVERAGE)

Performs an ``allreduce`` operation on a tensor (``tf.Tensor``).

This is the ``smdistributed.dataparallel`` package's AllReduce API for
TensorFlow, used to allreduce gradient tensors. By default,
``smdistributed.dataparallel`` allreduce averages the gradient tensors
across participating workers.

.. note::

   :class:`smdistributed.dataparallel.tensorflow.allreduce()` should
   only be used to allreduce gradient tensors.
   For other (non-gradient) tensors, you must use
   :class:`smdistributed.dataparallel.tensorflow.oob_allreduce()`.
   If you use :class:`smdistributed.dataparallel.tensorflow.allreduce()`
   for non-gradient tensors, the distributed training job might stall or stop.

**Inputs:**

- ``tensor (tf.Tensor)(required)``: The tensor to be allreduced. The shape of the input must be identical across all ranks.
- ``param_index (int)(required)``: 0 if you are reducing a single tensor. Index of the tensor if you are reducing a list of tensors.
- ``num_params (int)(required)``: ``len(tensor)``; the number of tensors being reduced.
- ``compression (smdistributed.dataparallel.tensorflow.Compression)(optional)``: Compression algorithm used to reduce the amount of data sent and received by each worker node. Defaults to not using compression.
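The following is a minimal sketch of reducing a list of gradient tensors with this function. The ``model``, ``optimizer``, ``loss_fn``, ``images``, and ``labels`` objects are placeholders and not part of the API described above; only the ``allreduce()`` call and its parameters come from this documentation.

.. code:: python

   # A minimal sketch, assuming a Keras-style model, optimizer, and loss_fn
   # are already defined; only the allreduce() call and its parameters come
   # from the API documented above.
   import tensorflow as tf
   import smdistributed.dataparallel.tensorflow as smddp

   smddp.init()

   @tf.function
   def training_step(model, optimizer, loss_fn, images, labels):
       with tf.GradientTape() as tape:
           predictions = model(images, training=True)
           loss = loss_fn(labels, predictions)

       gradients = tape.gradient(loss, model.trainable_variables)

       # Average each gradient tensor across all workers. param_index is the
       # position of the tensor in the list and num_params is the list length.
       gradients = [
           smddp.allreduce(
               grad,
               param_index=i,
               num_params=len(gradients),
               compression=smddp.Compression.none,
           )
           for i, grad in enumerate(gradients)
       ]

       optimizer.apply_gradients(zip(gradients, model.trainable_variables))
       return loss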
@@ -306,9 +315,9 @@ TensorFlow API

.. function:: smdistributed.dataparallel.tensorflow.oob_allreduce(tensor, compression=Compression.none, op=ReduceOp.AVERAGE)

Out-of-band (oob) AllReduce is a simplified AllReduce function for use cases
such as calculating total loss across all the GPUs in the training.
``oob_allreduce`` averages the tensors, as the reduction operation, across the
worker nodes.

**Inputs:**
@@ -326,15 +335,25 @@ TensorFlow API

- ``None``

.. note::

   In most cases, the :class:`smdistributed.dataparallel.tensorflow.oob_allreduce()`
   function is ~2x slower than
   :class:`smdistributed.dataparallel.tensorflow.allreduce()`, so it is not
   recommended for performing gradient reduction during the training process.
   ``smdistributed.dataparallel.tensorflow.oob_allreduce`` internally
   uses NCCL AllReduce with ``ncclSum`` as the reduction operation.

.. note::

   :class:`smdistributed.dataparallel.tensorflow.oob_allreduce()` should
   only be used to allreduce non-gradient tensors. To allreduce gradients,
   use :class:`smdistributed.dataparallel.tensorflow.allreduce()`.
   If you use :class:`smdistributed.dataparallel.tensorflow.allreduce()`
   for non-gradient tensors, the distributed training job might stall or stop.
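
The following is a minimal sketch of averaging a per-worker scalar (here, a placeholder loss value) across all workers with ``oob_allreduce``; the ``init()`` and ``rank()`` calls are assumed to be available from the same package and are not part of the excerpt above.

.. code:: python

   # A minimal sketch: average a non-gradient scalar (a per-worker loss
   # value) across all workers. The loss value here is a placeholder;
   # init() and rank() are assumed to be available from the same package.
   import tensorflow as tf
   import smdistributed.dataparallel.tensorflow as smddp

   smddp.init()

   # Each worker computes its own loss for the step (placeholder value here).
   local_loss = tf.constant(0.42)

   # Average the non-gradient tensor across workers out of band.
   global_avg_loss = smddp.oob_allreduce(local_loss)

   if smddp.rank() == 0:
       tf.print("Average loss across workers:", global_avg_loss)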


.. function:: smdistributed.dataparallel.tensorflow.overlap(tensor)