If you use :class:`smdistributed.dataparallel.tensorflow.allreduce()`
for non-gradient tensors,
the distributed training job might stall or stop.

**Inputs:**
- ``tensor (tf.Tensor)(required)``: The tensor to be allreduced. The shape of the input must be identical across all ranks.
- ``param_index (int)(required)``: 0 if you are reducing a single tensor. Index of the tensor if you are reducing a list of tensors.
- ``num_params (int)(required)``: ``len(tensor)``.
- ``compression (smdistributed.dataparallel.tensorflow.Compression)(optional)``: Compression algorithm used to reduce the amount of data sent and received by each worker node. Defaults to not using compression.
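
The following is a minimal usage sketch, not taken from the library's official examples. It assumes a SageMaker training job where ``smdistributed.dataparallel`` is available; the model, loss function, and ``training_step`` helper below are hypothetical placeholders, and only the inputs listed above are passed to ``allreduce()``.

.. code-block:: python

    import tensorflow as tf
    import smdistributed.dataparallel.tensorflow as sdp

    sdp.init()

    # Placeholder model and loss; in practice these come from your training script.
    model = tf.keras.Sequential([tf.keras.layers.Dense(10)])
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

    @tf.function
    def training_step(images, labels):
        with tf.GradientTape() as tape:
            logits = model(images, training=True)
            loss = loss_fn(labels, logits)
        grads = tape.gradient(loss, model.trainable_variables)

        # Reduce each gradient tensor across workers. param_index is the
        # position of the tensor in the list, num_params is the length of
        # that list, and compression is left at its default (no compression),
        # matching the inputs described above.
        grads = [
            sdp.allreduce(g, param_index=i, num_params=len(grads))
            for i, g in enumerate(grads)
        ]
        return loss, grads

Because the warning above applies to non-gradient tensors, this sketch only all-reduces gradients produced by the tape.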