If you use :class:`smdistributed.dataparallel.tensorflow.allreduce()`
for non-gradient tensors, the distributed training job might stall or stop.

The ``smdistributed.dataparallel`` AllReduce API can be used for allreducing
gradient tensors or any other tensors. By default, ``smdistributed.dataparallel``
AllReduce averages the tensors across the participating workers.

**Inputs:**

- ``tensor (tf.Tensor)(required)``: The tensor to be allreduced. The shape of the input must be identical across all ranks.
- ``param_index (int)(required)``: 0 if you are reducing a single tensor. Index of the tensor if you are reducing a list of tensors.
- ``num_params (int)(required)``: The total number of tensors being reduced (``len(tensor)``).
- ``compression (smdistributed.dataparallel.tensorflow.Compression)(optional)``: Compression algorithm used to reduce the amount of data sent and received by each worker node. Defaults to not using compression.
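The following is a minimal, hypothetical sketch of how these inputs fit together
when allreducing a list of gradient tensors. It assumes the training script runs
in a container where ``smdistributed.dataparallel`` is available and that
``smdistributed.dataparallel.tensorflow.init()`` is called once during setup;
the gradient values are placeholders for illustration only.

.. code-block:: python

   import tensorflow as tf
   import smdistributed.dataparallel.tensorflow as sdp

   # Initialize the library (normally done once at the start of training).
   sdp.init()

   # Placeholder gradient tensors; in a real training step these would come
   # from tf.GradientTape. As noted above, allreduce is intended for gradient
   # tensors, and using it on non-gradient tensors may stall the job.
   grads = [tf.constant([1.0, 2.0, 3.0]), tf.constant([4.0, 5.0])]

   # Average each gradient across all workers (the default behavior).
   # param_index is the tensor's position in the list and num_params is the
   # total number of tensors; compression is omitted, so no compression is used.
   averaged_grads = [
       sdp.allreduce(grad, param_index=i, num_params=len(grads))
       for i, grad in enumerate(grads)
   ]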