doc/api/training/sdp_versions/latest/smd_data_parallel_tensorflow.rst (+7, -4)

@@ -16,8 +16,9 @@ The following steps show you how to convert a TensorFlow 2.x training
 script to utilize the distributed data parallel library.
 
 The distributed data parallel library APIs are designed to be close to Horovod APIs.
-See `SageMaker distributed data parallel TensorFlow examples <https://sagemaker-examples.readthedocs.io/en/latest/training/distributed_training/index.html#tensorflow-distributed>`__ for additional details on how to implement the data parallel library
-API offered for TensorFlow.
+See `SageMaker distributed data parallel TensorFlow examples
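
For context on the conversion that the edited .rst page describes, below is a minimal sketch of a TensorFlow 2.x training step adapted to `smdistributed.dataparallel`, whose API mirrors Horovod. The model, loss, and learning rate are placeholders and not taken from the file in this PR; treat it as an illustration under those assumptions, not the page's exact example.

```python
# Minimal sketch (placeholder model/optimizer): a TensorFlow 2.x training step
# adapted to smdistributed.dataparallel, whose API is intentionally Horovod-like.
import tensorflow as tf
import smdistributed.dataparallel.tensorflow as sdp

sdp.init()  # initialize the library across all GPUs/nodes

# Pin each worker process to a single GPU based on its local rank.
gpus = tf.config.experimental.list_physical_devices("GPU")
if gpus:
    tf.config.experimental.set_visible_devices(gpus[sdp.local_rank()], "GPU")

model = tf.keras.Sequential([tf.keras.layers.Dense(10)])  # placeholder model
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
# Scale the learning rate by the number of workers, as with Horovod.
opt = tf.keras.optimizers.SGD(0.01 * sdp.size())


@tf.function
def training_step(images, labels, first_batch):
    with tf.GradientTape() as tape:
        logits = model(images, training=True)
        loss = loss_fn(labels, logits)

    # Wrap the tape so gradients are all-reduced across workers.
    tape = sdp.DistributedGradientTape(tape)
    grads = tape.gradient(loss, model.trainable_variables)
    opt.apply_gradients(zip(grads, model.trainable_variables))

    if first_batch:
        # Broadcast initial weights and optimizer state from rank 0.
        sdp.broadcast_variables(model.variables, root_rank=0)
        sdp.broadcast_variables(opt.variables(), root_rank=0)
    return loss
```

Checkpoints are typically written only when `sdp.rank() == 0` so that workers do not race on the same files.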
doc/api/training/smd_data_parallel_release_notes/smd_data_parallel_change_log.md (+22, -4)

@@ -1,23 +1,41 @@
+# Sagemaker Distributed Data Parallel 1.1.1 Release Notes
+
+* New Features
+* Bug Fixes
+* Known Issues
+
+*New Features:*
+
+* Adds support for PyTorch 1.8.1
+
+*Bug Fixes:*
+
+* Fixes a bug that was causing gradients from one of the worker nodes to be added twice resulting in incorrect `all_reduce` results under some conditions.
+
+*Known Issues:*
+
+* SageMaker distributed data parallel still is not efficient when run using a single node. For the best performance, use multi-node distributed training with `smdistributed.dataparallel`. Use a single node only for experimental runs while preparing your training pipeline.
+
 # Sagemaker Distributed Data Parallel 1.1.0 Release Notes
 
 * New Features
 * Bug Fixes
 * Improvements
 * Known Issues
 
-New Features:
+*New Features:*
 
 * Adds support for PyTorch 1.8.0 with CUDA 11.1 and CUDNN 8
 
-Bug Fixes:
+*Bug Fixes:*
 
 * Fixes crash issue when importing `smdataparallel` before PyTorch
 
-Improvements:
+*Improvements:*
 
 * Update `smdataparallel` name in python packages, descriptions, and log outputs
 
-Known Issues:
+*Known Issues:*
 
 * SageMaker DataParallel is not efficient when run using a single node. For the best performance, use multi-node distributed training with `smdataparallel`. Use a single node only for experimental runs while preparing your training pipeline.
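
As background for the PyTorch support these release notes describe, here is a minimal sketch of how a training script typically initializes `smdistributed.dataparallel` for PyTorch in the 1.x API. The model and dataset are placeholders introduced for illustration; this is not code from the changelog or the PR.

```python
# Minimal sketch (placeholder model/dataset): initializing
# smdistributed.dataparallel for PyTorch, one process per GPU across nodes.
import torch
import smdistributed.dataparallel.torch.distributed as dist
from smdistributed.dataparallel.torch.parallel.distributed import (
    DistributedDataParallel as DDP,
)

dist.init_process_group()                     # set up the worker group
torch.cuda.set_device(dist.get_local_rank())  # pin this process to its GPU

model = torch.nn.Linear(32, 2).cuda()         # placeholder model
model = DDP(model)                            # gradients are all-reduced across workers

# Shard the data so each worker trains on a distinct partition.
dataset = torch.utils.data.TensorDataset(
    torch.randn(256, 32), torch.randint(0, 2, (256,))
)
sampler = torch.utils.data.distributed.DistributedSampler(
    dataset, num_replicas=dist.get_world_size(), rank=dist.get_rank()
)
loader = torch.utils.data.DataLoader(dataset, batch_size=16, sampler=sampler)
```

As the known issue above points out, this setup only pays off with multi-node training; single-node runs are best reserved for validating the pipeline.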