Commit 34cb1c7

documentation: the SageMaker distributed data parallel v1.4.0 release (#2980)
Co-authored-by: Ahsan Khan <[email protected]>
Co-authored-by: Payton Staub <[email protected]>
Co-authored-by: Shreya Pandit <[email protected]>
Co-authored-by: Basil Beirouti <[email protected]>
Co-authored-by: Mufaddal Rohawala <[email protected]>
Co-authored-by: Basil Beirouti <[email protected]>
Co-authored-by: Payton Staub <[email protected]>
Co-authored-by: Mohamed Ali Jamaoui <[email protected]>
Co-authored-by: ci <ci>
Co-authored-by: Jeniya Tabassum <[email protected]>
Co-authored-by: sreedes <[email protected]>
Co-authored-by: Navin Soni <[email protected]>
Co-authored-by: Miyoung <[email protected]>
Co-authored-by: Ameen Khan <[email protected]>
Co-authored-by: Zhankui Lu <[email protected]>
Co-authored-by: Navin Soni <[email protected]>
Co-authored-by: Xiaoguang Chen <[email protected]>
Co-authored-by: Jonathan Guinegagne <[email protected]>
Co-authored-by: Zhankui Lu <[email protected]>
Co-authored-by: Yifei Zhu <[email protected]>
Co-authored-by: Qingzi-Lan <[email protected]>
Co-authored-by: Ben Crabtree <[email protected]>
Co-authored-by: Dewen Qi <[email protected]>
Co-authored-by: qidewenwhen <[email protected]>
Co-authored-by: Xinghan Chen <[email protected]>
Co-authored-by: Tulio Casagrande <[email protected]>
Co-authored-by: HappyAmazonian <[email protected]>
1 parent a30c7b5 · commit 34cb1c7

16 files changed: +1560 −622 lines

doc/api/training/distributed.rst (+18 −1)
@@ -4,8 +4,25 @@ SageMaker distributed training libraries offer both data parallel and model para
 They combine software and hardware technologies to improve inter-GPU and inter-node communications.
 They extend SageMaker’s training capabilities with built-in options that require only small code changes to your training scripts.
 
+.. _sdp_api_docs_toc:
+
+The SageMaker Distributed Data Parallel Library
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. toctree::
+   :maxdepth: 3
+
+   smd_data_parallel
+   sdp_versions/latest
+   smd_data_parallel_use_sm_pysdk
+   smd_data_parallel_release_notes/smd_data_parallel_change_log
+
+.. _smp_api_docs_toc:
+
+The SageMaker Distributed Model Parallel Library
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
 .. toctree::
    :maxdepth: 3
 
-   smd_data_parallel
    smd_model_parallel
New file (+8 −0)

@@ -0,0 +1,8 @@
+.. _smddp-version-archive:
+
+.. toctree::
+   :maxdepth: 1
+
+   v1_2_x.rst
+   v1_1_x.rst
+   v1_0_0.rst
Changed file (+41 −3)

@@ -1,9 +1,47 @@
+.. _sdp_api_docs:
 
-Version 1.2.x (Latest)
+#############################################
+Use the Library to Adapt Your Training Script
+#############################################
+
+This section contains the SageMaker distributed data parallel API documentation.
+If you are a new user of this library, it is recommended you use this guide alongside
+`SageMaker's Distributed Data Parallel Library
+<https://docs.aws.amazon.com/sagemaker/latest/dg/data-parallel.html>`_.
+
+The library provides framework-specific APIs for TensorFlow and PyTorch.
+
+Select the latest or one of the previous versions of the API documentation
+depending on the version of the library you use.
+
+.. important::
+
+   The distributed data parallel library supports training jobs using CUDA 11 or later.
+   When you define a :class:`sagemaker.tensorflow.estimator.TensorFlow` or
+   :class:`sagemaker.pytorch.estimator.PyTorch`
+   estimator with the data parallel library enabled,
+   SageMaker uses CUDA 11. When you extend or customize your own training image,
+   you must use a base image with CUDA 11 or later. See
+   `SageMaker Python SDK's distributed data parallel library APIs
+   <https://docs.aws.amazon.com/sagemaker/latest/dg/data-parallel-use-api.html#data-parallel-use-python-skd-api>`_
+   for more information.
+
+Version 1.4.0 (Latest)
 ======================
 
 .. toctree::
    :maxdepth: 1
 
-   latest/smd_data_parallel_pytorch.rst
-   latest/smd_data_parallel_tensorflow.rst
+   latest/smd_data_parallel_pytorch
+   latest/smd_data_parallel_tensorflow
+
+Documentation Archive
+=====================
+
+To find the API documentation for the previous versions of the library,
+choose one of the following:
+
+.. toctree::
+   :maxdepth: 1
+
+   archives
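
For context, the "Use the Library to Adapt Your Training Script" page added above centers on enabling the library through the SageMaker Python SDK estimators named in the important note. A minimal sketch of what that looks like follows; it is not part of this commit, and the entry point, role, versions, and instance settings are illustrative assumptions.

    # Minimal sketch: launching a training job with the SageMaker distributed
    # data parallel library enabled. The script name, role, versions, and
    # instance settings are assumptions, not values taken from this commit.
    from sagemaker.pytorch import PyTorch

    estimator = PyTorch(
        entry_point="train.py",          # your adapted training script (assumed name)
        role="<your-sagemaker-execution-role>",
        framework_version="1.10",        # assumes a PyTorch version supported by v1.4.0
        py_version="py38",
        instance_count=2,
        instance_type="ml.p3.16xlarge",  # the library targets multi-GPU instance types
        # This flag is what the docs above mean by "the data parallel library
        # enabled"; SageMaker then runs the job on a CUDA 11 base image.
        distribution={"smdistributed": {"dataparallel": {"enabled": True}}},
    )

    estimator.fit("s3://<your-bucket>/<training-data-prefix>")

Inside the training script itself, the v1.4.0 release that this commit documents lets PyTorch users initialize the library as a torch.distributed backend, a sketch based on the v1.4.0 release notes rather than on files shown in this excerpt:

    # Script-side sketch (PyTorch): importing the module registers the "smddp"
    # backend with torch.distributed; earlier library versions used their own
    # DistributedDataParallel wrapper instead.
    import smdistributed.dataparallel.torch.torch_smddp  # noqa: F401
    import torch.distributed as dist

    dist.init_process_group(backend="smddp")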
