
Commit bd4df9c

Merge branch 'aws:master' into master

2 parents 31d4432 + 7e2c7ab

31 files changed: +2121 -660 lines

doc/api/training/distributed.rst (+18 -1)

@@ -4,8 +4,25 @@ SageMaker distributed training libraries offer both data parallel and model para
 They combine software and hardware technologies to improve inter-GPU and inter-node communications.
 They extend SageMaker’s training capabilities with built-in options that require only small code changes to your training scripts.
 
+.. _sdp_api_docs_toc:
+
+The SageMaker Distributed Data Parallel Library
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. toctree::
+   :maxdepth: 3
+
+   smd_data_parallel
+   sdp_versions/latest
+   smd_data_parallel_use_sm_pysdk
+   smd_data_parallel_release_notes/smd_data_parallel_change_log
+
+.. _smp_api_docs_toc:
+
+The SageMaker Distributed Model Parallel Library
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
 .. toctree::
    :maxdepth: 3
 
-   smd_data_parallel
    smd_model_parallel
New file (+8)

@@ -0,0 +1,8 @@
+.. _smddp-version-archive:
+
+.. toctree::
+   :maxdepth: 1
+
+   v1_2_x.rst
+   v1_1_x.rst
+   v1_0_0.rst
(+41 -3)

@@ -1,9 +1,47 @@
+.. _sdp_api_docs:
 
-Version 1.2.x (Latest)
+#############################################
+Use the Library to Adapt Your Training Script
+#############################################
+
+This section contains the SageMaker distributed data parallel API documentation.
+If you are a new user of this library, it is recommended you use this guide alongside
+`SageMaker's Distributed Data Parallel Library
+<https://docs.aws.amazon.com/sagemaker/latest/dg/data-parallel.html>`_.
+
+The library provides framework-specific APIs for TensorFlow and PyTorch.
+
+Select the latest or one of the previous versions of the API documentation
+depending on the version of the library you use.
+
+.. important::
+
+   The distributed data parallel library supports training jobs using CUDA 11 or later.
+   When you define a :class:`sagemaker.tensorflow.estimator.TensorFlow` or
+   :class:`sagemaker.pytorch.estimator.PyTorch`
+   estimator with the data parallel library enabled,
+   SageMaker uses CUDA 11. When you extend or customize your own training image,
+   you must use a base image with CUDA 11 or later. See
+   `SageMaker Python SDK's distributed data parallel library APIs
+   <https://docs.aws.amazon.com/sagemaker/latest/dg/data-parallel-use-api.html#data-parallel-use-python-skd-api>`_
+   for more information.
+
+Version 1.4.0 (Latest)
 ======================
 
 .. toctree::
    :maxdepth: 1
 
-   latest/smd_data_parallel_pytorch.rst
-   latest/smd_data_parallel_tensorflow.rst
+   latest/smd_data_parallel_pytorch
+   latest/smd_data_parallel_tensorflow
+
+Documentation Archive
+=====================
+
+To find the API documentation for the previous versions of the library,
+choose one of the following:
+
+.. toctree::
+   :maxdepth: 1
+
+   archives

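For context, the `.. important::` note added above refers to enabling the library when you define an estimator. A minimal sketch of what that looks like with the SageMaker Python SDK's PyTorch estimator follows; the entry point, role lookup, versions, and instance settings are illustrative placeholders, not part of this commit:

```python
import sagemaker
from sagemaker.pytorch import PyTorch

# Sketch: enable the SageMaker distributed data parallel library through the
# `distribution` argument. SageMaker then runs the job on a CUDA 11 container,
# which is what the note above is pointing out.
estimator = PyTorch(
    entry_point="train.py",            # placeholder training script
    role=sagemaker.get_execution_role(),
    framework_version="1.11.0",        # assumed; use a version the library supports
    py_version="py38",
    instance_count=2,
    instance_type="ml.p4d.24xlarge",   # an SMDDP-supported GPU instance
    distribution={"smdistributed": {"dataparallel": {"enabled": True}}},
)
estimator.fit()
```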
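On the "framework-specific APIs" line: as of the v1.4.0 release this commit documents, the PyTorch flavor of the library can be used as a backend for PyTorch's own distributed package. A minimal training-script sketch under that assumption (the model and the `LOCAL_RANK` environment variable exported by the launcher are placeholders):

```python
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Importing this module registers the "smddp" backend with torch.distributed.
import smdistributed.dataparallel.torch.torch_smddp  # noqa: F401

dist.init_process_group(backend="smddp")

# Pin each process to its GPU; assumes the launcher exports LOCAL_RANK.
local_rank = int(os.environ["LOCAL_RANK"])
torch.cuda.set_device(local_rank)

model = torch.nn.Linear(10, 1).to(local_rank)  # placeholder model
model = DDP(model, device_ids=[local_rank])
```

For TensorFlow, and for the archived 1.x versions listed above, the library exposes its own `smdistributed.dataparallel` wrappers instead; see the versioned pages in the toctrees this commit adds.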