
Commit 1d2b364

documentation: SageMaker model parallel library v1.10.0 documentation (#3237)
* archive doc for past versions
* fix indexing
* add new smp cpu memory apis
* add new params
* add dynamic scale params, add reference
* minor fix
* minor fixes
* rm temp methods
* add new checkpoint save/load functions, doc improvement
* pass doc8
* Trigger Build
* remove dist word embedding option

Co-authored-by: Shreya Pandit <[email protected]>
1 parent 6d24b28 commit 1d2b364

10 files changed, +2665 −153 lines changed

doc/api/training/smd_model_parallel_general.rst (+10)
@@ -178,6 +178,16 @@ PyTorch-specific Parameters
      - 1
      - The number of devices over which the tensor parallel modules will be distributed.
        If ``tensor_parallel_degree`` is greater than 1, then ``ddp`` must be set to ``True``.
+   * - ``fp16`` (**smdistributed-modelparallel**>=v1.10)
+     - bool
+     - ``False``
+     - To run FP16 training, add ``"fp16": True`` to the smp configuration.
+       Other APIs remain the same between FP16 and FP32.
+       If ``fp16`` is enabled, when the user calls ``smp.DistributedModel``,
+       the model is wrapped with ``FP16_Module``, which converts the model
+       to FP16 dtype and handles the forward pass in FP16.
+       If ``fp16`` is enabled, when the user calls ``smp.DistributedOptimizer``,
+       the optimizer is wrapped with ``FP16_Optimizer``.
    * - ``fp16_params`` (**smdistributed-modelparallel**>=v1.6)
      - bool
      - ``False``
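For reference, a minimal sketch of how the new flag fits the usual SMP training-script pattern, assuming the standard estimator-side ``smp_options`` layout; the model, learning rate, and the tensor parallel degree of 2 below are placeholders, not part of this commit::

    # Estimator-side configuration; the keys come from the parameter table
    # above. ``ddp`` must be True because ``tensor_parallel_degree`` > 1.
    smp_options = {
        "enabled": True,
        "parameters": {
            "ddp": True,
            "tensor_parallel_degree": 2,
            "fp16": True,  # new in smdistributed-modelparallel >= v1.10
        },
    }

    # Training-script side: the wrapping described above happens inside
    # the smp.DistributedModel / smp.DistributedOptimizer calls.
    import torch
    import smdistributed.modelparallel.torch as smp

    smp.init()  # picks up the configuration passed through the estimator

    model = torch.nn.Linear(8, 8)  # placeholder model
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

    model = smp.DistributedModel(model)              # wrapped with FP16_Module when fp16 is True
    optimizer = smp.DistributedOptimizer(optimizer)  # wrapped with FP16_Optimizer when fp16 is True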

doc/api/training/smp_versions/archives.rst (+1)
@@ -3,6 +3,7 @@
 .. toctree::
    :maxdepth: 1

+   v1_9_0.rst
    v1_6_0.rst
    v1_5_0.rst
    v1_4_0.rst

doc/api/training/smp_versions/latest.rst (+1 −1)
@@ -10,7 +10,7 @@ depending on which version of the library you need to use.
 To use the library, reference the
 **Common API** documentation alongside the framework specific API documentation.

-Version 1.7.0, 1.8.0, 1.8.1, 1.9.0 (Latest)
+Version 1.10.0 (Latest)
 ===========================================

 To use the library, reference the Common API documentation alongside the framework specific API documentation.

doc/api/training/smp_versions/latest/smd_model_parallel_pytorch.rst (+267 −67) — large diffs are not rendered by default.
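Per the commit message, this unrendered diff adds new checkpoint save/load functions. A hedged sketch of what those calls might look like, assuming the ``smp.save_checkpoint``/``smp.resume_from_checkpoint`` names introduced with v1.10 and a hypothetical checkpoint directory::

    import smdistributed.modelparallel.torch as smp

    # Assumes smp.init(), model = smp.DistributedModel(...), and
    # optimizer = smp.DistributedOptimizer(...) have already run,
    # as in the earlier sketch.
    smp.save_checkpoint(
        path="/opt/ml/checkpoints",  # hypothetical directory
        tag="step_1000",
        partial=True,                # save a per-partition (partial) checkpoint
        model=model,
        optimizer=optimizer,
    )

    # Resume from the same partial checkpoint in a later run.
    smp.resume_from_checkpoint("/opt/ml/checkpoints", tag="step_1000", partial=True)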

doc/api/training/smp_versions/latest/smd_model_parallel_pytorch_tensor_parallel.rst (+111 −85) — large diffs are not rendered by default.
