documentation: SageMaker model parallel library v1.10.0 documentation #3237

Merged: 25 commits, Jul 15, 2022
10 changes: 10 additions & 0 deletions doc/api/training/smd_model_parallel_general.rst
@@ -178,6 +178,16 @@ PyTorch-specific Parameters
- 1
- The number of devices over which the tensor parallel modules will be distributed.
If ``tensor_parallel_degree`` is greater than 1, then ``ddp`` must be set to ``True``.
* - ``fp16`` (**smdistributed-modelparallel**>=v1.10)
- bool
- ``False``
- To run FP16 training, add ``"fp16": True`` to the smp configuration.
Other APIs remain the same between FP16 and FP32.
If ``fp16`` is enabled and the user calls ``smp.DistributedModel``,
the model will be wrapped with ``FP16_Module``, which converts the model
to the FP16 dtype and handles the forward pass in FP16.
If ``fp16`` is enabled and the user calls ``smp.DistributedOptimizer``,
the optimizer will be wrapped with ``FP16_Optimizer``.
* - ``fp16_params`` (**smdistributed-modelparallel**>=v1.6)
- bool
- ``False``
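The ``fp16`` parameter documented above is part of the model parallel
configuration that the SageMaker Python SDK passes to the training job. The
following is a minimal sketch, not part of this PR; the entry point, IAM role,
instance type, framework version, and degree values are hypothetical
placeholders.

.. code-block:: python

   from sagemaker.pytorch import PyTorch

   smp_options = {
       "enabled": True,
       "parameters": {
           "ddp": True,                    # required when tensor_parallel_degree > 1
           "tensor_parallel_degree": 2,    # placeholder value
           "pipeline_parallel_degree": 1,  # placeholder value
           "fp16": True,                   # requires smdistributed-modelparallel >= v1.10
       },
   }

   estimator = PyTorch(
       entry_point="train.py",             # hypothetical training script
       role="MySageMakerExecutionRole",    # hypothetical IAM role
       instance_type="ml.p3.16xlarge",
       instance_count=1,
       framework_version="1.11.0",
       py_version="py38",
       distribution={
           "smdistributed": {"modelparallel": smp_options},
           "mpi": {"enabled": True, "processes_per_host": 8},
       },
   )

   estimator.fit()

On the training-script side, the wrapping behavior described in the ``fp16``
entry can be sketched as follows, assuming the usual smp initialization
pattern; the model and optimizer here are placeholders.

.. code-block:: python

   import torch
   import smdistributed.modelparallel.torch as smp

   smp.init()

   model = torch.nn.Linear(1024, 1024)

   # With "fp16": True in the configuration, DistributedModel wraps the model
   # in FP16_Module (FP16 dtype, FP16 forward pass) and DistributedOptimizer
   # wraps the optimizer in FP16_Optimizer.
   model = smp.DistributedModel(model)
   optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
   optimizer = smp.DistributedOptimizer(optimizer)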
1 change: 1 addition & 0 deletions doc/api/training/smp_versions/archives.rst
@@ -3,6 +3,7 @@
.. toctree::
:maxdepth: 1

v1_9_0.rst
v1_6_0.rst
v1_5_0.rst
v1_4_0.rst
2 changes: 1 addition & 1 deletion doc/api/training/smp_versions/latest.rst
@@ -10,7 +10,7 @@ depending on which version of the library you need to use.
To use the library, reference the
**Common API** documentation alongside the framework specific API documentation.

Version 1.7.0, 1.8.0, 1.8.1, 1.9.0 (Latest)
Version 1.10.0 (Latest)
===========================================

To use the library, reference the Common API documentation alongside the framework specific API documentation.
334 changes: 267 additions & 67 deletions doc/api/training/smp_versions/latest/smd_model_parallel_pytorch.rst

Large diffs are not rendered by default.
