Commit c5ad53e

mchoi8739 authored and JoseJuan98 committed
documentation: smdistributed libraries release notes (aws#3543)
1 parent a9f53a8 commit c5ad53e

File tree

4 files changed: +100 -18 lines changed

doc/api/training/sdp_versions/latest.rst

+2 -2
@@ -26,8 +26,8 @@ depending on the version of the library you use.
<https://docs.aws.amazon.com/sagemaker/latest/dg/data-parallel-use-api.html#data-parallel-use-python-skd-api>`_
for more information.

- Version 1.4.0, 1.4.1, 1.5.0 (Latest)
- ====================================
+ Version 1.4.0, 1.4.1, 1.5.0, 1.6.0 (Latest)
+ ===========================================

.. toctree::
   :maxdepth: 1

doc/api/training/smd_data_parallel_release_notes/smd_data_parallel_change_log.rst

+43 -7
@@ -7,9 +7,51 @@ Release Notes
New features, bug fixes, and improvements are regularly made to the SageMaker
distributed data parallel library.

- SageMaker Distributed Data Parallel 1.5.0 Release Notes
+ SageMaker Distributed Data Parallel 1.6.0 Release Notes
=======================================================

+ *Date: Dec. 15. 2022*
+
+ **New Features**
+
+ * New optimized SMDDP AllGather collective to complement the sharded data parallelism technique
+   in the SageMaker model parallelism library. For more information, see `Sharded data parallelism with SMDDP Collectives
+   <https://docs.aws.amazon.com/sagemaker/latest/dg/model-parallel-extended-features-pytorch-sharded-data-parallelism.html#model-parallel-extended-features-pytorch-sharded-data-parallelism-smddp-collectives>`_
+   in the *Amazon SageMaker Developer Guide*.
+ * Added support for Amazon EC2 ``ml.p4de.24xlarge`` instances. You can run data parallel training jobs
+   on ``ml.p4de.24xlarge`` instances with the SageMaker data parallelism library’s AllReduce collective.
+
+ **Improvements**
+
+ * General performance improvements of the SMDDP AllReduce collective communication operation.
+
+ **Migration to AWS Deep Learning Containers**
+
+ This version passed benchmark testing and is migrated to the following AWS Deep Learning Containers (DLC):
+
+ - SageMaker training container for PyTorch v1.12.1
+
+   .. code::
+
+      763104351884.dkr.ecr.<region>.amazonaws.com/pytorch-training:1.12.1-gpu-py38-cu113-ubuntu20.04-sagemaker
+
+
+ Binary file of this version of the library for `custom container
+ <https://docs.aws.amazon.com/sagemaker/latest/dg/data-parallel-use-api.html#data-parallel-bring-your-own-container>`_ users:
+
+ .. code::
+
+    https://smdataparallel.s3.amazonaws.com/binary/pytorch/1.12.1/cu113/2022-12-05/smdistributed_dataparallel-1.6.0-cp38-cp38-linux_x86_64.whl
+
+
+ ----
+
+ Release History
+ ===============
+
+ SageMaker Distributed Data Parallel 1.5.0 Release Notes
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

*Date: Jul. 26. 2022*

**Currency Updates**

@@ -38,12 +80,6 @@ Binary file of this version of the library for `custom container

   https://smdataparallel.s3.amazonaws.com/binary/pytorch/1.12.0/cu113/2022-07-01/smdistributed_dataparallel-1.5.0-cp38-cp38-linux_x86_64.whl

-
- ----
-
- Release History
- ===============
-
SageMaker Distributed Data Parallel 1.4.1 Release Notes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
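
The 1.6.0 data parallel notes above add ``ml.p4de.24xlarge`` support for the SMDDP AllReduce collective and ship the library in the PyTorch 1.12.1 DLC. As a rough sketch, such a job is typically launched through the SageMaker Python SDK's ``PyTorch`` estimator with the ``dataparallel`` distribution option; the entry point script, role ARN, S3 input, and instance count below are placeholder assumptions, not values from this commit.

.. code:: python

   # Minimal sketch: launch a data parallel training job that uses the SMDDP
   # AllReduce collective on ml.p4de.24xlarge. The script name, role ARN,
   # S3 input, and instance count are placeholder assumptions.
   from sagemaker.pytorch import PyTorch

   estimator = PyTorch(
       entry_point="train.py",      # hypothetical training script
       role="arn:aws:iam::111122223333:role/SageMakerExecutionRole",  # placeholder role
       framework_version="1.12.1",  # resolves to the PyTorch 1.12.1 training DLC listed above
       py_version="py38",
       instance_type="ml.p4de.24xlarge",
       instance_count=2,
       # Enable the SageMaker distributed data parallel (SMDDP) library.
       distribution={"smdistributed": {"dataparallel": {"enabled": True}}},
   )

   estimator.fit("s3://amzn-s3-demo-bucket/training-data")  # placeholder S3 input

Custom-container users would instead install the 1.6.0 binary wheel listed in the notes above into their own image.
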

doc/api/training/smd_model_parallel_release_notes/smd_model_parallel_change_log.rst

+53 -7
@@ -6,9 +6,60 @@ New features, bug fixes, and improvements are regularly made to the SageMaker
distributed model parallel library.


- SageMaker Distributed Model Parallel 1.11.0 Release Notes
+ SageMaker Distributed Model Parallel 1.13.0 Release Notes
=========================================================

+ *Date: Dec. 15. 2022*
+
+ **New Features**
+
+ * Sharded data parallelism now supports a new backend for collectives called *SMDDP Collectives*.
+   For supported scenarios, SMDDP Collectives are on by default for the AllGather operation.
+   For more information, see
+   `Sharded data parallelism with SMDDP Collectives
+   <https://docs.aws.amazon.com/sagemaker/latest/dg/model-parallel-extended-features-pytorch-sharded-data-parallelism.html#model-parallel-extended-features-pytorch-sharded-data-parallelism-smddp-collectives>`_
+   in the *Amazon SageMaker Developer Guide*.
+ * Introduced FlashAttention for DistributedTransformer to improve memory usage and computational
+   performance of models such as GPT2, GPTNeo, GPTJ, GPTNeoX, BERT, and RoBERTa.
+
+ **Bug Fixes**
+
+ * Fixed initialization of ``lm_head`` in DistributedTransformer to use a provided range
+   for initialization, when weights are not tied with the embeddings.
+
+ **Improvements**
+
+ * When a module has no parameters, we have introduced an optimization to execute
+   such a module on the same rank as its parent during pipeline parallelism.
+
+ **Migration to AWS Deep Learning Containers**
+
+ This version passed benchmark testing and is migrated to the following AWS Deep Learning Containers (DLC):
+
+ - SageMaker training container for PyTorch v1.12.1
+
+   .. code::
+
+      763104351884.dkr.ecr.<region>.amazonaws.com/pytorch-training:1.12.1-gpu-py38-cu113-ubuntu20.04-sagemaker
+
+
+ Binary file of this version of the library for `custom container
+ <https://docs.aws.amazon.com/sagemaker/latest/dg/model-parallel-sm-sdk.html#model-parallel-bring-your-own-container>`_ users:
+
+ - For PyTorch 1.12.0
+
+   .. code::
+
+      https://sagemaker-distributed-model-parallel.s3.us-west-2.amazonaws.com/pytorch-1.12.1/build-artifacts/2022-12-08-21-34/smdistributed_modelparallel-1.13.0-cp38-cp38-linux_x86_64.whl
+
+ ----
+
+ Release History
+ ===============
+
+ SageMaker Distributed Model Parallel 1.11.0 Release Notes
+ ---------------------------------------------------------
+

*Date: August. 17. 2022*

**New Features**

@@ -41,12 +92,7 @@ Binary file of this version of the library for `custom container

.. code::

- https://sagemaker-distributed-model-parallel.s3.us-west-2.amazonaws.com/pytorch-1.12.0/build-artifacts/2022-08-12-16-58/smdistributed_modelparallel-1.11.0-cp38-cp38-linux_x86_64.whl
-
- ----
-
- Release History
- ===============
+ https://sagemaker-distribu

SageMaker Distributed Model Parallel 1.10.1 Release Notes
---------------------------------------------------------
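
The 1.13.0 model parallel notes above say that, for supported scenarios, sharded data parallelism now uses the SMDDP AllGather collective by default. The sketch below shows where sharded data parallelism is typically configured, through the estimator's ``modelparallel`` distribution parameters; the parallelism degrees, script name, role ARN, and instance settings are illustrative assumptions, not values taken from this commit.

.. code:: python

   # Hedged sketch: configure sharded data parallelism with the SageMaker model
   # parallelism library. Per the 1.13.0 notes, supported configurations pick up
   # the SMDDP AllGather collective by default. All concrete values below are
   # illustrative assumptions.
   from sagemaker.pytorch import PyTorch

   smp_options = {
       "enabled": True,
       "parameters": {
           "partitions": 1,                     # no pipeline parallelism in this sketch
           "ddp": True,
           "sharded_data_parallel_degree": 16,  # shard states across 16 ranks (2 x 8 GPUs)
       },
   }

   estimator = PyTorch(
       entry_point="train_gpt.py",   # hypothetical training script
       role="arn:aws:iam::111122223333:role/SageMakerExecutionRole",  # placeholder role
       framework_version="1.12.1",
       py_version="py38",
       instance_type="ml.p4d.24xlarge",
       instance_count=2,
       distribution={
           "smdistributed": {"modelparallel": smp_options},
           "mpi": {"enabled": True, "processes_per_host": 8},
       },
   )

   estimator.fit("s3://amzn-s3-demo-bucket/training-data")  # placeholder S3 input
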

doc/api/training/smp_versions/latest.rst

+2 -2
@@ -10,8 +10,8 @@ depending on which version of the library you need to use.
To use the library, reference the
**Common API** documentation alongside the framework specific API documentation.

- Version 1.11.0 (Latest)
- ===========================================
+ Version 1.11.0, 1.13.0 (Latest)
+ ===============================

To use the library, reference the Common API documentation alongside the framework specific API documentation.