diff --git a/doc/api/training/sdp_versions/latest.rst b/doc/api/training/sdp_versions/latest.rst
index c3fcc5f78e..461f58998f 100644
--- a/doc/api/training/sdp_versions/latest.rst
+++ b/doc/api/training/sdp_versions/latest.rst
@@ -26,8 +26,8 @@ depending on the version of the library you use.
    `_ for more information.
 
 
-Version 1.4.0, 1.4.1, 1.5.0 (Latest)
-====================================
+Version 1.4.0, 1.4.1, 1.5.0, 1.6.0 (Latest)
+===========================================
 
 .. toctree::
    :maxdepth: 1
diff --git a/doc/api/training/smd_data_parallel_release_notes/smd_data_parallel_change_log.rst b/doc/api/training/smd_data_parallel_release_notes/smd_data_parallel_change_log.rst
index 05eb7220e0..8ff7fabf1c 100644
--- a/doc/api/training/smd_data_parallel_release_notes/smd_data_parallel_change_log.rst
+++ b/doc/api/training/smd_data_parallel_release_notes/smd_data_parallel_change_log.rst
@@ -7,9 +7,51 @@ Release Notes
 New features, bug fixes, and improvements are regularly made to the SageMaker distributed data parallel library.
 
-SageMaker Distributed Data Parallel 1.5.0 Release Notes
+SageMaker Distributed Data Parallel 1.6.0 Release Notes
 =======================================================
 
+*Date: Dec. 15. 2022*
+
+**New Features**
+
+* New optimized SMDDP AllGather collective to complement the sharded data parallelism technique
+  in the SageMaker model parallelism library. For more information, see `Sharded data parallelism with SMDDP Collectives
+  `_
+  in the *Amazon SageMaker Developer Guide*.
+* Added support for Amazon EC2 ``ml.p4de.24xlarge`` instances. You can run data parallel training jobs
+  on ``ml.p4de.24xlarge`` instances with the SageMaker data parallelism library’s AllReduce collective.
+
+**Improvements**
+
+* General performance improvements of the SMDDP AllReduce collective communication operation.
+
+**Migration to AWS Deep Learning Containers**
+
+This version passed benchmark testing and is migrated to the following AWS Deep Learning Containers (DLC):
+
+- SageMaker training container for PyTorch v1.12.1
+
+  .. code::
+
+    763104351884.dkr.ecr.<region>.amazonaws.com/pytorch-training:1.12.1-gpu-py38-cu113-ubuntu20.04-sagemaker
+
+
+Binary file of this version of the library for `custom container
+`_ users:
+
+  .. code::
+
+    https://smdataparallel.s3.amazonaws.com/binary/pytorch/1.12.1/cu113/2022-12-05/smdistributed_dataparallel-1.6.0-cp38-cp38-linux_x86_64.whl
+
+
+----
+
+Release History
+===============
+
+SageMaker Distributed Data Parallel 1.5.0 Release Notes
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
 *Date: Jul. 26. 2022*
 
 **Currency Updates**
 
@@ -38,12 +80,6 @@ Binary file of this version of the library for `custom container
 
    https://smdataparallel.s3.amazonaws.com/binary/pytorch/1.12.0/cu113/2022-07-01/smdistributed_dataparallel-1.5.0-cp38-cp38-linux_x86_64.whl
 
-
-----
-
-Release History
-===============
-
 SageMaker Distributed Data Parallel 1.4.1 Release Notes
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
diff --git a/doc/api/training/smd_model_parallel_release_notes/smd_model_parallel_change_log.rst b/doc/api/training/smd_model_parallel_release_notes/smd_model_parallel_change_log.rst
index 6f89fa45a5..92ccc8c407 100644
--- a/doc/api/training/smd_model_parallel_release_notes/smd_model_parallel_change_log.rst
+++ b/doc/api/training/smd_model_parallel_release_notes/smd_model_parallel_change_log.rst
@@ -6,9 +6,60 @@
 New features, bug fixes, and improvements are regularly made to the SageMaker distributed model parallel library.
 
 
-SageMaker Distributed Model Parallel 1.11.0 Release Notes
+SageMaker Distributed Model Parallel 1.13.0 Release Notes
 =========================================================
 
+*Date: Dec. 15. 2022*
+
+**New Features**
+
+* Sharded data parallelism now supports a new backend for collectives called *SMDDP Collectives*.
+  For supported scenarios, SMDDP Collectives are on by default for the AllGather operation.
+  For more information, see
+  `Sharded data parallelism with SMDDP Collectives
+  `_
+  in the *Amazon SageMaker Developer Guide*.
+* Introduced FlashAttention for DistributedTransformer to improve memory usage and computational
+  performance of models such as GPT2, GPTNeo, GPTJ, GPTNeoX, BERT, and RoBERTa.
+
+**Bug Fixes**
+
+* Fixed initialization of ``lm_head`` in DistributedTransformer to use a provided range
+  for initialization, when weights are not tied with the embeddings.
+
+**Improvements**
+
+* When a module has no parameters, we have introduced an optimization to execute
+  such a module on the same rank as its parent during pipeline parallelism.
+
+**Migration to AWS Deep Learning Containers**
+
+This version passed benchmark testing and is migrated to the following AWS Deep Learning Containers (DLC):
+
+- SageMaker training container for PyTorch v1.12.1
+
+  .. code::
+
+    763104351884.dkr.ecr.<region>.amazonaws.com/pytorch-training:1.12.1-gpu-py38-cu113-ubuntu20.04-sagemaker
+
+
+Binary file of this version of the library for `custom container
+`_ users:
+
+- For PyTorch 1.12.1
+
+  .. code::
+
+    https://sagemaker-distributed-model-parallel.s3.us-west-2.amazonaws.com/pytorch-1.12.1/build-artifacts/2022-12-08-21-34/smdistributed_modelparallel-1.13.0-cp38-cp38-linux_x86_64.whl
+
+----
+
+Release History
+===============
+
+SageMaker Distributed Model Parallel 1.11.0 Release Notes
+---------------------------------------------------------
+
 *Date: August. 17. 2022*
 
 **New Features**
 
@@ -41,12 +92,7 @@ Binary file of this version of the library for `custom container
 
   .. code::
 
-    https://sagemaker-distributed-model-parallel.s3.us-west-2.amazonaws.com/pytorch-1.12.0/build-artifacts/2022-08-12-16-58/smdistributed_modelparallel-1.11.0-cp38-cp38-linux_x86_64.whl
-
-----
-
-Release History
-===============
+    https://sagemaker-distributed-model-parallel.s3.us-west-2.amazonaws.com/pytorch-1.12.0/build-artifacts/2022-08-12-16-58/smdistributed_modelparallel-1.11.0-cp38-cp38-linux_x86_64.whl
 
 SageMaker Distributed Model Parallel 1.10.1 Release Notes
 ---------------------------------------------------------
 
diff --git a/doc/api/training/smp_versions/latest.rst b/doc/api/training/smp_versions/latest.rst
index 1a2032c9ed..1eb358b2a3 100644
--- a/doc/api/training/smp_versions/latest.rst
+++ b/doc/api/training/smp_versions/latest.rst
@@ -10,8 +10,8 @@ depending on which version of the library you need to use.
 
 To use the library, reference the **Common API** documentation alongside the framework specific API documentation.
 
-Version 1.11.0 (Latest)
-===========================================
+Version 1.11.0, 1.13.0 (Latest)
+===============================
 
 To use the library, reference the Common API documentation alongside the framework specific API documentation.
 
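Both release-note entries above point to the PyTorch 1.12.1 DLC image and the matching
library binaries. As a minimal, non-authoritative sketch of how a training job typically
picks these library versions up through the SageMaker Python SDK ``distribution`` argument
(the entry point script, IAM role, S3 path, and instance settings below are placeholder
assumptions):

.. code:: python

   from sagemaker.pytorch import PyTorch

   # Sketch: enable the SageMaker distributed data parallel (SMDDP) library on the
   # PyTorch 1.12.1 training container referenced in the release notes above.
   estimator = PyTorch(
       entry_point="train.py",  # hypothetical training script
       role="arn:aws:iam::111122223333:role/SageMakerRole",  # placeholder IAM role
       framework_version="1.12.1",
       py_version="py38",
       instance_count=2,
       instance_type="ml.p4d.24xlarge",
       distribution={"smdistributed": {"dataparallel": {"enabled": True}}},
   )
   estimator.fit("s3://example-bucket/training-data")  # placeholder S3 input

The model parallel library is enabled the same way, using the ``modelparallel`` key
(together with an ``mpi`` configuration) instead of ``dataparallel``.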