From 14ddc9d8b53643789aa9f66e13aa2346dd3389ef Mon Sep 17 00:00:00 2001
From: Miyoung Choi
Date: Wed, 27 Apr 2022 14:05:53 -0700
Subject: [PATCH 1/5] smdmp 1.8.1 release note

---
 .../smd_model_parallel_change_log.rst    | 68 +++++++++++++++++--
 doc/api/training/smp_versions/latest.rst |  4 +-
 2 files changed, 63 insertions(+), 9 deletions(-)

diff --git a/doc/api/training/smd_model_parallel_release_notes/smd_model_parallel_change_log.rst b/doc/api/training/smd_model_parallel_release_notes/smd_model_parallel_change_log.rst
index 2e5ad2a8ac..6b58700d06 100644
--- a/doc/api/training/smd_model_parallel_release_notes/smd_model_parallel_change_log.rst
+++ b/doc/api/training/smd_model_parallel_release_notes/smd_model_parallel_change_log.rst
@@ -5,9 +5,68 @@ Release Notes
 New features, bug fixes, and improvements are regularly made to the SageMaker
 distributed model parallel library.
 
-SageMaker Distributed Model Parallel 1.8.0 Release Notes
+SageMaker Distributed Model Parallel 1.8.1 Release Notes
 ========================================================
 
+*Date: April 23, 2022*
+
+**New Features**
+
+* Added support for more configurations of Hugging Face Transformers GPT-2 and GPT-J models
+  with tensor parallelism: ``scale_attn_weights``, ``scale_attn_by_inverse_layer_idx``,
+  ``reorder_and_upcast_attn``. To learn more about these features, please refer to
+  the following model configuration classes
+  in the *Hugging Face Transformers documentation*
+  (see also the configuration sketch after this list):
+
+  * `transformers.GPT2Config `_
+  * `transformers.GPTJConfig `_
+
+* Added support for activation checkpointing of modules that pass keyword arguments
+  and arbitrary structures in their forward methods. This helps support
+  activation checkpointing with Hugging Face Transformers models even
+  when tensor parallelism is not enabled.
+
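+For reference, the following is a minimal sketch of how a GPT-2 model might be
+configured with these attention options and registered for tensor parallelism.
+It is an illustrative example rather than an excerpt from the library's tests,
+and the surrounding launch configuration is assumed to be set up separately:
+
+.. code:: python
+
+   # Illustrative sketch: build a Hugging Face GPT-2 model with the newly
+   # supported attention options, then prepare it for tensor parallelism.
+   import smdistributed.modelparallel.torch as smp
+   from transformers import GPT2Config, GPT2LMHeadModel
+
+   smp.init()
+
+   config = GPT2Config(
+       scale_attn_weights=True,
+       scale_attn_by_inverse_layer_idx=True,
+       reorder_and_upcast_attn=True,
+   )
+
+   # Modules created inside this context are marked for tensor parallelism.
+   with smp.tensor_parallelism(enabled=True):
+       model = GPT2LMHeadModel(config)
+
+   model = smp.DistributedModel(model)
+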
+**Bug Fixes**
+
+* Fixed a correctness issue with tensor parallelism for the GPT-J model,
+  which was due to improper scaling during gradient reduction
+  for some layer normalization modules.
+* Fixed the creation of unnecessary additional processes, which take up
+  GPU memory on GPU 0 when the :class:`smp.allgather` collective is called.
+
+**Improvements**
+
+* Improved activation offloading so that activations are preloaded on a
+  per-layer basis, instead of preloading all activations for a micro batch
+  at once. This not only improves memory efficiency and performance, but also
+  makes activation offloading a useful feature for non-pipeline parallelism
+  cases.
+
+**Migration to AWS Deep Learning Containers**
+
+This version passed benchmark testing and is migrated to the following AWS Deep Learning Containers:
+
+* HuggingFace 4.17.0 DLC with PyTorch 1.10.2
+
+  .. code::
+
+    763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-training:1.10.2-transformers4.17.0-gpu-py38-cu113-ubuntu20.04
+
+* The binary file of this version of the library for custom container users
+
+  .. code::
+
+    https://sagemaker-distributed-model-parallel.s3.us-west-2.amazonaws.com/pytorch-1.10.0/build-artifacts/2022-04-14-03-58/smdistributed_modelparallel-1.8.1-cp38-cp38-linux_x86_64.whl
+
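+The following is one way to launch a training job on the DLC above with the
+library enabled, sketched with the SageMaker Python SDK. The entry point,
+role, instance settings, and ``parameters`` values are illustrative
+assumptions, not recommended defaults:
+
+.. code:: python
+
+   from sagemaker.huggingface import HuggingFace
+
+   estimator = HuggingFace(
+       entry_point="train.py",          # your training script
+       role="<your-iam-role-arn>",      # placeholder
+       instance_type="ml.p3.16xlarge",  # 8 GPUs per instance
+       instance_count=1,
+       image_uri=(
+           "763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-training"
+           ":1.10.2-transformers4.17.0-gpu-py38-cu113-ubuntu20.04"
+       ),
+       # The library requires both "smdistributed" and "mpi" to be enabled.
+       distribution={
+           "smdistributed": {
+               "modelparallel": {
+                   "enabled": True,
+                   "parameters": {"partitions": 2, "microbatches": 4},
+               }
+           },
+           "mpi": {"enabled": True, "processes_per_host": 8},
+       },
+   )
+   estimator.fit()
+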
+
+----
+
+Release History
+===============
+
+SageMaker Distributed Model Parallel 1.8.0 Release Notes
+--------------------------------------------------------
+
 *Date: March 23, 2022*
 
 **New Features**
 
@@ -32,18 +91,13 @@ This version passed benchmark testing and is migrated to the following AWS Deep
 
     763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-training:1.10.2-transformers4.17.0-gpu-py38-cu113-ubuntu20.04
 
-  The binary file of this version of the library for custom container users:
+* The binary file of this version of the library for custom container users
 
   .. code::
 
     https://sagemaker-distributed-model-parallel.s3.us-west-2.amazonaws.com/pytorch-1.10.0/build-artifacts/2022-03-12-00-33/smdistributed_modelparallel-1.8.0-cp38-cp38-linux_x86_64.whl
 
-----
-
-Release History
-===============
-
 SageMaker Distributed Model Parallel 1.7.0 Release Notes
 --------------------------------------------------------

diff --git a/doc/api/training/smp_versions/latest.rst b/doc/api/training/smp_versions/latest.rst
index 425825054b..5b8c732618 100644
--- a/doc/api/training/smp_versions/latest.rst
+++ b/doc/api/training/smp_versions/latest.rst
@@ -10,8 +10,8 @@ depending on which version of the library you need to use.
 To use the library, reference the
 **Common API** documentation alongside the framework specific API documentation.
 
-Version 1.7.0, 1.8.0 (Latest)
-=============================
+Version 1.7.0, 1.8.0, 1.8.1 (Latest)
+====================================
 
 To use the library, reference the Common API documentation alongside the framework specific API documentation.

From 7ae6e73d63c923d3c44e702d04f93a0c0882f5e8 Mon Sep 17 00:00:00 2001
From: Miyoung Choi
Date: Wed, 27 Apr 2022 14:10:59 -0700
Subject: [PATCH 2/5] typo

---
 .../smd_model_parallel_change_log.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/api/training/smd_model_parallel_release_notes/smd_model_parallel_change_log.rst b/doc/api/training/smd_model_parallel_release_notes/smd_model_parallel_change_log.rst
index 6b58700d06..630dad5a86 100644
--- a/doc/api/training/smd_model_parallel_release_notes/smd_model_parallel_change_log.rst
+++ b/doc/api/training/smd_model_parallel_release_notes/smd_model_parallel_change_log.rst
@@ -12,7 +12,7 @@ SageMaker Distributed Model Parallel 1.8.1 Release Notes
 
 **New Features**
 
-* Added support for more configurations of Hugging Face Transformers GPT-2 and GPT-J models
+* Added support for more configurations of the Hugging Face Transformers GPT-2 and GPT-J models
   with tensor parallelism: ``scale_attn_weights``, ``scale_attn_by_inverse_layer_idx``,
   ``reorder_and_upcast_attn``. To learn more about these features, please refer to
   the following model configuration classes

From 0f70ddfdc6b7ca5d1f6e2a7bb80be06a082bde80 Mon Sep 17 00:00:00 2001
From: Miyoung Choi
Date: Wed, 27 Apr 2022 15:35:04 -0700
Subject: [PATCH 3/5] improve intro page

---
 doc/api/training/smd_model_parallel.rst | 23 ++++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/doc/api/training/smd_model_parallel.rst b/doc/api/training/smd_model_parallel.rst
index 3ad62e2abb..29b04ee03b 100644
--- a/doc/api/training/smd_model_parallel.rst
+++ b/doc/api/training/smd_model_parallel.rst
@@ -11,10 +11,11 @@ across multiple GPUs with minimal code changes. The library's API can be accesse
 
 .. tip::
 
-  We recommended using this API documentation with the conceptual guide at
+  We recommend that you use this API documentation along with the conceptual guide at
   `SageMaker's Distributed Model Parallel `_
-  in the *Amazon SageMaker developer guide*. This developer guide documentation includes:
+  in the *Amazon SageMaker developer guide*.
+  The conceptual guide includes the following topics:
 
   - An overview of model parallelism, and the library's
     `core features `_,
@@ -32,10 +33,14 @@ across multiple GPUs with minimal code changes. The library's API can be accesse
 
 .. important::
 
-   The model parallel library only supports training jobs using CUDA 11. When you define a PyTorch or TensorFlow
-   ``Estimator`` with ``modelparallel`` parameter ``enabled`` set to ``True``,
-   it uses CUDA 11. When you extend or customize your own training image
-   you must use a CUDA 11 base image. See
-   `Extend or Adapt A Docker Container that Contains the Model Parallel Library
-   `__
-   for more information.
+   The model parallel library only supports SageMaker training jobs using CUDA 11.
+   Make sure you use the pre-built Deep Learning Containers, or use the right CUDA version
+   if you use a custom training container.
+
+.. tip::
+   If you want to extend or customize your own training image
+   you must use a CUDA 11 base image. For more information, see `Extend a Prebuilt Docker
+   Container that Contains SageMaker's Distributed Model Parallel Library
+   `_
+   and `Create Your Own Docker Container with the SageMaker Distributed Model Parallel Library
+   `_.

From 93b98be8b0c92dd3221f61470db0c3d0982d153f Mon Sep 17 00:00:00 2001
From: Miyoung Choi
Date: Wed, 27 Apr 2022 15:39:06 -0700
Subject: [PATCH 4/5] minor doc improvement

---
 doc/api/training/smd_model_parallel.rst | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/doc/api/training/smd_model_parallel.rst b/doc/api/training/smd_model_parallel.rst
index 29b04ee03b..635dcd582d 100644
--- a/doc/api/training/smd_model_parallel.rst
+++ b/doc/api/training/smd_model_parallel.rst
@@ -34,11 +34,8 @@ across multiple GPUs with minimal code changes. The library's API can be accesse
 .. important::
 
    The model parallel library only supports SageMaker training jobs using CUDA 11.
-   Make sure you use the pre-built Deep Learning Containers, or use the right CUDA version
-   if you use a custom training container.
-
-.. tip::
-   If you want to extend or customize your own training image
+   Make sure you use the pre-built Deep Learning Containers.
+   If you want to extend or customize your own training image,
    you must use a CUDA 11 base image. For more information, see `Extend a Prebuilt Docker
    Container that Contains SageMaker's Distributed Model Parallel Library
    `_
    and `Create Your Own Docker Container with the SageMaker Distributed Model Parallel Library
    `_.

From 6a4276869f773317fbd76386f704635e71beb248 Mon Sep 17 00:00:00 2001
From: Miyoung Choi
Date: Thu, 28 Apr 2022 09:30:22 -0700
Subject: [PATCH 5/5] Trigger Build
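
As a companion to the intro page updated in PATCH 3/5 above, which describes
adapting training scripts "with minimal code changes", the following is a
minimal sketch of a PyTorch training script using the library's core API. It
is an illustration based on the library's documented usage pattern, not
content from the patches themselves; ``MyModel``, ``data_loader``, and the
hyperparameters are placeholders.

.. code:: python

   import torch
   import torch.nn.functional as F
   import smdistributed.modelparallel.torch as smp

   smp.init()
   torch.cuda.set_device(smp.local_rank())

   model = MyModel()                       # placeholder torch.nn.Module
   model = smp.DistributedModel(model)     # partition the model for the job
   optimizer = smp.DistributedOptimizer(
       torch.optim.Adam(model.parameters(), lr=1e-3)
   )

   # smp.step splits each batch into microbatches and pipelines them.
   @smp.step
   def train_step(model, data, target):
       output = model(data)                # assumes log-probability outputs
       loss = F.nll_loss(output, target)
       model.backward(loss)                # use model.backward, not loss.backward
       return loss

   for data, target in data_loader:        # placeholder DataLoader
       data, target = data.to("cuda"), target.to("cuda")
       optimizer.zero_grad()
       loss_mb = train_step(model, data, target)
       loss = loss_mb.reduce_mean()        # average per-microbatch losses
       optimizer.step()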