Release Notes
=============

New features, bug fixes, and improvements are regularly made to the SageMaker
distributed model parallel library.

SageMaker Distributed Model Parallel 1.8.1 Release Notes
========================================================

*Date: April 23, 2022*

**New Features**

* Added support for more configurations of the Hugging Face Transformers GPT-2 and GPT-J models
  with tensor parallelism: ``scale_attn_weights``, ``scale_attn_by_inverse_layer_idx``, and
  ``reorder_and_upcast_attn``. To learn more about these options, refer to
  the following model configuration classes
  in the *Hugging Face Transformers documentation* (see the first sketch after this list):

  * `transformers.GPT2Config <https://huggingface.co/docs/transformers/model_doc/gpt2#transformers.GPT2Config>`_
  * `transformers.GPTJConfig <https://huggingface.co/docs/transformers/model_doc/gptj#transformers.GPTJConfig>`_

* Added support for activation checkpointing of modules that pass keyword arguments
  and arbitrary structures in their forward methods. This helps support
  activation checkpointing with Hugging Face Transformers models even
  when tensor parallelism is not enabled (see the second sketch after this list).
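
The following is a minimal sketch of how these configuration options might be
combined with the library's tensor parallelism. It assumes a SageMaker training
job where ``smdistributed.modelparallel`` is installed and initialized; the
training loop is omitted.

.. code:: python

   # Sketch: GPT-2 with the newly supported attention-scaling options
   # under tensor parallelism.
   import smdistributed.modelparallel.torch as smp
   from transformers import AutoModelForCausalLM, GPT2Config

   smp.init()

   config = GPT2Config(
       scale_attn_weights=True,
       scale_attn_by_inverse_layer_idx=True,  # now supported with tensor parallelism
       reorder_and_upcast_attn=True,          # now supported with tensor parallelism
   )

   # Constructing the model inside the tensor parallelism context lets the
   # library replace supported modules with their distributed implementations.
   with smp.tensor_parallelism(enabled=True):
       model = AutoModelForCausalLM.from_config(config)

   model = smp.DistributedModel(model)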
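
And a sketch of activation checkpointing applied to Hugging Face GPT-2 blocks,
whose ``forward`` methods take keyword arguments; the attribute path
``transformer.h`` is specific to GPT-2 and is shown only for illustration.

.. code:: python

   # Sketch: activation checkpointing on Hugging Face GPT-2 blocks, whose
   # forward() takes keyword arguments. Tensor parallelism is not required.
   import smdistributed.modelparallel.torch as smp
   from transformers import AutoModelForCausalLM, GPT2Config

   smp.init()
   model = smp.DistributedModel(AutoModelForCausalLM.from_config(GPT2Config()))

   # Checkpointed blocks recompute their activations during the backward
   # pass instead of keeping them in memory.
   for block in model.get_module().transformer.h:
       smp.set_activation_checkpointing(block)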

**Bug Fixes**

* Fixed a correctness issue with tensor parallelism for the GPT-J model,
  caused by improper scaling during gradient reduction
  for some layer normalization modules.
* Fixed the creation of unnecessary additional processes, which took up
  GPU memory on GPU 0, when the :class:`smp.allgather` collective is called
  (see the sketch after this list).
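
For reference, a minimal sketch of the collective involved; ``smp.allgather``
gathers a picklable object from every rank in the given communication group.

.. code:: python

   # Sketch: gather each rank's value into a list on every rank. After the
   # fix, this call no longer leaves extra processes holding memory on GPU 0.
   import smdistributed.modelparallel.torch as smp

   smp.init()
   ranks = smp.allgather(smp.rank(), smp.CommGroup.WORLD)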

**Improvements**

* Improved activation offloading so that activations are preloaded on a
  per-layer basis, rather than loading all of a microbatch's activations at once.
  This not only improves memory efficiency and performance, but also makes
  activation offloading useful even when pipeline parallelism is not used
  (see the configuration sketch below).
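
Activation offloading is controlled through the library's configuration rather
than code changes. The following sketch shows the relevant parameters as they
would appear in the ``modelparallel`` options passed to a SageMaker estimator;
the values are illustrative, not recommendations.

.. code:: python

   # Sketch: modelparallel options that enable activation offloading.
   smp_options = {
       "enabled": True,
       "parameters": {
           "pipeline_parallel_degree": 2,
           "microbatches": 4,
           "offload_activations": True,      # offload checkpointed activations to CPU
           "activation_loading_horizon": 4,  # how far ahead activations are preloaded
       },
   }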

**Migration to AWS Deep Learning Containers**

This version passed benchmark testing and is migrated to the following AWS Deep Learning Containers:

* HuggingFace 4.17.0 DLC with PyTorch 1.10.2

  .. code::

    763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-training:1.10.2-transformers4.17.0-gpu-py38-cu113-ubuntu20.04

* The binary file of this version of the library for custom container users

  .. code::

    https://sagemaker-distributed-model-parallel.s3.us-west-2.amazonaws.com/pytorch-1.10.0/build-artifacts/2022-04-14-03-58/smdistributed_modelparallel-1.8.1-cp38-cp38-linux_x86_64.whl
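
A minimal sketch of launching a training job on the migrated DLC follows; the
entry point, IAM role, and instance settings are placeholders, and the
``smp_options`` dictionary could instead be the one from the offloading sketch
above.

.. code:: python

   # Sketch: point a SageMaker estimator at the migrated DLC image.
   from sagemaker.huggingface import HuggingFace

   smp_options = {
       "enabled": True,
       "parameters": {"pipeline_parallel_degree": 2, "microbatches": 4},
   }

   estimator = HuggingFace(
       entry_point="train.py",    # placeholder training script
       role="<your-iam-role>",    # placeholder IAM role
       instance_type="ml.p4d.24xlarge",
       instance_count=1,
       image_uri=(
           "763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-training"
           ":1.10.2-transformers4.17.0-gpu-py38-cu113-ubuntu20.04"
       ),
       distribution={
           "smdistributed": {"modelparallel": smp_options},
           "mpi": {"enabled": True, "processes_per_host": 8},
       },
   )
   estimator.fit()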

----

Release History
===============

SageMaker Distributed Model Parallel 1.8.0 Release Notes
--------------------------------------------------------

*Date: March 23, 2022*

**New Features**

...

This version passed benchmark testing and is migrated to the following AWS Deep Learning Containers:

* HuggingFace 4.17.0 DLC with PyTorch 1.10.2

  .. code::

    763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-training:1.10.2-transformers4.17.0-gpu-py38-cu113-ubuntu20.04

* The binary file of this version of the library for custom container users

  .. code::

    https://sagemaker-distributed-model-parallel.s3.us-west-2.amazonaws.com/pytorch-1.10.0/build-artifacts/2022-03-12-00-33/smdistributed_modelparallel-1.8.0-cp38-cp38-linux_x86_64.whl

SageMaker Distributed Model Parallel 1.7.0 Release Notes
--------------------------------------------------------