@@ -5,14 +5,31 @@ Release Notes
New features, bug fixes, and improvements are regularly made to the SageMaker
distributed model parallel library.

- SageMaker Distributed Model Parallel 1.9.0 Release Notes
- ========================================================
+ SageMaker Distributed Model Parallel 1.10.0 Release Notes
+ =========================================================

- *Date: May. 3. 2022*
+ *Date: July. 19. 2022*

- **Currency Updates**
+ **New Features**

- * Added support for PyTorch 1.11.0
+ The following new features are added for PyTorch.
+
+ * Added support for FP16 training by implementing smdistributed.modelparallel
+   modification of Apex FP16_Module and FP16_Optimizer. To learn more, see
+   `FP16 Training with Model Parallelism
+   <https://docs.aws.amazon.com/sagemaker/latest/dg/model-parallel-extended-features-pytorch-fp16.html>`_.
+ * New checkpoint APIs for CPU memory usage optimization. To learn more, see
+   `Checkpointing Distributed Models and Optimizer States
+   <https://docs.aws.amazon.com/sagemaker/latest/dg/model-parallel-extended-features-pytorch-checkpoint.html>`_.
+
+ **Improvements**
+
+ * The SageMaker distributed model parallel library manages and optimizes CPU
+   memory by garbage-collecting non-local parameters in general and during checkpointing.
+ * Changes in the `GPT-2 translate functions
+   <https://docs.aws.amazon.com/sagemaker/latest/dg/model-parallel-extended-features-pytorch-hugging-face.html>`_
+   (``smdistributed.modelparallel.torch.nn.huggingface.gpt2``)
+   to save memory by not maintaining two copies of weights at the same time.

**Migration to AWS Deep Learning Containers**
@@ -28,7 +45,7 @@ Binary file of this version of the library for custom container users:
.. code::

-    https://sagemaker-distributed-model-parallel.s3.us-west-2.amazonaws.com/pytorch-1.11.0/build-artifacts/2022-04-20-17-05/smdistributed_modelparallel-1.9.0-cp38-cp38-linux_x86_64.whl
+    https://sagemaker-distributed-model-parallel.s3.us-west-2.amazonaws.com/pytorch-1.11.0/build-artifacts/2022-07-11-19-23/smdistributed_modelparallel-1.10.0-cp38-cp38-linux_x86_64.whl
@@ -37,6 +54,33 @@ Binary file of this version of the library for custom container users:
Release History
===============

+ SageMaker Distributed Model Parallel 1.9.0 Release Notes
+ --------------------------------------------------------
+
+ *Date: May. 3. 2022*
+
+ **Currency Updates**
+
+ * Added support for PyTorch 1.11.0
+
+ **Migration to AWS Deep Learning Containers**
+
+ This version passed benchmark testing and is migrated to the following AWS Deep Learning Containers (DLC):
+
+ - PyTorch 1.11.0 DLC
+
+ .. code::
+
+    763104351884.dkr.ecr.<region>.amazonaws.com/pytorch-training:1.11.0-gpu-py38-cu113-ubuntu20.04-sagemaker
+
+ Binary file of this version of the library for custom container users:
+
+ .. code::
+
+    https://sagemaker-distributed-model-parallel.s3.us-west-2.amazonaws.com/pytorch-1.11.0/build-artifacts/2022-04-20-17-05/smdistributed_modelparallel-1.9.0-cp38-cp38-linux_x86_64.whl
+
+
+
SageMaker Distributed Model Parallel 1.8.1 Release Notes
--------------------------------------------------------