Commit ae4afc2

mchoi8739 authored and JoseJuan98 committed
documentation: smdmp v1.10 release note (aws#3244)
1 parent ff8a613 commit ae4afc2

1 file changed: +50 -6 lines changed

doc/api/training/smd_model_parallel_release_notes/smd_model_parallel_change_log.rst

+50-6
@@ -5,14 +5,31 @@ Release Notes
 New features, bug fixes, and improvements are regularly made to the SageMaker
 distributed model parallel library.

-SageMaker Distributed Model Parallel 1.9.0 Release Notes
-========================================================
+SageMaker Distributed Model Parallel 1.10.0 Release Notes
+=========================================================

-*Date: May. 3. 2022*
+*Date: July. 19. 2022*

-**Currency Updates**
+**New Features**

-* Added support for PyTorch 1.11.0
+The following new features are added for PyTorch.
+
+* Added support for FP16 training by implementing smdistributed.modelparallel
+  modification of Apex FP16_Module and FP16_Optimizer. To learn more, see
+  `FP16 Training with Model Parallelism
+  <https://docs.aws.amazon.com/sagemaker/latest/dg/model-parallel-extended-features-pytorch-fp16.html>`_.
+* New checkpoint APIs for CPU memory usage optimization. To learn more, see
+  `Checkpointing Distributed Models and Optimizer States
+  <https://docs.aws.amazon.com/sagemaker/latest/dg/model-parallel-extended-features-pytorch-checkpoint.html>`_.
+
+**Improvements**
+
+* The SageMaker distributed model parallel library manages and optimizes CPU
+  memory by garbage-collecting non-local parameters in general and during checkpointing.
+* Changes in the `GPT-2 translate functions
+  <https://docs.aws.amazon.com/sagemaker/latest/dg/model-parallel-extended-features-pytorch-hugging-face.html>`_
+  (``smdistributed.modelparallel.torch.nn.huggingface.gpt2``)
+  to save memory by not maintaining two copies of weights at the same time.

 **Migration to AWS Deep Learning Containers**

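The FP16 feature in the hunk above is turned on through the library's configuration rather than through this changelog. The sketch below shows one way the ``distribution`` argument for a SageMaker PyTorch estimator might be assembled. The helper name ``build_smp_distribution`` is hypothetical, and the parameter keys (``fp16``, ``ddp``, ``pipeline_parallel_degree``, ``microbatches``) are assumptions that should be checked against the linked FP16 guide before use.

```python
from typing import Any, Dict


def build_smp_distribution(
    fp16: bool = True,
    pipeline_parallel_degree: int = 2,
    microbatches: int = 4,
    processes_per_host: int = 8,
) -> Dict[str, Any]:
    """Assemble a `distribution` dict for a SageMaker PyTorch estimator.

    Hypothetical helper: the key names follow the model parallel library's
    documented configuration as best understood here and are assumptions,
    not something this commit itself confirms.
    """
    return {
        "smdistributed": {
            "modelparallel": {
                "enabled": True,
                "parameters": {
                    "pipeline_parallel_degree": pipeline_parallel_degree,
                    "microbatches": microbatches,
                    "ddp": True,
                    # Assumed switch for the FP16 training support in 1.10.0.
                    "fp16": fp16,
                },
            }
        },
        # The library is launched through MPI on each host.
        "mpi": {"enabled": True, "processes_per_host": processes_per_host},
    }


distribution = build_smp_distribution(fp16=True)
```

The resulting dict would typically be passed as ``distribution=`` to a PyTorch estimator that targets the PyTorch 1.11.0 / Python 3.8 container named in these notes.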
@@ -28,7 +45,7 @@ Binary file of this version of the library for custom container users:

 .. code::

-   https://sagemaker-distributed-model-parallel.s3.us-west-2.amazonaws.com/pytorch-1.11.0/build-artifacts/2022-04-20-17-05/smdistributed_modelparallel-1.9.0-cp38-cp38-linux_x86_64.whl
+   https://sagemaker-distributed-model-parallel.s3.us-west-2.amazonaws.com/pytorch-1.11.0/build-artifacts/2022-07-11-19-23/smdistributed_modelparallel-1.10.0-cp38-cp38-linux_x86_64.whl



@@ -37,6 +54,33 @@ Binary file of this version of the library for custom container users:
 Release History
 ===============

+SageMaker Distributed Model Parallel 1.9.0 Release Notes
+--------------------------------------------------------
+
+*Date: May. 3. 2022*
+
+**Currency Updates**
+
+* Added support for PyTorch 1.11.0
+
+**Migration to AWS Deep Learning Containers**
+
+This version passed benchmark testing and is migrated to the following AWS Deep Learning Containers (DLC):
+
+- PyTorch 1.11.0 DLC
+
+.. code::
+
+   763104351884.dkr.ecr.<region>.amazonaws.com/pytorch-training:1.11.0-gpu-py38-cu113-ubuntu20.04-sagemaker
+
+Binary file of this version of the library for custom container users:
+
+.. code::
+
+   https://sagemaker-distributed-model-parallel.s3.us-west-2.amazonaws.com/pytorch-1.11.0/build-artifacts/2022-04-20-17-05/smdistributed_modelparallel-1.9.0-cp38-cp38-linux_x86_64.whl
+
+
+
 SageMaker Distributed Model Parallel 1.8.1 Release Notes
 --------------------------------------------------------

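Custom container users need the wheel listed in the diff above to match their Python version and platform. As a rough sketch, the wheel file name can be split into its standard components (distribution, version, Python tag, ABI tag, platform tag) using only the standard library. The names ``WheelInfo`` and ``parse_wheel_url`` are hypothetical helpers introduced for illustration; they are not part of the SageMaker SDK.

```python
from posixpath import basename
from typing import NamedTuple
from urllib.parse import urlparse


class WheelInfo(NamedTuple):
    distribution: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_url(url: str) -> WheelInfo:
    """Split the wheel file name at the end of a URL into its name components."""
    filename = basename(urlparse(url).path)
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 6:
        # Wheel names may carry an optional build tag after the version.
        del parts[2]
    if len(parts) != 5:
        raise ValueError(f"unexpected wheel file name: {filename!r}")
    return WheelInfo(*parts)


# Wheel URL for version 1.10.0, copied from the diff above.
SMP_WHEEL_URL = (
    "https://sagemaker-distributed-model-parallel.s3.us-west-2.amazonaws.com/"
    "pytorch-1.11.0/build-artifacts/2022-07-11-19-23/"
    "smdistributed_modelparallel-1.10.0-cp38-cp38-linux_x86_64.whl"
)
info = parse_wheel_url(SMP_WHEEL_URL)
```

Here ``info.python_tag`` of ``cp38`` indicates a CPython 3.8 build, which lines up with the ``py38`` tag of the PyTorch 1.11.0 DLC image in the same notes.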