Commit 6b5e25a

Merge branch 'master' into fix_pt_1.11_config
2 parents 740f811 + 011539a commit 6b5e25a

2 files changed: 58 additions and 8 deletions
doc/api/training/smd_model_parallel_release_notes/smd_model_parallel_change_log.rst

Lines changed: 50 additions & 6 deletions
@@ -5,14 +5,31 @@ Release Notes
 New features, bug fixes, and improvements are regularly made to the SageMaker
 distributed model parallel library.

-SageMaker Distributed Model Parallel 1.9.0 Release Notes
-========================================================
+SageMaker Distributed Model Parallel 1.10.0 Release Notes
+=========================================================

-*Date: May. 3. 2022*
+*Date: July. 19. 2022*

-**Currency Updates**
+**New Features**

-* Added support for PyTorch 1.11.0
+The following new features are added for PyTorch.
+
+* Added support for FP16 training by implementing smdistributed.modelparallel
+modification of Apex FP16_Module and FP16_Optimizer. To learn more, see
+`FP16 Training with Model Parallelism
+<https://docs.aws.amazon.com/sagemaker/latest/dg/model-parallel-extended-features-pytorch-fp16.html>`_.
+* New checkpoint APIs for CPU memory usage optimization. To learn more, see
+`Checkpointing Distributed Models and Optimizer States
+<https://docs.aws.amazon.com/sagemaker/latest/dg/model-parallel-extended-features-pytorch-checkpoint.html>`_.
+
+**Improvements**
+
+* The SageMaker distributed model parallel library manages and optimizes CPU
+memory by garbage-collecting non-local parameters in general and during checkpointing.
+* Changes in the `GPT-2 translate functions
+<https://docs.aws.amazon.com/sagemaker/latest/dg/model-parallel-extended-features-pytorch-hugging-face.html>`_
+(``smdistributed.modelparallel.torch.nn.huggingface.gpt2``)
+to save memory by not maintaining two copies of weights at the same time.

 **Migration to AWS Deep Learning Containers**

@@ -28,7 +45,7 @@ Binary file of this version of the library for custom container users:

 .. code::

-https://sagemaker-distributed-model-parallel.s3.us-west-2.amazonaws.com/pytorch-1.11.0/build-artifacts/2022-04-20-17-05/smdistributed_modelparallel-1.9.0-cp38-cp38-linux_x86_64.whl
+https://sagemaker-distributed-model-parallel.s3.us-west-2.amazonaws.com/pytorch-1.11.0/build-artifacts/2022-07-11-19-23/smdistributed_modelparallel-1.10.0-cp38-cp38-linux_x86_64.whl



@@ -37,6 +54,33 @@ Binary file of this version of the library for custom container users:
 Release History
 ===============

+SageMaker Distributed Model Parallel 1.9.0 Release Notes
+--------------------------------------------------------
+
+*Date: May. 3. 2022*
+
+**Currency Updates**
+
+* Added support for PyTorch 1.11.0
+
+**Migration to AWS Deep Learning Containers**
+
+This version passed benchmark testing and is migrated to the following AWS Deep Learning Containers (DLC):
+
+- PyTorch 1.11.0 DLC
+
+.. code::
+
+763104351884.dkr.ecr.<region>.amazonaws.com/pytorch-training:1.11.0-gpu-py38-cu113-ubuntu20.04-sagemaker
+
+Binary file of this version of the library for custom container users:
+
+.. code::
+
+https://sagemaker-distributed-model-parallel.s3.us-west-2.amazonaws.com/pytorch-1.11.0/build-artifacts/2022-04-20-17-05/smdistributed_modelparallel-1.9.0-cp38-cp38-linux_x86_64.whl
+
+
+
 SageMaker Distributed Model Parallel 1.8.1 Release Notes
 --------------------------------------------------------

src/sagemaker/inputs.py

Lines changed: 8 additions & 2 deletions
@@ -67,10 +67,16 @@ def __init__(
 AugmentedManifestFile formats are described at `S3DataSource
 <https://docs.aws.amazon.com/sagemaker/latest/dg/API_S3DataSource.html>`_
 in the `Amazon SageMaker API reference`.
-instance_groups (list[str]): Optional. A list of ``instance_group_name``\ s
-of a heterogeneous cluster that's configured using the
+instance_groups (list[str]): Optional. A list of instance group names in string format
+that you specified while configuring a heterogeneous cluster using the
 :class:`sagemaker.instance_group.InstanceGroup`.
 S3 data will be sent to all instance groups in the specified list.
+For instructions on how to use InstanceGroup objects
+to configure a heterogeneous cluster
+through the SageMaker generic and framework estimator classes, see
+`Train Using a Heterogeneous Cluster
+<https://docs.aws.amazon.com/sagemaker/latest/dg/train-heterogeneous-cluster.html>`_
+in the *Amazon SageMaker developer guide*.
 (default: None)
 input_mode (str): Optional override for this channel's input mode (default: None).
 By default, channels will use the input mode defined on
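
Reviewer note: the reworded `instance_groups` docstring says that S3 data is sent to every instance group named in the list. The sketch below illustrates that idea in plain Python, without importing the SDK. The helper `build_channel_config`, the bucket URI, and the group names are hypothetical. The `InstanceGroupNames` key is an assumption about how the list maps onto the S3DataSource request shape, so check it against the linked API reference before relying on it.

```python
from typing import Any, Dict, List, Optional


def build_channel_config(
    s3_uri: str,
    instance_groups: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: build a channel dict shaped like an S3DataSource request."""
    s3_data_source: Dict[str, Any] = {
        "S3DataType": "S3Prefix",
        "S3Uri": s3_uri,
        "S3DataDistributionType": "FullyReplicated",
    }
    if instance_groups is not None:
        # Assumed field name: the list of instance group names that receive this channel.
        s3_data_source["InstanceGroupNames"] = list(instance_groups)
    return {"DataSource": {"S3DataSource": s3_data_source}}


# Placeholder URI and hypothetical group names, for illustration only.
config = build_channel_config(
    "s3://example-bucket/train",
    instance_groups=["group_a", "group_b"],
)
```

When `instance_groups` is left at its default of `None`, the sketch omits the key entirely, which mirrors the "(default: None)" behavior the docstring describes.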
