Commit e06dc26

mchoi8739, staubhp (Payton Staub), ahsan-z-khan, and mufaddal-rohawala authored and committed
documentation: SageMaker model parallel library 1.6.0 API doc (aws#2814)
* update smdmp change log, archive api doc for 1.4.0 and 1.5.0
* add no-index flags
* finish api doc archive
* fix: Set ProcessingStep upload locations deterministically to avoid c… (aws#2790)
* fix: Prevent repack_model script from referencing nonexistent directories (aws#2755)

  Co-authored-by: Payton Staub <[email protected]>
  Co-authored-by: Ahsan Khan <[email protected]>

* fix: S3Input - add support for instance attributes (aws#2754)
* fix: typos and broken link (aws#2765)

  Co-authored-by: Shreya Pandit <[email protected]>

* add all api docs
* add appendix, fix links
* structural changes, fix links
* incorporate feedback
* prepare release v2.72.1
* update development version to v2.72.2.dev0

Co-authored-by: Payton Staub <[email protected]>
Co-authored-by: Payton Staub <[email protected]>
Co-authored-by: Ahsan Khan <[email protected]>
Co-authored-by: Mufaddal Rohawala <[email protected]>
Co-authored-by: Mohamed Ali Jamaoui <[email protected]>
Co-authored-by: Shreya Pandit <[email protected]>
Co-authored-by: ci <ci>
Co-authored-by: Jeniya Tabassum <[email protected]>
1 parent 501c975 commit e06dc26

18 files changed: +3979 −442 lines

doc/api/training/smd_data_parallel.rst (+3 −3)

@@ -1,6 +1,6 @@
-##########################
-Distributed data parallel
-##########################
+###############################################
+The SageMaker Distributed Data Parallel Library
+###############################################
 
 SageMaker's distributed data parallel library extends SageMaker’s training
 capabilities on deep learning models with near-linear scaling efficiency,

doc/api/training/smd_model_parallel.rst (+25 −39)

@@ -1,5 +1,5 @@
-Distributed model parallel
---------------------------
+The SageMaker Distributed Model Parallel Library
+------------------------------------------------
 
 The Amazon SageMaker distributed model parallel library is a model parallelism library for training
 large deep learning models that were previously difficult to train due to GPU memory limitations.
@@ -9,49 +9,35 @@ allowing you to increase prediction accuracy by creating larger models with more
 You can use the library to automatically partition your existing TensorFlow and PyTorch workloads
 across multiple GPUs with minimal code changes. The library's API can be accessed through the Amazon SageMaker SDK.
 
-Use the following sections to learn more about the model parallelism and the library.
-
-Use with the SageMaker Python SDK
-=================================
-
-Use the following page to learn how to configure and enable distributed model parallel
-when you configure an Amazon SageMaker Python SDK `Estimator`.
+See the following sections to learn more about the SageMaker model parallel library APIs.
 
 .. toctree::
-   :maxdepth: 1
+   :maxdepth: 3
 
+   smp_versions/latest
    smd_model_parallel_general
 
-API Documentation
-=================
-
-The library contains a Common API that is shared across frameworks, as well as APIs
-that are specific to supported frameworks, TensorFlow and PyTorch.
-
-Select a version to see the API documentation for version. To use the library, reference the
-**Common API** documentation alongside the framework specific API documentation.
-
-.. toctree::
-   :maxdepth: 1
-
-   smp_versions/latest.rst
-   smp_versions/v1_3_0.rst
-   smp_versions/v1_2_0.rst
-   smp_versions/v1_1_0.rst
-
-It is recommended to use this documentation alongside `SageMaker Distributed Model Parallel
-<http://docs.aws.amazon.com/sagemaker/latest/dg/model-parallel.html>`__ in the Amazon SageMaker
-developer guide. This developer guide documentation includes:
 
-- An overview of model parallelism and the library
-  `core features <https://docs.aws.amazon.com/sagemaker/latest/dg/model-parallel-core-features.html>`__
-- Instructions on how to modify `TensorFlow
-  <https://docs.aws.amazon.com/sagemaker/latest/dg/model-parallel-customize-training-script.html#model-parallel-customize-training-script-tf>`__
-  and `PyTorch
-  <https://docs.aws.amazon.com/sagemaker/latest/dg/model-parallel-customize-training-script.html#model-parallel-customize-training-script-pt>`__
-  training scripts
-- `Configuration tips and pitfalls
-  <https://docs.aws.amazon.com/sagemaker/latest/dg/model-parallel-customize-tips-pitfalls.html>`__
+.. tip::
+
+   We recommended using this API documentation with the conceptual guide at
+   `SageMaker's Distributed Model Parallel
+   <http://docs.aws.amazon.com/sagemaker/latest/dg/model-parallel.html>`_
+   in the *Amazon SageMaker developer guide*. This developer guide documentation includes:
+
+   - An overview of model parallelism, and the library's
+     `core features <https://docs.aws.amazon.com/sagemaker/latest/dg/model-parallel-core-features.html>`_,
+     and `extended features for PyTorch <https://docs.aws.amazon.com/sagemaker/latest/dg/model-parallel-extended-features-pytorch.html>`_.
+   - Instructions on how to modify `TensorFlow
+     <https://docs.aws.amazon.com/sagemaker/latest/dg/model-parallel-customize-training-script-tf.html>`_
+     and `PyTorch
+     <https://docs.aws.amazon.com/sagemaker/latest/dg/model-parallel-customize-training-script-pt.html>`_
+     training scripts.
+   - Instructions on how to `run a distributed training job using the SageMaker Python SDK
+     and the SageMaker model parallel library
+     <https://docs.aws.amazon.com/sagemaker/latest/dg/model-parallel-sm-sdk.html>`_.
+   - `Configuration tips and pitfalls
+     <https://docs.aws.amazon.com/sagemaker/latest/dg/model-parallel-customize-tips-pitfalls.html>`_.
 
 
 .. important::
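As a rough, hypothetical sketch of the SDK-side setup that the revised intro and the new
"run a distributed training job using the SageMaker Python SDK" link describe (not part of this
commit; the role ARN, instance settings, and the smp parameter values below are placeholders), a
PyTorch Estimator enables the library through its ``distribution`` argument:

.. code:: python

   # Hypothetical sketch, not code from this commit: enabling the SageMaker
   # model parallel library from the SageMaker Python SDK.
   from sagemaker.pytorch import PyTorch

   smp_options = {
       "enabled": True,
       "parameters": {        # forwarded to smp.init() in the training script
           "partitions": 2,   # pipeline-parallel degree (placeholder value)
           "microbatches": 4,
           "ddp": True,       # combine model parallelism with data parallelism
       },
   }

   estimator = PyTorch(
       entry_point="train.py",  # your smp-adapted training script
       role="arn:aws:iam::111122223333:role/SageMakerRole",  # placeholder role
       instance_type="ml.p3.16xlarge",
       instance_count=1,
       framework_version="1.8.1",
       py_version="py36",
       distribution={
           "smdistributed": {"modelparallel": smp_options},
           "mpi": {"enabled": True, "processes_per_host": 8},
       },
   )

   estimator.fit("s3://my-bucket/my-training-data")  # placeholder S3 path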

doc/api/training/smd_model_parallel_general.rst (+333 −350)

Large diffs are not rendered by default.

doc/api/training/smd_model_parallel_release_notes/smd_model_parallel_change_log.rst (+69 −6)

@@ -1,6 +1,67 @@
-Sagemaker Distributed Model Parallel 1.4.0 Release Notes
+Sagemaker Distributed Model Parallel 1.6.0 Release Notes
 ========================================================
 
+*Date: December. 20. 2021*
+
+**New Features**
+
+- **PyTorch**
+
+  - Added extended memory-saving features for PyTorch 1.8.1:
+
+    - Tensor parallelism
+    - Optimizer state sharding
+    - Activation checkpointing
+    - Activation offloading
+
+  For more information, see the following documentation:
+
+  - `SageMaker distributed model parallel developer guide <https://docs.aws.amazon.com/sagemaker/latest/dg/model-parallel-extended-features-pytorch.html>`_
+  - `SageMaker distributed model parallel API documentation for v1.6.0 <https://sagemaker.readthedocs.io/en/stable/api/training/smp_versions/latest.html>`_
+
+**Migration to AWS Deep Learning Containers**
+
+This version passed benchmark testing and is migrated to the following
+AWS Deep Learning Container(s):
+
+- Deep Learning Container for PyTorch 1.8.1:
+
+  .. code::
+
+    763104351884.dkr.ecr.<region>.amazonaws.com/pytorch-training:1.8.1-gpu-py36-cu111-ubuntu18.04
+
+----
+
+Release History
+===============
+
+Sagemaker Distributed Model Parallel 1.5.0 Release Notes
+--------------------------------------------------------
+
+*Date: November. 03. 2021*
+
+**New Features**
+
+- **PyTorch**
+
+  - Currency update for PyTorch 1.10.0
+
+**Migration to AWS Deep Learning Containers**
+
+This version passed benchmark testing and is migrated to the following
+AWS Deep Learning Containers:
+
+- Deep Learning Container for PyTorch 1.10.0:
+
+  .. code::
+
+    763104351884.dkr.ecr.<region>.amazonaws.com/pytorch-training:1.10.0-gpu-py38-cu113-ubuntu20.04-sagemaker
+
+----
+
+Sagemaker Distributed Model Parallel 1.4.0 Release Notes
+--------------------------------------------------------
+
 *Date: June. 29. 2021*
 
 **New Features**
@@ -15,17 +76,19 @@ Sagemaker Distributed Model Parallel 1.4.0 Release Notes
 This version passed benchmark testing and is migrated to the following
 AWS Deep Learning Containers:
 
-- TensorFlow 2.5.0 DLC release: `v1.0-tf-2.5.0-tr-py37
-  <https://github.com/aws/deep-learning-containers/releases/tag/v1.0-tf-2.5.0-tr-py37>`__
+- Deep Learning Container for TensorFlow 2.5.0:
 
   .. code::
 
     763104351884.dkr.ecr.<region>.amazonaws.com/tensorflow-training:2.5.0-gpu-py37-cu112-ubuntu18.04-v1.0
 
-----
+- Deep Learning Container for PyTorch 1.9.1:
 
-Release History
-===============
+  .. code::
+
+    763104351884.dkr.ecr.<region>.amazonaws.com/pytorch-training:1.9.1-gpu-py38-cu111-ubuntu20.04
+
+----
 
 Sagemaker Distributed Model Parallel 1.3.1 Release Notes
 --------------------------------------------------------
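If you want to pin one of the containers listed in these release notes instead of relying on the
SDK's default image lookup, the only moving part in the URI is the Region. Here is a minimal,
hypothetical helper (not part of this commit; the Region values are placeholders, while the
account and image tag come from the 1.6.0 notes above):

.. code:: python

   # Fill in the <region> placeholder of the PyTorch 1.8.1 DLC URI shown in the
   # 1.6.0 release notes. Hypothetical helper, not part of this commit.
   DLC_TEMPLATE = (
       "763104351884.dkr.ecr.{region}.amazonaws.com/"
       "pytorch-training:1.8.1-gpu-py36-cu111-ubuntu18.04"
   )

   def smp_dlc_image_uri(region: str = "us-west-2") -> str:
       """Return the PyTorch 1.8.1 training DLC URI for the given AWS Region."""
       return DLC_TEMPLATE.format(region=region)

   print(smp_dlc_image_uri("us-east-1"))
   # 763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-training:1.8.1-gpu-py36-cu111-ubuntu18.04

The resulting string can be passed as an estimator's ``image_uri`` argument when you need to train
against exactly this container.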
doc/api/training/smp_versions/archives.rst (+10, new file)

@@ -0,0 +1,10 @@
+.. _smdmp-pt-version-archive:
+
+.. toctree::
+   :maxdepth: 1
+
+   v1_5_0.rst
+   v1_4_0.rst
+   v1_3_0.rst
+   v1_2_0.rst
+   v1_1_0.rst

doc/api/training/smp_versions/latest.rst (+25 −1)

@@ -1,5 +1,16 @@
+###############################################
+Use the Library's API to Adapt Training Scripts
+###############################################
 
-Version 1.4.0 (Latest)
+The library provides Common APIs that you can use across frameworks,
+as well as framework-specific APIs for TensorFlow and PyTorch.
+
+Select the latest or one of the previous versions of the API documentation
+depending on which version of the library you need to use.
+To use the library, reference the
+**Common API** documentation alongside the framework specific API documentation.
+
+Version 1.6.0 (Latest)
 ======================
 
 To use the library, reference the Common API documentation alongside the framework specific API documentation.
@@ -9,4 +20,17 @@ To use the library, reference the Common API documentation alongside the framewo
 
    latest/smd_model_parallel_common_api
    latest/smd_model_parallel_pytorch
+   latest/smd_model_parallel_pytorch_tensor_parallel
    latest/smd_model_parallel_tensorflow
+
+To find archived API documentation for the previous versions of the library,
+see the following link:
+
+
+Documentation Archive
+=====================
+
+.. toctree::
+   :maxdepth: 1
+
+   archives

doc/api/training/smp_versions/latest/smd_model_parallel_common_api.rst (+75 −25)

@@ -1,14 +1,16 @@
-.. admonition:: Contents
-
-   - :ref:`communication_api`
-   - :ref:`mpi_basics`
-
 Common API
 ==========
 
 The following SageMaker distribute model parallel APIs are common across all frameworks.
 
-**Important**: This API document assumes you use the following import statement in your training scripts.
+.. contents:: Table of Contents
+   :depth: 3
+   :local:
+
+The Library's Core APIs
+-----------------------
+
+This API document assumes you use the following import statement in your training scripts.
 
 **TensorFlow**
 
@@ -254,30 +256,78 @@ The following SageMaker distribute model parallel APIs are common across all fra
 .. _mpi_basics:
 
 MPI Basics
-^^^^^^^^^^
+----------
 
 The library exposes the following basic MPI primitives to its Python API:
 
-- ``smp.rank()``: The rank of the current process.
-- ``smp.size()``: The total number of processes.
-- ``smp.mp_rank()``: The rank of the process among the processes that
-  hold the current model replica.
-- ``smp.dp_rank()``: The rank of the process among the processes that
-  hold different replicas of the same model partition.
-- ``smp.dp_size()``: The total number of model replicas.
-- ``smp.local_rank()``: The rank among the processes on the current
-  instance.
-- ``smp.local_size()``: The total number of processes on the current
-  instance.
-- ``smp.get_mp_group()``: The list of ranks over which the current
-  model replica is partitioned.
-- ``smp.get_dp_group()``: The list of ranks that hold different
-  replicas of the same model partition.
-
-.. _communication_api:
+**Global**
+
+- ``smp.rank()`` : The global rank of the current process.
+- ``smp.size()`` : The total number of processes.
+- ``smp.get_world_process_group()`` :
+  ``torch.distributed.ProcessGroup`` that contains all processes.
+- ``smp.CommGroup.WORLD``: The communication group corresponding to all processes.
+- ``smp.local_rank()``: The rank among the processes on the current instance.
+- ``smp.local_size()``: The total number of processes on the current instance.
+- ``smp.get_mp_group()``: The list of ranks over which the current model replica is partitioned.
+- ``smp.get_dp_group()``: The list of ranks that hold different replicas of the same model partition.
+
+**Tensor Parallelism**
+
+- ``smp.tp_rank()`` : The rank of the process within its
+  tensor-parallelism group.
+- ``smp.tp_size()`` : The size of the tensor-parallelism group.
+- ``smp.get_tp_process_group()`` : Equivalent to
+  ``torch.distributed.ProcessGroup`` that contains the processes in the
+  current tensor-parallelism group.
+- ``smp.CommGroup.TP_GROUP`` : The communication group corresponding to
+  the current tensor parallelism group.
+
+**Pipeline Parallelism**
+
+- ``smp.pp_rank()`` : The rank of the process within its
+  pipeline-parallelism group.
+- ``smp.pp_size()`` : The size of the pipeline-parallelism group.
+- ``smp.get_pp_process_group()`` : ``torch.distributed.ProcessGroup``
+  that contains the processes in the current pipeline-parallelism group.
+- ``smp.CommGroup.PP_GROUP`` : The communication group corresponding to
+  the current pipeline parallelism group.
+
+**Reduced-Data Parallelism**
+
+- ``smp.rdp_rank()`` : The rank of the process within its
+  reduced-data-parallelism group.
+- ``smp.rdp_size()`` : The size of the reduced-data-parallelism group.
+- ``smp.get_rdp_process_group()`` : ``torch.distributed.ProcessGroup``
+  that contains the processes in the current reduced data parallelism
+  group.
+- ``smp.CommGroup.RDP_GROUP`` : The communication group corresponding
+  to the current reduced data parallelism group.
+
+**Model Parallelism**
+
+- ``smp.mp_rank()`` : The rank of the process within its model-parallelism
+  group.
+- ``smp.mp_size()`` : The size of the model-parallelism group.
+- ``smp.get_mp_process_group()`` : ``torch.distributed.ProcessGroup``
+  that contains the processes in the current model-parallelism group.
+- ``smp.CommGroup.MP_GROUP`` : The communication group corresponding to
+  the current model parallelism group.
+
+**Data Parallelism**
+
+- ``smp.dp_rank()`` : The rank of the process within its data-parallelism
+  group.
+- ``smp.dp_size()`` : The size of the data-parallelism group.
+- ``smp.get_dp_process_group()`` : ``torch.distributed.ProcessGroup``
+  that contains the processes in the current data-parallelism group.
+- ``smp.CommGroup.DP_GROUP`` : The communication group corresponding to
+  the current data-parallelism group.
+
+.. _communication_api:
 
 Communication API
-^^^^^^^^^^^^^^^^^
+-----------------
 
 The library provides a few communication primitives which can be helpful while
 developing the training script. These primitives use the following
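To see how the primitives in the lists above fit together at runtime, here is a short,
hypothetical training-script snippet (not part of this commit). It assumes the standard
``smdistributed.modelparallel.torch`` import and the ``smp.init()`` call described in the
library's PyTorch documentation:

.. code:: python

   # Hypothetical sketch: log where each process sits in the groups listed above.
   import smdistributed.modelparallel.torch as smp

   smp.init()  # initialize the library before querying ranks or groups

   # Print from one process per instance to keep the training log readable.
   if smp.local_rank() == 0:
       print(
           f"global rank {smp.rank()} of {smp.size()}, "
           f"pp_rank={smp.pp_rank()}, tp_rank={smp.tp_rank()}, "
           f"dp_rank={smp.dp_rank()}, rdp_rank={smp.rdp_rank()}"
       )
       # Ranks holding different replicas of this model partition:
       print("dp group:", smp.get_dp_group())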
