Commit 7a4d2f5

Merge branch 'master' into pr-framework-processor
2 parents 3a1907f + a058347 commit 7a4d2f5

65 files changed: +1268 -699 lines changed


CHANGELOG.md

+85
@@ -1,5 +1,90 @@
 # Changelog
 
+## v2.38.0 (2021-04-21)
+
+### Features
+
+* support multiprocess feature group ingest (#2111)
+
+## v2.37.0 (2021-04-20)
+
+### Features
+
+* add experiment_config for clarify processing job
+
+### Documentation Changes
+
+* release notes for smdistributed.dataparallel v1.1.2
+
+## v2.36.0 (2021-04-19)
+
+### Features
+
+* enable smdataparallel custom mpi options support
+
+## v2.35.0 (2021-04-14)
+
+### Features
+
+* add support for PyTorch 1.8.1
+
+### Bug Fixes and Other Changes
+
+* boto3 client param updated for feature store
+* Updated release notes and API doc for smd model parallel 1.3.1
+
+## v2.34.0 (2021-04-12)
+
+### Features
+
+* Add support for accelerator in Clarify
+
+### Bug Fixes and Other Changes
+
+* add Documentation for how to use
+* enable local mode tests that were skipped
+* add integ test for HuggingFace with TensorFlow
+
+### Documentation Changes
+
+* release notes for smdistributed.dataparallel v1.1.1
+* fixing the SageMaker distributed version references
+
+### Testing and Release Infrastructure
+
+* pin version for docutils
+
+## v2.33.0 (2021-04-05)
+
+### Features
+
+* Add environment variable support for SageMaker training job
+
+### Bug Fixes and Other Changes
+
+* add version length mismatch validation for HuggingFace
+* Disable debugger when checkpointing is enabled with distributed training
+* map user context in list associations response
+
+### Testing and Release Infrastructure
+
+* disable_profiler on mx-horovod test
+
+## v2.32.1 (2021-04-01)
+
+### Bug Fixes and Other Changes
+
+* disable profiler in some release tests
+* remove outdated notebook from test
+* add compilation option for ml_eia2
+* add short version to smdataparallel supported list
+
+### Documentation Changes
+
+* creating a "latest" version sm distributed docs
+* add docs for Sagemaker Model Parallel 1.3, released with PT 1.8
+* update PyTorch version in doc
+
 ## v2.32.0 (2021-03-26)
 
 ### Features

VERSION

+1-1
@@ -1 +1 @@
-2.32.1.dev0
+2.38.1.dev0

doc/amazon_sagemaker_featurestore.rst

+1-1
@@ -67,7 +67,7 @@ use the SageMaker default bucket and add a custom prefix to it.
 offline_feature_store_bucket = 's3://*{}*/*{}*'.format(default_bucket, prefix)
 
 sagemaker_client = boto_session.client(service_name='sagemaker', region_name=region)
-featurestore_runtime = boto_session.client(service_name='featurestore-runtime', region_name=region)
+featurestore_runtime = boto_session.client(service_name='sagemaker-featurestore-runtime', region_name=region)
 
 feature_store_session = Session(
     boto_session=boto_session,
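
For orientation, here is a hedged sketch of the session setup this snippet belongs to, using the corrected client name. The region value is a placeholder, and the ``sagemaker_featurestore_runtime_client`` keyword follows the Feature Store documentation pattern rather than anything introduced by this commit.

```python
import boto3
from sagemaker.session import Session

region = "us-west-2"  # placeholder region, not part of this change
boto_session = boto3.Session(region_name=region)

sagemaker_client = boto_session.client(service_name="sagemaker", region_name=region)

# The corrected client name from this diff:
featurestore_runtime = boto_session.client(
    service_name="sagemaker-featurestore-runtime", region_name=region
)

feature_store_session = Session(
    boto_session=boto_session,
    sagemaker_client=sagemaker_client,
    sagemaker_featurestore_runtime_client=featurestore_runtime,
)
```
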
doc/api/training/sdp_versions/latest.rst

+9

@@ -0,0 +1,9 @@
+
+Version 1.1.2 (Latest)
+======================
+
+.. toctree::
+   :maxdepth: 1
+
+   latest/smd_data_parallel_pytorch.rst
+   latest/smd_data_parallel_tensorflow.rst

doc/api/training/sdp_versions/v1.1.0/smd_data_parallel_pytorch.rst renamed to doc/api/training/sdp_versions/latest/smd_data_parallel_pytorch.rst

+2-2
@@ -153,9 +153,9 @@ you will have for distributed training with the distributed data parallel library.
 PyTorch API
 ===========
 
-**Supported versions:**
+.. rubric:: Supported versions
 
-- PyTorch 1.6.0, 1.8.0
+**PyTorch 1.7.1, 1.8.0**
 
 
 .. function:: smdistributed.dataparallel.torch.distributed.is_available()

doc/api/training/sdp_versions/v1.1.0/smd_data_parallel_tensorflow.rst renamed to doc/api/training/sdp_versions/latest/smd_data_parallel_tensorflow.rst

+7-4
@@ -16,8 +16,9 @@ The following steps show you how to convert a TensorFlow 2.x training
 script to utilize the distributed data parallel library.
 
 The distributed data parallel library APIs are designed to be close to Horovod APIs.
-See `SageMaker distributed data parallel TensorFlow examples <https://sagemaker-examples.readthedocs.io/en/latest/training/distributed_training/index.html#tensorflow-distributed>`__ for additional details on how to implement the data parallel library
-API offered for TensorFlow.
+See `SageMaker distributed data parallel TensorFlow examples
+<https://sagemaker-examples.readthedocs.io/en/latest/training/distributed_training/index.html#tensorflow-distributed>`__
+for additional details on how to implement the data parallel library.
 
 - First import the distributed data parallel library’s TensorFlow client and initialize it:
 
@@ -156,8 +157,10 @@ TensorFlow API
 
 .. rubric:: Supported versions
 
-- TensorFlow 2.x - 2.3.1
-
+TensorFlow is supported in version 1.0.0 of ``smdistributed.dataparallel``.
+Reference the version 1.0.0 `TensorFlow API documentation
+<https://sagemaker.readthedocs.io/en/stable/api/training/sdp_versions/latest/smd_data_parallel_tensorflow.html#tensorflow-sdp-api>`_
+for supported TensorFlow versions.
 
 .. function:: smdistributed.dataparallel.tensorflow.init()
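
To complement the conversion steps this file describes, below is a minimal sketch of the import-and-initialize step, assuming a SageMaker training container where ``smdistributed.dataparallel`` is installed; the GPU-pinning lines follow the pattern used in the SageMaker examples and are illustrative only.

```python
import tensorflow as tf
import smdistributed.dataparallel.tensorflow as sdp

# Initialize the SageMaker distributed data parallel library.
sdp.init()

# Pin each library process to a single GPU on its node.
gpus = tf.config.experimental.list_physical_devices("GPU")
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)
if gpus:
    tf.config.experimental.set_visible_devices(gpus[sdp.local_rank()], "GPU")
```
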
doc/api/training/sdp_versions/v1.0.0/smd_data_parallel_pytorch.rst

+6-8
@@ -4,11 +4,10 @@ PyTorch Guide to SageMaker's distributed data parallel library
 
 .. admonition:: Contents
 
-   - :ref:`pytorch-sdp-modify`
-   - :ref:`pytorch-sdp-api`
+   - :ref:`pytorch-sdp-modify-1.0.0`
+   - :ref:`pytorch-sdp-api-1.0.0`
 
-.. _pytorch-sdp-modify:
-   :noindex:
+.. _pytorch-sdp-modify-1.0.0:
 
 Modify a PyTorch training script to use SageMaker data parallel
 ======================================================================
 
@@ -149,15 +148,14 @@ you will have for distributed training with the distributed data parallel library.
     main()
 
 
-.. _pytorch-sdp-api:
-   :noindex:
+.. _pytorch-sdp-api-1.0.0:
 
 PyTorch API
 ===========
 
-**Supported versions:**
+.. rubric:: Supported versions
 
-- PyTorch 1.6.0
+**PyTorch 1.6.0, 1.7.1**
 
 
 .. function:: smdistributed.dataparallel.torch.distributed.is_available()
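
Since this guide covers modifying a PyTorch script to use the library, here is a hedged sketch of the initialization pattern from the SageMaker documentation; the single ``torch.nn.Linear`` layer is an illustrative stand-in for a real model.

```python
import torch
import smdistributed.dataparallel.torch.distributed as dist
from smdistributed.dataparallel.torch.parallel.distributed import (
    DistributedDataParallel as DDP,
)

# Initialize the library's process group (one process per GPU).
dist.init_process_group()

# Pin this process to its GPU on the node.
local_rank = dist.get_local_rank()
torch.cuda.set_device(local_rank)

# Wrap the model with the library's DistributedDataParallel implementation.
model = DDP(torch.nn.Linear(10, 1).to(local_rank))
```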

doc/api/training/sdp_versions/v1.0.0/smd_data_parallel_tensorflow.rst

+5-7
@@ -4,11 +4,10 @@ TensorFlow Guide to SageMaker's distributed data parallel library
 
 .. admonition:: Contents
 
-   - :ref:`tensorflow-sdp-modify`
-   - :ref:`tensorflow-sdp-api`
+   - :ref:`tensorflow-sdp-modify-1.0.0`
+   - :ref:`tensorflow-sdp-api-1.0.0`
 
-.. _tensorflow-sdp-modify:
-   :noindex:
+.. _tensorflow-sdp-modify-1.0.0:
 
 Modify a TensorFlow 2.x training script to use SageMaker data parallel
 ======================================================================
 
@@ -150,15 +149,14 @@ script you will have for distributed training with the library.
     checkpoint.save(checkpoint_dir)
 
 
-.. _tensorflow-sdp-api:
-   :noindex:
+.. _tensorflow-sdp-api-1.0.0:
 
 TensorFlow API
 ==============
 
 .. rubric:: Supported versions
 
-- TensorFlow 2.x - 2.3.1
+**TensorFlow 2.3.x - 2.4.1**
 
 
 .. function:: smdistributed.dataparallel.tensorflow.init()

doc/api/training/sdp_versions/v1_1_0.rst

-9
This file was deleted.

doc/api/training/smd_data_parallel.rst

+1-1
@@ -84,7 +84,7 @@ Select a version to see the API documentation for version.
 .. toctree::
    :maxdepth: 1
 
-   sdp_versions/v1_1_0.rst
+   sdp_versions/latest.rst
    sdp_versions/v1_0_0.rst
 
 .. important::

doc/api/training/smd_data_parallel_release_notes/smd_data_parallel_change_log.md

+35-4
@@ -1,23 +1,54 @@
+# Sagemaker Distributed Data Parallel 1.1.2 Release Notes
+
+* Bug Fixes
+* Known Issues
+
+*Bug Fixes:*
+
+* Fixed a bug that caused some TensorFlow operations to not work with certain data types. Operations forwarded from C++ have been extended to support every dtype supported by NCCL.
+
+*Known Issues:*
+
+* SageMaker distributed data parallel has slower throughput than NCCL when run using a single node. For the best performance, use multi-node distributed training with smdistributed.dataparallel. Use a single node only for experimental runs while preparing your training pipeline.
+
+# Sagemaker Distributed Data Parallel 1.1.1 Release Notes
+
+* New Features
+* Bug Fixes
+* Known Issues
+
+*New Features:*
+
+* Adds support for PyTorch 1.8.1
+
+*Bug Fixes:*
+
+* Fixes a bug that was causing gradients from one of the worker nodes to be added twice, resulting in incorrect `all_reduce` results under some conditions.
+
+*Known Issues:*
+
+* SageMaker distributed data parallel is still not efficient when run using a single node. For the best performance, use multi-node distributed training with `smdistributed.dataparallel`. Use a single node only for experimental runs while preparing your training pipeline.
+
 # Sagemaker Distributed Data Parallel 1.1.0 Release Notes
 
 * New Features
 * Bug Fixes
 * Improvements
 * Known Issues
 
-New Features:
+*New Features:*
 
 * Adds support for PyTorch 1.8.0 with CUDA 11.1 and CUDNN 8
 
-Bug Fixes:
+*Bug Fixes:*
 
 * Fixes crash issue when importing `smdataparallel` before PyTorch
 
-Improvements:
+*Improvements:*
 
 * Update `smdataparallel` name in python packages, descriptions, and log outputs
 
-Known Issues:
+*Known Issues:*
 
 * SageMaker DataParallel is not efficient when run using a single node. For the best performance, use multi-node distributed training with `smdataparallel`. Use a single node only for experimental runs while preparing your training pipeline.

doc/api/training/smd_model_parallel.rst

+1-1
@@ -34,7 +34,7 @@ Select a version to see the API documentation for version. To use the library, r
 .. toctree::
    :maxdepth: 1
 
-   smp_versions/v1_3_0.rst
+   smp_versions/latest.rst
    smp_versions/v1_2_0.rst
    smp_versions/v1_1_0.rst

doc/api/training/smd_model_parallel_release_notes/smd_model_parallel_change_log.md

+30
@@ -1,3 +1,33 @@
+# Sagemaker Distributed Model Parallel 1.3.1 Release Notes
+
+- New Features
+- Bug Fixes
+- Known Issues
+
+## New Features
+
+### TensorFlow
+
+- Exposes a new decorator ``register_post_partition_hook``. This allows invoking the decorated methods just after model partition but before executing the first step, for example to load a checkpoint. Refer to the [SageMaker distributed model parallel API documentation](https://sagemaker.readthedocs.io/en/stable/api/training/smp_versions/latest/smd_model_parallel_tensorflow.html) for more information.
+
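To make the item above concrete, here is a hypothetical sketch of registering such a hook; the ``smp`` alias for ``smdistributed.modelparallel.tensorflow`` and the decorator placement are assumptions, so consult the linked API documentation for the exact signature.

```python
import tensorflow as tf
import smdistributed.modelparallel.tensorflow as smp

smp.init()

# Assumed usage: the decorated function runs once, right after the model is
# partitioned and before the first training step, a natural place to restore
# a checkpoint.
@smp.register_post_partition_hook
def restore_checkpoint():
    tf.print("Partitioning finished; restoring checkpoint before the first step")
```
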
+## Bug Fixes
+
+### PyTorch
+
+- Improved memory efficiency when using active microbatches by clearing activations at the end of each microbatch.
+
+### TensorFlow
+
+- Fixed an issue that caused hangs when training some models with XLA enabled.
+
+## Known Issues
+
+### PyTorch
+
+- A crash was observed when ``optimizer.step()`` was called for certain optimizers such as AdaDelta, when the partition on which this method was called has no local parameters assigned to it after partitioning. This is due to a bug in PyTorch which [has since been fixed](https://github.com/pytorch/pytorch/pull/52944). Until that fix makes its way into the next release of PyTorch, only call ``optimizer.step()`` on processes which have at least one local parameter. This can be checked like this: ``len(list(model.local_parameters())) > 0``.
+
+- A performance regression still exists when training on SMP with PyTorch 1.7.1 compared to 1.6. The root cause was found to be the slowdown in performance of `.grad` method calls in PyTorch 1.7.1 compared to 1.6. See the related discussion: https://github.com/pytorch/pytorch/issues/50636. This issue does not exist with PyTorch 1.8.
+
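A condensed, hedged sketch of the workaround in the first known issue above; it assumes the usual ``smdistributed.modelparallel.torch`` setup (``smp.DistributedModel`` / ``smp.DistributedOptimizer``), and a real training script would also define the ``@smp.step``-decorated forward/backward function.

```python
import torch
import smdistributed.modelparallel.torch as smp

smp.init()

# Toy stand-ins; in practice the model and optimizer come from your script.
model = smp.DistributedModel(torch.nn.Linear(8, 1))
optimizer = smp.DistributedOptimizer(torch.optim.Adadelta(model.parameters()))

# Workaround: only call optimizer.step() on ranks whose partition received
# at least one local parameter after partitioning.
if len(list(model.local_parameters())) > 0:
    optimizer.step()
```
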
 # Sagemaker Distributed Model Parallel 1.3.0 Release Notes
 
 - New Features

doc/api/training/smp_versions/v1_3_0.rst renamed to doc/api/training/smp_versions/latest.rst

+3-3
@@ -7,6 +7,6 @@ To use the library, reference the Common API documentation alongside the framewo
 .. toctree::
    :maxdepth: 1
 
-   v1.3.0/smd_model_parallel_common_api
-   v1.3.0/smd_model_parallel_pytorch
-   v1.3.0/smd_model_parallel_tensorflow
+   latest/smd_model_parallel_common_api
+   latest/smd_model_parallel_pytorch
+   latest/smd_model_parallel_tensorflow

doc/api/training/smp_versions/v1.3.0/smd_model_parallel_pytorch.rst renamed to doc/api/training/smp_versions/latest/smd_model_parallel_pytorch.rst

+1-1
@@ -6,7 +6,7 @@
 PyTorch API
 ===========
 
-**Supported versions: 1.7.1, 1.8.0**
+**Supported versions: 1.6.0, 1.7.1, 1.8.0**
 
 This API document assumes you use the following import statements in your training scripts.
