
Commit 559db04

Merge remote-tracking branch 'origin' into feat/jumpstart-model-estimator-classes
2 parents: 738df16 + af26174


42 files changed: +1193 -155 lines

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -30,6 +30,7 @@ env/
 .vscode/
 **/tmp
 .python-version
+*.html
 **/_repack_script_launcher.sh
 tests/data/**/_repack_model.py
 tests/data/experiment/sagemaker-dev-1.0.tar.gz

CHANGELOG.md

Lines changed: 25 additions & 0 deletions
@@ -1,5 +1,30 @@
 # Changelog
 
+## v2.152.0 (2023-05-04)
+
+### Features
+
+* add support for lineage visualization using pyvis
+* Expose Experiment class publicly
+* PyTorch 1.13 release
+
+### Bug Fixes and Other Changes
+
+* Change data_type argument to dtype to keep consistent with D…
+* Skip edge test
+* make RemoteExecutor context manager non-blocking on pending futures
+* Add inferentia2 DLC images for djl framework
+* Fix typo in using_pytorch.rst
+* Unable to attach estimator to training job when KeepAlivePeriodInSeconds specified
+* update LMI container image
+* Update Clarify SHAPConfig baseline to allow JSON structures
+
+### Documentation Changes
+
+* Fix broken link in DJL SageMaker docs
+* currency update for the SageMaker data parallelism lib
+* SM model parallel library v1.15.0 release note
+
 ## v2.151.0 (2023-04-27)
 
 ### Features

VERSION

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-2.151.1.dev0
+2.152.1.dev0

doc/api/training/sdp_versions/latest.rst

Lines changed: 1 addition & 1 deletion
@@ -26,7 +26,7 @@ depending on the version of the library you use.
    <https://docs.aws.amazon.com/sagemaker/latest/dg/data-parallel-use-api.html#data-parallel-use-python-skd-api>`_
    for more information.
 
-For versions between 1.4.0 and 1.7.0 (Latest)
+For versions between 1.4.0 and 1.8.0 (Latest)
 =============================================
 
 .. toctree::

doc/api/training/smd_data_parallel_release_notes/smd_data_parallel_change_log.rst

Lines changed: 32 additions & 7 deletions
@@ -5,39 +5,64 @@ Release Notes
 #############
 
 New features, bug fixes, and improvements are regularly made to the SageMaker
-distributed data parallel library.
+data parallelism library.
 
-SageMaker Distributed Data Parallel 1.7.0 Release Notes
+SageMaker Distributed Data Parallel 1.8.0 Release Notes
 =======================================================
 
-*Date: Feb. 10. 2023*
+*Date: Apr. 17. 2023*
 
 **Currency Updates**
 
-* Added support for PyTorch 1.13.1.
+* Added support for PyTorch 2.0.0.
 
 **Migration to AWS Deep Learning Containers**
 
 This version passed benchmark testing and is migrated to the following AWS Deep Learning Containers (DLC):
 
-- PyTorch 1.13.1 DLC
+- PyTorch 2.0.0 DLC
 
 .. code::
 
-   763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-training:1.13.1-gpu-py39-cu117-ubuntu20.04-sagemaker
+   763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-training:2.0.0-gpu-py310-cu118-ubuntu20.04-sagemaker
 
 Binary file of this version of the library for custom container users:
 
 .. code::
 
-   https://smdataparallel.s3.amazonaws.com/binary/pytorch/1.13.1/cu117/2023-01-09/smdistributed_dataparallel-1.7.0-cp39-cp39-linux_x86_64.whl
+   https://smdataparallel.s3.amazonaws.com/binary/pytorch/2.0.0/cu118/2023-03-20/smdistributed_dataparallel-1.8.0-cp310-cp310-linux_x86_64.whl
 
 
 ----
 
 Release History
 ===============
 
+SageMaker Distributed Data Parallel 1.7.0 Release Notes
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+*Date: Feb. 10. 2023*
+
+**Currency Updates**
+
+* Added support for PyTorch 1.13.1.
+
+**Migration to AWS Deep Learning Containers**
+
+This version passed benchmark testing and is migrated to the following AWS Deep Learning Containers (DLC):
+
+- PyTorch 1.13.1 DLC
+
+.. code::
+
+   763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-training:1.13.1-gpu-py39-cu117-ubuntu20.04-sagemaker
+
+Binary file of this version of the library for custom container users:
+
+.. code::
+
+   https://smdataparallel.s3.amazonaws.com/binary/pytorch/1.13.1/cu117/2023-01-09/smdistributed_dataparallel-1.7.0-cp39-cp39-linux_x86_64.whl
+
 SageMaker Distributed Data Parallel 1.6.0 Release Notes
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
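The 1.8.0 DLC listed above is consumed through the SageMaker Python SDK by enabling the data parallelism library on an estimator. A minimal sketch, assuming a hypothetical ``train.py`` entry point, IAM role, and S3 prefix:

    # Sketch: launch a training job on the PyTorch 2.0.0 DLC with the
    # SageMaker data parallelism library enabled.
    from sagemaker.pytorch import PyTorch

    estimator = PyTorch(
        entry_point="train.py",           # hypothetical training script
        role="my_sagemaker_role",         # hypothetical IAM role
        framework_version="2.0.0",
        py_version="py310",
        instance_type="ml.p4d.24xlarge",  # a multi-GPU instance type supported by the library
        instance_count=2,
        distribution={"smdistributed": {"dataparallel": {"enabled": True}}},
    )
    estimator.fit("s3://my_bucket/training-data")  # hypothetical S3 prefix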

doc/api/training/smd_model_parallel_release_notes/smd_model_parallel_change_log.rst

Lines changed: 78 additions & 6 deletions
@@ -3,12 +3,88 @@ Release Notes
 #############
 
 New features, bug fixes, and improvements are regularly made to the SageMaker
-distributed model parallel library.
+model parallelism library.
 
 
-SageMaker Distributed Model Parallel 1.14.0 Release Notes
+SageMaker Distributed Model Parallel 1.15.0 Release Notes
 =========================================================
 
+*Date: Apr. 27. 2023*
+
+**Currency Updates**
+
+* Added support for PyTorch v2.0.0.
+  Note that the library does not support ``torch.compile`` in this release.
+
+**New Features**
+
+* Using sharded data parallelism with tensor parallelism together is now
+  available for PyTorch 1.13.1. It allows you to train with smaller global batch
+  sizes while scaling up to large clusters. For more information, see `Sharded
+  data parallelism with tensor parallelism <https://docs.aws.amazon.com/sagemaker/latest/dg/model-parallel-extended-features-pytorch-sharded-data-parallelism.html#model-parallel-extended-features-pytorch-sharded-data-parallelism-with-tensor-parallelism>`_
+  in the *Amazon SageMaker Developer Guide*.
+* Added support for saving and loading full model checkpoints when using sharded
+  data parallelism. This is enabled by using the standard checkpointing API,
+  ``smp.save_checkpoint`` with ``partial=False``.
+  Before, full checkpoints needed to be created by merging partial checkpoint
+  files after training finishes.
+* `DistributedTransformer <https://sagemaker.readthedocs.io/en/stable/api/training/smp_versions/latest/smd_model_parallel_pytorch_tensor_parallel.html#smdistributed.modelparallel.torch.nn.DistributedTransformerLayer>`_
+  now supports the ALiBi position embeddings.
+  When using DistributedTransformer, you can set the ``use_alibi`` parameter
+  to ``True`` to use the Triton-based flash attention kernels. This helps
+  evaluate sequences longer than those used for training.
+
+**Bug Fixes**
+
+* When using tensor parallelism, parameters were initialized multiple times
+  unnecessarily. This release fixed the multiple initialization of parameters
+  so that each parameter is initialized exactly once.
+  It not only saves time, but also ensures that the random generator behavior
+  is similar to the non-tensor parallelism case.
+
+**Known issues**
+
+* Model initialization might take longer with PyTorch 2.0 than with PyTorch 1.13.
+
+**Migration to AWS Deep Learning Containers**
+
+This version passed benchmark testing and is migrated to the following AWS Deep Learning Containers (DLC):
+
+- SageMaker training container for PyTorch v2.0.0
+
+  .. code::
+
+    763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-training:2.0.0-gpu-py310-cu118-ubuntu20.04-sagemaker
+
+- SageMaker training container for PyTorch v1.13.1
+
+  .. code::
+
+    763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-training:1.13.1-gpu-py39-cu117-ubuntu20.04-sagemaker
+
+Binary file of this version of the library for `custom container
+<https://docs.aws.amazon.com/sagemaker/latest/dg/model-parallel-sm-sdk.html#model-parallel-bring-your-own-container>`_ users:
+
+- For PyTorch v2.0.0
+
+  .. code::
+
+    https://sagemaker-distributed-model-parallel.s3.us-west-2.amazonaws.com/pytorch-2.0.0/build-artifacts/2023-04-14-20-14/smdistributed_modelparallel-1.15.0-cp310-cp310-linux_x86_64.whl
+
+- For PyTorch v1.13.1
+
+  .. code::
+
+    https://sagemaker-distributed-model-parallel.s3.us-west-2.amazonaws.com/pytorch-1.13.1/build-artifacts/2023-04-17-15-49/smdistributed_modelparallel-1.15.0-cp39-cp39-linux_x86_64.whl
+
+----
+
+Release History
+===============
+
+SageMaker Distributed Model Parallel 1.14.0 Release Notes
+---------------------------------------------------------
+
 *Date: Jan. 30. 2023*
 
 **Currency Updates**

@@ -39,10 +115,6 @@ Binary file of this version of the library for `custom container
 
   https://sagemaker-distributed-model-parallel.s3.us-west-2.amazonaws.com/pytorch-1.13.1/build-artifacts/2023-01-19-18-35/smdistributed_modelparallel-1.14.0-cp39-cp39-linux_x86_64.whl
 
-----
-
-Release History
-===============
 
 SageMaker Distributed Model Parallel 1.13.0 Release Notes
 ---------------------------------------------------------
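The full-checkpoint support called out in the 1.15.0 notes goes through the standard ``smp.save_checkpoint`` API. A minimal sketch, assuming a training script already configured for sharded data parallelism with ``model`` as an ``smp.DistributedModel`` in scope; the path and tag are hypothetical:

    import smdistributed.modelparallel.torch as smp

    # Save one merged, full checkpoint rather than per-rank partial files.
    smp.save_checkpoint(
        path="/opt/ml/checkpoints",  # hypothetical checkpoint directory
        tag="fullmodel.pt",
        partial=False,               # full checkpoint; no post-training merge step needed
        model=model,
    )

    # Later, resume from the full checkpoint the same way.
    smp.resume_from_checkpoint("/opt/ml/checkpoints", tag="fullmodel.pt", partial=False)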

doc/api/training/smp_versions/latest.rst

Lines changed: 2 additions & 2 deletions
@@ -10,8 +10,8 @@ depending on which version of the library you need to use.
 To use the library, reference the
 **Common API** documentation alongside the framework specific API documentation.
 
-Version 1.11.0, 1.13.0, 1.14.0 (Latest)
-=======================================
+Version 1.11.0, 1.13.0, 1.14.0, 1.15.0 (Latest)
+===============================================
 
 To use the library, reference the Common API documentation alongside the framework specific API documentation.
doc/api/training/smp_versions/latest/smd_model_parallel_pytorch_tensor_parallel.rst

Lines changed: 14 additions & 0 deletions
@@ -302,6 +302,20 @@ Tensor Parallelism Module APIs
   - ``post_layernorm``: If ``True``, inserts layer normalization at
     the output. At least one of ``pre_layernorm`` and
     ``post_layernorm`` must be ``True``.
+  - ``use_alibi`` (bool, default False): Activates Attention with
+    Linear Biases (ALiBi) for attention computation.
+    ALiBi facilitates efficient extrapolation on input sequences
+    and thus improves training efficiency.
+    The library enables ALiBi by using the `Triton
+    flash attention kernel
+    <https://github.com/HazyResearch/flash-attention>`_.
+    Refer to https://arxiv.org/abs/2108.12409 for more
+    details on the technique.
+    (Available from
+    the SageMaker model parallelism library v1.15.0.)
+  - ``alibi_bias_max`` (int, default 8): Defines the ALiBi base
+    value for mask generation. (Available from
+    the SageMaker model parallelism library v1.15.0.)
 
 - **Methods:**
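For reference, the two new flags attach to the ``DistributedTransformerLayer`` constructor. A minimal sketch; the sizing arguments shown are representative assumptions, not the complete signature:

    import smdistributed.modelparallel.torch as smp

    layer = smp.nn.DistributedTransformerLayer(
        hidden_size=4096,        # representative model dimensions
        num_attention_heads=32,
        use_alibi=True,          # v1.15.0: Triton flash-attention ALiBi kernels
        alibi_bias_max=8,        # v1.15.0: base value for ALiBi mask generation
    )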

doc/experiments/sagemaker.experiments.rst

Lines changed: 9 additions & 0 deletions
@@ -11,6 +11,15 @@ Run
 
 .. automethod:: sagemaker.experiments.list_runs
 
+Experiment
+-------------
+
+.. autoclass:: sagemaker.experiments.Experiment
+    :members:
+
+Other
+-------------
+
 .. autoclass:: sagemaker.experiments.SortByType
     :members:
     :undoc-members:
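With this change, ``Experiment`` is documented alongside the existing ``Run`` and ``list_runs`` APIs and is importable directly from ``sagemaker.experiments``. A minimal sketch with hypothetical experiment and run names:

    from sagemaker.experiments import Experiment, Run, list_runs

    # Log a parameter under a run; the experiment is created if it does not exist.
    with Run(experiment_name="my-experiment", run_name="my-run") as run:
        run.log_parameter("learning_rate", 0.01)

    # Enumerate the runs recorded under the experiment.
    for tracked_run in list_runs(experiment_name="my-experiment"):
        print(tracked_run.run_name)  # run_name property assumed from the Run API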

doc/frameworks/djl/using_djl.rst

Lines changed: 6 additions & 6 deletions
@@ -31,7 +31,7 @@ You can either deploy your model using DeepSpeed or HuggingFace Accelerate, or l
     djl_model = DJLModel(
         "s3://my_bucket/my_saved_model_artifacts/", # This can also be a HuggingFace Hub model id
         "my_sagemaker_role",
-        data_type="fp16",
+        dtype="fp16",
         task="text-generation",
         number_of_partitions=2 # number of gpus to partition the model across
     )

@@ -48,7 +48,7 @@ If you want to use a specific backend, then you can create an instance of the co
     deepspeed_model = DeepSpeedModel(
         "s3://my_bucket/my_saved_model_artifacts/", # This can also be a HuggingFace Hub model id
         "my_sagemaker_role",
-        data_type="bf16",
+        dtype="bf16",
         task="text-generation",
         tensor_parallel_degree=2, # number of gpus to partition the model across using tensor parallelism
     )

@@ -58,7 +58,7 @@ If you want to use a specific backend, then you can create an instance of the co
     hf_accelerate_model = HuggingFaceAccelerateModel(
         "s3://my_bucket/my_saved_model_artifacts/", # This can also be a HuggingFace Hub model id
         "my_sagemaker_role",
-        data_type="fp16",
+        dtype="fp16",
         task="text-generation",
         number_of_partitions=2, # number of gpus to partition the model across
     )

@@ -109,7 +109,7 @@ For example, you can deploy the EleutherAI gpt-j-6B model like this:
     model = DJLModel(
         "EleutherAI/gpt-j-6B",
         "my_sagemaker_role",
-        data_type="fp16",
+        dtype="fp16",
         number_of_partitions=2
     )

@@ -142,7 +142,7 @@ You would then pass "s3://my_bucket/gpt-j-6B" as ``model_id`` to the ``DJLModel`
     model = DJLModel(
         "s3://my_bucket/gpt-j-6B",
         "my_sagemaker_role",
-        data_type="fp16",
+        dtype="fp16",
         number_of_partitions=2
     )

@@ -213,7 +213,7 @@ For more information about DJL Serving, see the `DJL Serving documentation. <htt
 SageMaker DJL Classes
 ***********************
 
-For information about the different DJL Serving related classes in the SageMaker Python SDK, see https://sagemaker.readthedocs.io/en/stable/sagemaker.djl_inference.html.
+For information about the different DJL Serving related classes in the SageMaker Python SDK, see https://sagemaker.readthedocs.io/en/stable/frameworks/djl/sagemaker.djl_inference.html.
 
 ********************************
 SageMaker DJL Serving Containers

doc/frameworks/pytorch/using_pytorch.rst

Lines changed: 1 addition & 1 deletion
@@ -892,7 +892,7 @@ see `For versions 1.1 and lower <#for-versions-1.1-and-lower>`_.
     | |--inference.py
     | |--requirements.txt
 
-Where ``requirments.txt`` is an optional file that specifies dependencies on third-party libraries.
+Where ``requirements.txt`` is an optional file that specifies dependencies on third-party libraries.
 
 Create a ``PyTorchModel`` object
 --------------------------------

requirements/extras/test_requirements.txt

Lines changed: 1 addition & 0 deletions
@@ -19,6 +19,7 @@ fabric==2.6.0
 requests==2.27.1
 sagemaker-experiments==0.1.35
 Jinja2==3.0.3
+pyvis==0.2.1
 pandas>=1.3.5,<1.5
 scikit-learn==1.0.2
 cloudpickle==2.2.1

src/sagemaker/clarify.py

Lines changed: 4 additions & 2 deletions
@@ -94,6 +94,8 @@
                     {object: object},
                 )
             ],
+            # Arbitrary JSON object as baseline
+            {object: object},
         ),
         SchemaOptional("num_clusters"): int,
         SchemaOptional("use_logit"): bool,

@@ -1211,7 +1213,7 @@ class SHAPConfig(ExplainabilityConfig):
 
     def __init__(
         self,
-        baseline: Optional[Union[str, List]] = None,
+        baseline: Optional[Union[str, List, Dict]] = None,
        num_samples: Optional[int] = None,
         agg_method: Optional[str] = None,
         use_logit: bool = False,

@@ -1224,7 +1226,7 @@ def __init__(
         """Initializes config for SHAP analysis.
 
         Args:
-            baseline (None or str or list): `Baseline dataset <https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-feature-attribute-shap-baselines.html>`_
+            baseline (None or str or list or dict): `Baseline dataset <https://docs.aws.amazon.com/sagemaker/latest/dg/clarify-feature-attribute-shap-baselines.html>`_
                 for the Kernel SHAP algorithm, accepted in the form of:
                 S3 object URI, a list of rows (with at least one element),
                 or None (for no input baseline). The baseline dataset must have the same format
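The widened ``baseline`` type above lets a JSON structure be passed directly to ``SHAPConfig``. A minimal sketch with a hypothetical single-row baseline layout:

    from sagemaker.clarify import SHAPConfig

    # baseline may now be a dict (JSON structure), in addition to an S3 URI
    # or a list of rows; the feature layout here is hypothetical.
    shap_config = SHAPConfig(
        baseline={"features": [["no", 0, 1]]},
        num_samples=100,
        agg_method="mean_abs",
    )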
