Commit 250990f

rm white spaces
1 parent 95d58e5 commit 250990f

File tree

2 files changed, +30 −30 lines changed


doc/api/training/smd_model_parallel_release_notes/smd_model_parallel_change_log.rst

+21 −21
@@ -18,33 +18,33 @@ SageMaker Distributed Model Parallel 1.15.0 Release Notes
 
 **New Features**
 
-* Using sharded data parallelism with tensor parallelism together is now
-  available for PyTorch 1.13.1. It allows you to train with smaller global batch
-  sizes while scaling up to large clusters. For more information, see `Sharded
-  data parallelism with tensor parallelism <https://docs.aws.amazon.com/sagemaker/latest/dg/model-parallel-extended-features-pytorch-sharded-data-parallelism.html#model-parallel-extended-features-pytorch-sharded-data-parallelism-with-tensor-parallelism>`_
+* Using sharded data parallelism with tensor parallelism together is now
+  available for PyTorch 1.13.1. It allows you to train with smaller global batch
+  sizes while scaling up to large clusters. For more information, see `Sharded
+  data parallelism with tensor parallelism <https://docs.aws.amazon.com/sagemaker/latest/dg/model-parallel-extended-features-pytorch-sharded-data-parallelism.html#model-parallel-extended-features-pytorch-sharded-data-parallelism-with-tensor-parallelism>`_
   in the *Amazon SageMaker Developer Guide*.
-* Added support for saving and loading full model checkpoints when using sharded
-  data parallelism. This is enabled by using the standard checkpointing API,
-  ``smp.save_checkpoint`` with ``partial=False``.
-  Before, full checkpoints needed to be created by merging partial checkpoint
-  files after training finishes.
-* `DistributedTransformer <https://sagemaker.readthedocs.io/en/stable/api/training/smp_versions/latest/smd_model_parallel_pytorch_tensor_parallel.html#smdistributed.modelparallel.torch.nn.DistributedTransformerLayer>`_
-  now supports the ALiBi position embeddings.
-  When using DistributedTransformer, you can set the ``use_alibi`` parameter
-  to ``True`` to use the Triton-based flash attention kernels. This helps
+* Added support for saving and loading full model checkpoints when using sharded
+  data parallelism. This is enabled by using the standard checkpointing API,
+  ``smp.save_checkpoint`` with ``partial=False``.
+  Before, full checkpoints needed to be created by merging partial checkpoint
+  files after training finishes.
+* `DistributedTransformer <https://sagemaker.readthedocs.io/en/stable/api/training/smp_versions/latest/smd_model_parallel_pytorch_tensor_parallel.html#smdistributed.modelparallel.torch.nn.DistributedTransformerLayer>`_
+  now supports the ALiBi position embeddings.
+  When using DistributedTransformer, you can set the ``use_alibi`` parameter
+  to ``True`` to use the Triton-based flash attention kernels. This helps
   evaluate sequences longer than those used for training.
 
 **Bug Fixes**
 
-* When using tensor parallelism, parameters were initialized multiple times
+* When using tensor parallelism, parameters were initialized multiple times
   unncessarily. This release fixed the multiple initialization of parameters
-  so that each parameter is initialized exactly once.
-  It not only saves time, but also ensures that the random generator behavior
+  so that each parameter is initialized exactly once.
+  It not only saves time, but also ensures that the random generator behavior
   is similar to the non-tensor parallelism case.
-
+
 **Known issues**
 
-* Model initialization might take longer with PyTorch 2.0 than that with PyTorch 1.13.
+* Model initialization might take longer with PyTorch 2.0 than that with PyTorch 1.13.
 
 **Migration to AWS Deep Learning Containers**
 
@@ -57,9 +57,9 @@ This version passed benchmark testing and is migrated to the following AWS Deep
   763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-training:2.0.0-gpu-py310-cu118-ubuntu20.04-sagemaker
 
 - SageMaker training container for PyTorch v1.13.1
-
+
   .. code::
-
+
     763104351884.dkr.ecr.us-east-1.amazonaws.com/pytorch-training:1.13.1-gpu-py39-cu117-ubuntu20.04-sagemaker
 
 Binary file of this version of the library for `custom container
@@ -68,7 +68,7 @@ Binary file of this version of the library for `custom container
 - For PyTorch v2.0.0
 
   .. code::
-
+
     https://sagemaker-distributed-model-parallel.s3.us-west-2.amazonaws.com/pytorch-2.0.0/build-artifacts/2023-04-14-20-14/smdistributed_modelparallel-1.15.0-cp310-cp310-linux_x86_64.whl
 
 - For PyTorch v1.13.1
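The checkpointing feature noted in the release notes above names ``smp.save_checkpoint`` with ``partial=False`` as the way to write a full checkpoint under sharded data parallelism. A minimal sketch follows, assuming the usual ``smdistributed.modelparallel.torch`` import alias and the argument names documented for the library (path, tag, model, optimizer, partial); verify them against your installed version.

# Hedged sketch: with partial=False, smp.save_checkpoint writes a single full
# model checkpoint instead of per-rank partial files, so no post-training
# merge step is needed.
import smdistributed.modelparallel.torch as smp

def save_full_checkpoint(model, optimizer, step, checkpoint_dir="/opt/ml/checkpoints"):
    # Assumes smp.init() was already called and model/optimizer are the
    # smp-wrapped objects used during training.
    smp.save_checkpoint(
        path=checkpoint_dir,
        tag=f"step-{step}",
        model=model,
        optimizer=optimizer,
        partial=False,
    )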

doc/api/training/smp_versions/latest/smd_model_parallel_pytorch_tensor_parallel.rst

+9 −9
@@ -301,20 +301,20 @@ Tensor Parallelism Module APIs
     ``post_layernorm`` must be ``True``.
   - ``post_layernorm``: If ``True``, inserts layer normalization at
     the output. At least one of ``pre_layernorm`` and
-    ``post_layernorm`` must be ``True``.
-  - ``use_alibi`` (bool, default False): Activates Attention with
+    ``post_layernorm`` must be ``True``.
+  - ``use_alibi`` (bool, default False): Activates Attention with
     Linear Biases (ALiBi) for attention computation.
-    ALiBi facilitates efficient extrapolation on input sequences
-    and thus improves training efficiency.
-    The library enables ALiBi by using the `Triton
+    ALiBi facilitates efficient extrapolation on input sequences
+    and thus improves training efficiency.
+    The library enables ALiBi by using the `Triton
     flash attention kernel
     <https://github.com/HazyResearch/flash-attention>`_.
-    Refer to https://arxiv.org/abs/2108.12409 for more
+    Refer to https://arxiv.org/abs/2108.12409 for more
     details on the technique.
-    (Available from
+    (Available from
     the SageMaker model parallelism library v1.15.0.)
-  - ``alibi_bias_max`` (int, default 8): Defines the ALiBi base
-    value for mask generation. (Available from
+  - ``alibi_bias_max`` (int, default 8): Defines the ALiBi base
+    value for mask generation. (Available from
     the SageMaker model parallelism library v1.15.0.)
 
 - **Methods:**
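For the two parameters documented in this hunk, ``use_alibi`` and ``alibi_bias_max``, a minimal construction sketch follows. Only these two plus ``pre_layernorm``/``post_layernorm`` come from the documentation in the diff; the sizing arguments are illustrative assumptions and should be checked against the DistributedTransformerLayer API reference linked above.

# Hedged sketch of enabling ALiBi on the layer documented above. Sizing
# arguments are assumptions, and smp.init() is assumed to have been called
# earlier in the training script.
import smdistributed.modelparallel.torch as smp

layer = smp.nn.DistributedTransformerLayer(
    hidden_size=4096,           # assumed model width for illustration
    num_attention_heads=32,     # assumed head count for illustration
    pre_layernorm=True,         # at least one of pre_/post_layernorm must be True
    post_layernorm=False,
    use_alibi=True,             # Triton-based flash attention with ALiBi biases
    alibi_bias_max=8,           # ALiBi base value for mask generation (default 8)
)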
