README.rst (+2)
@@ -133,6 +133,8 @@ To run the integration tests, the following prerequisites must be met
1. AWS account credentials are available in the environment for the boto3 client to use.
2. The AWS account has an IAM role named :code:`SageMakerRole`.
It should have the AmazonSageMakerFullAccess policy attached as well as a policy with `the necessary permissions to use Elastic Inference <https://docs.aws.amazon.com/sagemaker/latest/dg/ei-setup.html>`__.
+3. To run the remote_function tests, a dummy ECR repository must be created. It can be created by running -
Note that the library does not support ``torch.compile`` in this release.
+
+**New Features**
+
+* Using sharded data parallelism with tensor parallelism together is now
+  available for PyTorch 1.13.1. It allows you to train with smaller global batch
+  sizes while scaling up to large clusters. For more information, see `Sharded
+  data parallelism with tensor parallelism <https://docs.aws.amazon.com/sagemaker/latest/dg/model-parallel-extended-features-pytorch-sharded-data-parallelism.html#model-parallel-extended-features-pytorch-sharded-data-parallelism-with-tensor-parallelism>`_
+  in the *Amazon SageMaker Developer Guide*.
+* Added support for saving and loading full model checkpoints when using sharded
+  data parallelism. This is enabled by using the standard checkpointing API,
+  ``smp.save_checkpoint`` with ``partial=False``.
+  Before, full checkpoints needed to be created by merging partial checkpoint