- .. admonition:: Contents
-
-    - :ref:`pytorch_saving_loading`
-    - :ref:`pytorch_saving_loading_instructions`
-

PyTorch API
===========

- **Supported versions: 1.7.1, 1.8.1**
-
- This API document assumes you use the following import statements in your training scripts.
+ To use the PyTorch-specific APIs for SageMaker distributed model parallelism,
+ you need to add the following import statement at the top of your training script.

.. code:: python

@@ -19,10 +13,10 @@ This API document assumes you use the following import statements in your traini
Refer to
`Modify a PyTorch Training Script
- <https://docs.aws.amazon.com/sagemaker/latest/dg/model-parallel-customize-training-script.html#model-parallel-customize-training-script-pt>`_
+ <https://docs.aws.amazon.com/sagemaker/latest/dg/model-parallel-customize-training-script-pt.html>`_
to learn how to use the following API in your PyTorch training script.

- .. class:: smp.DistributedModel
+ .. py:class:: smp.DistributedModel()

A sub-class of ``torch.nn.Module`` which specifies the model to be
partitioned. Accepts a ``torch.nn.Module`` object ``module`` which is
@@ -42,7 +36,6 @@ This API document assumes you use the following import statements in your traini
is \ ``model``) can only be made inside a ``smp.step``-decorated
function.

-
Since ``DistributedModel`` is a ``torch.nn.Module``, a forward pass can
be performed by calling the \ ``DistributedModel`` object on the input
tensors.
@@ -56,7 +49,6 @@ This API document assumes you use the following import statements in your traini
arguments, replacing the PyTorch operations \ ``torch.Tensor.backward``
or ``torch.autograd.backward``.

-
The API for ``model.backward`` is very similar to
``torch.autograd.backward``. For example, the following
``backward`` calls:
@@ -90,7 +82,7 @@ This API document assumes you use the following import statements in your traini
**Using DDP**

- If DDP is enabled, do not place a PyTorch
+ If DDP is enabled with the SageMaker model parallel library, do not place a PyTorch
``DistributedDataParallel`` wrapper around the ``DistributedModel`` because
the ``DistributedModel`` wrapper will also handle data parallelism.

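For instance, a minimal sketch of the intended wrapping, assuming the standard ``smp``
import and an ``smp.init()`` call at the start of the training script; the toy model
here is only a placeholder:

.. code:: python

   import torch.nn as nn
   import smdistributed.modelparallel.torch as smp

   smp.init()

   model = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 2))

   # Wrap only with smp.DistributedModel; do not also wrap with
   # torch.nn.parallel.DistributedDataParallel, since the smp wrapper
   # handles data parallelism itself when DDP mode is enabled.
   model = smp.DistributedModel(model)
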
@@ -284,6 +276,113 @@ This API document assumes you use the following import statements in your traini
`register_comm_hook <https://pytorch.org/docs/stable/generated/torch.nn.parallel.DistributedDataParallel.html#torch.nn.parallel.DistributedDataParallel.register_comm_hook>`__
in the PyTorch documentation.

+ **Behavior of** ``smp.DistributedModel`` **with Tensor Parallelism**
+
+ When a model is wrapped by ``smp.DistributedModel``, the library
+ immediately traverses the modules of the model object, and replaces the
+ modules that are supported for tensor parallelism with their distributed
+ counterparts. This replacement happens in place. If there are no other
+ references to the original modules in the script, they are
+ garbage-collected. The module attributes that previously referred to the
+ original submodules now refer to the distributed versions of those
+ submodules.
+
+ **Example:**
+
+ .. code:: python
+
+    # register DistributedSubmodule as the distributed version of Submodule
+    # (note this is a hypothetical example, smp.nn.DistributedSubmodule does not exist)
+    smp.tp_register_with_module(Submodule, smp.nn.DistributedSubmodule)
+
+    class MyModule(nn.Module):
+        def __init__(self):
+            ...
+
+            self.submodule = Submodule()
+            ...
+
+    # enabling tensor parallelism for the entire model
+    with smp.tensor_parallelism():
+        model = MyModule()
+
+    # here model.submodule is still a Submodule object
+    assert isinstance(model.submodule, Submodule)
+
+    model = smp.DistributedModel(model)
+
+    # now model.submodule is replaced with an equivalent instance
+    # of smp.nn.DistributedSubmodule
+    assert isinstance(model.module.submodule, smp.nn.DistributedSubmodule)
+
+ If ``pipeline_parallel_degree`` (equivalently, ``partitions``) is 1, the
+ placement of model partitions into GPUs and the initial broadcast of
+ model parameters and buffers across data-parallel ranks take place
+ immediately, because the library does not need to wait for model
+ partitioning when the ``smp.DistributedModel`` wrapper is called. For
+ cases with ``pipeline_parallel_degree`` greater than 1, the broadcast
+ and device placement are deferred until the first call of an
+ ``smp.step``-decorated function, because that first call is when model
+ partitioning happens if pipeline parallelism is enabled.
+
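As a point of reference, a minimal sketch of an ``smp.step``-decorated training step
whose first call would trigger partitioning when pipeline parallelism is enabled; the
loss function, ``data``, and ``target`` are placeholders:

.. code:: python

   import torch.nn.functional as F

   @smp.step
   def train_step(model, data, target):
       output = model(data)
       loss = F.nll_loss(output, target, reduction="mean")
       # use the DistributedModel backward, not torch.autograd.backward
       model.backward(loss)
       return output, loss

   # The first call partitions the model and places it on the GPUs
   # when pipeline_parallel_degree > 1.
   outputs, losses = train_step(model, data, target)
   loss = losses.reduce_mean()
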
+ Because of the module replacement during the ``smp.DistributedModel``
+ call, any ``load_state_dict`` calls on the model, as well as any direct
+ access to model parameters, such as during optimizer creation,
+ should be done **after** the ``smp.DistributedModel`` call.
+
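To illustrate this ordering, a hedged sketch in which ``checkpoint`` and the optimizer
choice are placeholders:

.. code:: python

   import torch

   model = MyModule()
   model = smp.DistributedModel(model)

   # State loading and parameter access come after the wrapper call,
   # so they see the distributed (replaced) submodules.
   model.load_state_dict(checkpoint["model_state_dict"])
   optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
   optimizer = smp.DistributedOptimizer(optimizer)
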
+ Since the broadcast of the model parameters and buffers happens
+ immediately during the ``smp.DistributedModel`` call when the degree of
+ pipeline parallelism is 1, using ``@smp.step`` decorators is not
+ required when tensor parallelism is used by itself (without pipeline
+ parallelism).
+
+ For more information about the library's tensor parallelism APIs for PyTorch,
+ see :ref:`smdmp-pytorch-tensor-parallel`.
+
+ **Additional Methods of** ``smp.DistributedModel`` **for Tensor Parallelism**
+
+ The following are the new methods of ``smp.DistributedModel``, in
+ addition to the ones listed in the
+ `documentation <https://sagemaker.readthedocs.io/en/stable/api/training/smp_versions/v1.2.0/smd_model_parallel_pytorch.html#smp.DistributedModel>`__.
+ A brief usage sketch follows the list.
+
+ .. function:: distributed_modules()
+
+    - An iterator that runs over the set of distributed
+      (tensor-parallelized) modules in the model.
+
+ .. function:: is_distributed_parameter(param)
+
+    - Returns ``True`` if the given ``nn.Parameter`` is distributed over
+      tensor-parallel ranks.
+
+ .. function:: is_distributed_buffer(buf)
+
+    - Returns ``True`` if the given buffer is distributed over
+      tensor-parallel ranks.
+
+ .. function:: is_scaled_batch_parameter(param)
+
+    - Returns ``True`` if the given ``nn.Parameter`` operates on the
+      scaled batch (batch over the entire ``TP_GROUP``, and not only the
+      local batch).
+
+ .. function:: is_scaled_batch_buffer(buf)
+
+    - Returns ``True`` if the parameter corresponding to the given
+      buffer operates on the scaled batch (batch over the entire
+      ``TP_GROUP``, and not only the local batch).
+
+ .. function:: default_reducer_named_parameters()
+
+    - Returns an iterator that runs over ``(name, param)`` tuples, for
+      ``param`` that is allreduced over the ``DP_GROUP``.
+
+ .. function:: scaled_batch_reducer_named_parameters()
+
+    - Returns an iterator that runs over ``(name, param)`` tuples, for
+      ``param`` that is allreduced over the ``RDP_GROUP``.
+
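To show how these methods might be used together, a small sketch that inspects a
wrapped model; ``MyModule`` refers to the hypothetical module from the example above:

.. code:: python

   model = smp.DistributedModel(MyModule())

   # Iterate over the tensor-parallelized submodules.
   for dist_module in model.distributed_modules():
       print(type(dist_module).__name__)

   # Count parameters that are split across tensor-parallel ranks.
   num_distributed = sum(
       1 for p in model.parameters() if model.is_distributed_parameter(p)
   )

   # Parameters allreduced over DP_GROUP vs. RDP_GROUP.
   dp_named = dict(model.default_reducer_named_parameters())
   rdp_named = dict(model.scaled_batch_reducer_named_parameters())
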
.. class:: smp.DistributedOptimizer