
Syncing master-jumpstart with dev #2887


Merged
merged 55 commits on Feb 3, 2022
Changes from all commits

55 commits
81a453b
feature: allow conditional parellel builds (#2727)
mufaddal-rohawala Nov 4, 2021
565e70e
fix endpoint bug (#2772)
BasilBeirouti Dec 6, 2021
34dd43a
fix: local mode - support relative file structure (#2768)
mufaddal-rohawala Dec 7, 2021
f8f2dbb
prepare release v2.72.0
Dec 13, 2021
217e8c8
update development version to v2.72.1.dev0
Dec 13, 2021
97b71c2
fix: Set ProcessingStep upload locations deterministically to avoid c…
staubhp Dec 8, 2021
dae7dac
fix: Prevent repack_model script from referencing nonexistent directo…
staubhp Dec 9, 2021
cef81d4
fix: S3Input - add support for instance attributes (#2754)
mufaddal-rohawala Dec 15, 2021
a3b588c
fix: typos and broken link (#2765)
mohamed-ali Dec 16, 2021
a24c397
prepare release v2.72.1
Dec 20, 2021
2dc0f34
update development version to v2.72.2.dev0
Dec 20, 2021
1873081
fix: Model Registration with BYO scripts (#2797)
sreedes Dec 17, 2021
0d5e925
fix: Add ContentType in test_auto_ml_describe
navinns Dec 27, 2021
1403e33
fix: Re-deploy static integ test endpoint if it is not found
Dec 27, 2021
8597238
documentation :SageMaker model parallel library 1.6.0 API doc (#2814)
mchoi8739 Dec 30, 2021
42dc98e
fix: fix kmeans test deletion sequence, increment lineage statics (#2…
mufaddal-rohawala Dec 31, 2021
72d1246
fix: Increment static lineage pipeline (#2817)
mufaddal-rohawala Jan 3, 2022
fadd687
fix: Update CHANGELOG.md (#2832)
ahsan-z-khan Jan 6, 2022
15132f6
prepare release v2.72.2
Jan 6, 2022
bb7a351
update development version to v2.72.3.dev0
Jan 6, 2022
be1eea6
change: update master from dev (#2836)
ahsan-z-khan Jan 10, 2022
11ce418
prepare release v2.72.3
Jan 10, 2022
c2b5f95
update development version to v2.72.4.dev0
Jan 10, 2022
944fbc6
fix: fixes unnecessary session call while generating pipeline definit…
xchen909 Jan 10, 2022
ab44079
feature: Add models_v2 under lineage context (#2800)
yzhu0 Jan 10, 2022
f75c7ca
feature: enable python 3.9 (#2802)
mufaddal-rohawala Jan 10, 2022
af0abf2
change: Update CHANGELOG.md (#2842)
shreyapandit Jan 11, 2022
021e64b
fix: update pricing link (#2805)
ahsan-z-khan Jan 11, 2022
b8e0e87
doc: Document the available ExecutionVariables (#2807)
tuliocasagrande Jan 12, 2022
0a65986
fix: Remove duplicate vertex/edge in query lineage (#2784)
yzhu0 Jan 12, 2022
70e59d6
feature: Support model pipelines in CreateModelStep (#2845)
staubhp Jan 12, 2022
753a0a0
feature: support JsonGet/Join parameterization in tuning step Hyperpa…
jerrypeng7773 Jan 13, 2022
01962bc
doc: Enhance smddp 1.2.2 doc (#2852)
mchoi8739 Jan 13, 2022
c4d3b9e
feature: support checkpoint to be passed from estimator (#2849)
marckarp Jan 13, 2022
718b8da
fix: allow kms_key to be passed for processing step (#2779)
jayatalr Jan 13, 2022
4efbe84
feature: Adds support for Serverless inference (#2831)
bhaoz Jan 14, 2022
bb1704a
feature: Add support for SageMaker lineage queries in action (#2853)
yzhu0 Jan 14, 2022
867f300
feature: Adds Lineage queries in artifact, context and trial componen…
yzhu0 Jan 18, 2022
c936ec0
feature: Add EMRStep support in Sagemaker pipeline (#2848)
EthanShouhanCheng Jan 18, 2022
4f66d51
prepare release v2.73.0
Jan 19, 2022
c32093c
update development version to v2.73.1.dev0
Jan 19, 2022
109730e
feature: Add support for SageMaker lineage queries context (#2830)
yzhu0 Jan 19, 2022
ead2b16
fix: support specifying a facet by its column index
xgchena Oct 30, 2021
a594ece
doc: more documentation for serverless inference (#2859)
bhaoz Jan 20, 2022
187c3df
prepare release v2.74.0
Jan 26, 2022
d289377
update development version to v2.74.1.dev0
Jan 26, 2022
ed7dce3
Add deprecation warning in Clarify DataConfig (#2847)
keerthanvasist Jan 26, 2022
7a347c7
feature: Update instance types for integ test (#2881)
jeniyat Jan 29, 2022
34b07c0
feature: Adds support for async inference (#2846)
bhaoz Jan 29, 2022
81b6751
Merge remote-tracking branch 'origin/dev' into master-jumpstart
evakravi Jan 31, 2022
90b0b0f
fix: update to incorporate black v22, pin tox versions (#2889)
jeniyat Feb 3, 2022
bb2a1f5
bring in latest changes from master
shreyapandit Feb 3, 2022
a6f63b3
make black happy
shreyapandit Feb 3, 2022
7174607
Merge remote-tracking branch 'origin/dev' into master-jumpstart
evakravi Feb 3, 2022
028c70c
Merge branch 'master-jumpstart' of github.com:evakravi/sagemaker-pyth…
evakravi Feb 3, 2022
41 changes: 41 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,46 @@
# Changelog

## v2.74.0 (2022-01-26)

### Features

* Add support for SageMaker lineage queries context

### Bug Fixes and Other Changes

* support specifying a facet by its column index

### Documentation Changes

* more documentation for serverless inference

## v2.73.0 (2022-01-19)

### Features

* Add EMRStep support in Sagemaker pipeline
* Adds Lineage queries in artifact, context and trial components
* Add support for SageMaker lineage queries in action
* Adds support for Serverless inference
* support checkpoint to be passed from estimator
* support JsonGet/Join parameterization in tuning step Hyperparameters
* Support model pipelines in CreateModelStep
* enable python 3.9
* Add models_v2 under lineage context

### Bug Fixes and Other Changes

* allow kms_key to be passed for processing step
* Remove duplicate vertex/edge in query lineage
* update pricing link
* Update CHANGELOG.md
* fixes unnecessary session call while generating pipeline definition for lambda step

### Documentation Changes

* Enhance smddp 1.2.2 doc
* Document the available ExecutionVariables

## v2.72.3 (2022-01-10)

### Features
2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
2.72.4.dev0
2.74.1.dev0
2 changes: 1 addition & 1 deletion doc/Makefile
@@ -3,7 +3,7 @@

# You can set these variables from the command line.
SPHINXOPTS = -W
SPHINXBUILD = python -msphinx
SPHINXBUILD = python -msphinx
SPHINXPROJ = sagemaker
SOURCEDIR = .
BUILDDIR = _build
19 changes: 19 additions & 0 deletions doc/api/inference/async_inference.rst
@@ -0,0 +1,19 @@
Async Inference
-----------------

This module contains classes related to Amazon SageMaker Async Inference.

.. automodule:: sagemaker.async_inference.async_inference_config
:members:
:undoc-members:
:show-inheritance:

.. automodule:: sagemaker.async_inference.async_inference_response
:members:
:undoc-members:
:show-inheritance:

.. automodule:: sagemaker.async_inference.waiter_config
:members:
:undoc-members:
:show-inheritance:
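
For illustration only (not part of this diff): a minimal sketch of deploying an endpoint with an asynchronous inference configuration, assuming an existing ``sagemaker.model.Model`` object named ``model``; the bucket, instance type, and concurrency values are placeholders.

.. code:: python

    from sagemaker.async_inference import AsyncInferenceConfig

    # Placeholder S3 location where asynchronous results will be written.
    async_config = AsyncInferenceConfig(
        output_path="s3://my-bucket/async-results/",
        max_concurrent_invocations_per_instance=4,
    )

    # `model` is an assumed, pre-built sagemaker.model.Model instance.
    predictor = model.deploy(
        initial_instance_count=1,
        instance_type="ml.m5.xlarge",
        async_inference_config=async_config,
    )

Deploying with an ``async_inference_config`` returns an ``AsyncPredictor``, documented in ``predictor_async.rst`` below.
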
9 changes: 9 additions & 0 deletions doc/api/inference/predictor_async.rst
@@ -0,0 +1,9 @@
AsyncPredictor
--------------------

Make async predictions against SageMaker endpoints with Python objects

.. autoclass:: sagemaker.predictor_async.AsyncPredictor
:members:
:undoc-members:
:show-inheritance:
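
A hedged usage sketch, assuming ``predictor`` is the ``AsyncPredictor`` returned by an asynchronous deployment as above; the input path and waiter values are illustrative placeholders.

.. code:: python

    from sagemaker.async_inference import WaiterConfig

    # Submit a request by pointing at a payload already staged in S3 (placeholder path).
    response = predictor.predict_async(
        input_path="s3://my-bucket/async-inputs/payload.csv"
    )

    # Block until the result appears in the configured output path,
    # polling every 10 seconds for up to 60 attempts (illustrative values).
    result = response.get_result(WaiterConfig(max_attempts=60, delay=10))
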
9 changes: 9 additions & 0 deletions doc/api/inference/serverless.rst
@@ -0,0 +1,9 @@
Serverless Inference
---------------------

This module contains classes related to Amazon SageMaker Serverless Inference.

.. automodule:: sagemaker.serverless.serverless_inference_config
:members:
:undoc-members:
:show-inheritance:
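
For illustration only (not part of this diff): a minimal sketch of a serverless deployment, assuming an existing ``sagemaker.model.Model`` object named ``model``; the memory size and concurrency values are illustrative.

.. code:: python

    from sagemaker.serverless import ServerlessInferenceConfig

    # Memory (MB) and maximum concurrent invocations for the serverless endpoint.
    serverless_config = ServerlessInferenceConfig(
        memory_size_in_mb=2048,
        max_concurrency=5,
    )

    # No instance count or instance type is needed for a serverless endpoint.
    predictor = model.deploy(serverless_inference_config=serverless_config)
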
232 changes: 132 additions & 100 deletions doc/api/training/sdp_versions/latest/smd_data_parallel_pytorch.rst
@@ -2,10 +2,12 @@
PyTorch Guide to SageMaker's distributed data parallel library
##############################################################

.. admonition:: Contents
Use this guide to learn about the SageMaker distributed
data parallel library API for PyTorch.

- :ref:`pytorch-sdp-modify`
- :ref:`pytorch-sdp-api`
.. contents:: Topics
:depth: 3
:local:

.. _pytorch-sdp-modify:

@@ -55,7 +57,7 @@ API offered for PyTorch.


- Modify the ``torch.utils.data.distributed.DistributedSampler`` to
include the cluster’s information. Set``num_replicas`` to the
include the cluster’s information. Set ``num_replicas`` to the
total number of GPUs participating in training across all the nodes
in the cluster. This is called ``world_size``. You can get
``world_size`` with
@@ -110,7 +112,7 @@ you will have for distributed training with the distributed data parallel librar
def main():

    # Scale batch size by world size
    batch_size //= dist.get_world_size() // 8
    batch_size //= dist.get_world_size()
    batch_size = max(batch_size, 1)

    # Prepare dataset
@@ -153,9 +155,132 @@ you will have for distributed training with the distributed data parallel librar
PyTorch API
===========

.. rubric:: Supported versions
.. class:: smdistributed.dataparallel.torch.parallel.DistributedDataParallel(module, device_ids=None, output_device=None, broadcast_buffers=True, process_group=None, bucket_cap_mb=None)

``smdistributed.dataparallel``'s implementation of distributed data
parallelism for PyTorch. In most cases, wrapping your PyTorch Module
with ``smdistributed.dataparallel``'s ``DistributedDataParallel`` (DDP) is
all you need to do to use ``smdistributed.dataparallel``.

Creation of this DDP class requires ``smdistributed.dataparallel``
already initialized
with ``smdistributed.dataparallel.torch.distributed.init_process_group()``.

This container parallelizes the application of the given module by
splitting the input across the specified devices by chunking in the
batch dimension. The module is replicated on each machine and each
device, and each such replica handles a portion of the input. During the
backwards pass, gradients from each node are averaged.

The batch size should be larger than the number of GPUs used locally.
Example usage
of ``smdistributed.dataparallel.torch.parallel.DistributedDataParallel``:

.. code:: python

import torch
import smdistributed.dataparallel.torch.distributed as dist
from smdistributed.dataparallel.torch.parallel import DistributedDataParallel as DDP

dist.init_process_group()

# Pin GPU to be used to process local rank (one GPU per process)
torch.cuda.set_device(dist.get_local_rank())

# Build model and optimizer
model = ...
optimizer = torch.optim.SGD(model.parameters(),
                            lr=1e-3 * dist.get_world_size())
# Wrap model with smdistributed.dataparallel's DistributedDataParallel
model = DDP(model)

**Parameters:**

- ``module (torch.nn.Module)(required):`` PyTorch NN Module to be
parallelized
- ``device_ids (list[int])(optional):`` CUDA devices. This should only
be provided when the input module resides on a single CUDA device.
For single-device modules,
the ``ith module replica is placed on device_ids[i]``. For
multi-device modules and CPU modules, device_ids must be None or an
empty list, and input data for the forward pass must be placed on the
correct device. Defaults to ``None``.
- ``output_device (int)(optional):`` Device location of output for
single-device CUDA modules. For multi-device modules and CPU modules,
it must be None, and the module itself dictates the output location.
(default: device_ids[0] for single-device modules).  Defaults
to ``None``.
- ``broadcast_buffers (bool)(optional):`` Flag that enables syncing
(broadcasting) buffers of the module at beginning of the forward
function. ``smdistributed.dataparallel`` does not support broadcast
buffer yet. Please set this to ``False``.
- ``process_group(smdistributed.dataparallel.torch.distributed.group)(optional):`` Process
group is not supported in ``smdistributed.dataparallel``. This
parameter exists for API parity with torch.distributed only. Only
supported value is
``smdistributed.dataparallel.torch.distributed.group.WORLD.`` Defaults
to ``None.``
- ``bucket_cap_mb (int)(optional):`` DistributedDataParallel will
bucket parameters into multiple buckets so that gradient reduction of
each bucket can potentially overlap with backward
computation. ``bucket_cap_mb`` controls the bucket size in
MegaBytes (MB) (default: 25).
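
As an illustrative sketch only (the values are arbitrary, and ``model`` and ``DDP`` are as in the example above), the optional arguments described in this list might be passed as:

.. code:: python

    # broadcast_buffers must be False for smdistributed.dataparallel;
    # bucket_cap_mb tunes the gradient-reduction bucket size in MB.
    model = DDP(model, broadcast_buffers=False, bucket_cap_mb=50)
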

.. note::

This module assumes all parameters are registered in the model by the
time it is created. No parameters should be added nor removed later.

.. note::

This module assumes all parameters are registered in the model of
each distributed process in the same order. The module itself
will conduct gradient all-reduction following the reverse order of
the registered parameters of the model. In other words, it is users’
responsibility to ensure that each distributed process has the exact
same model and thus the exact same parameter registration order.

.. note::

You should never change the set of your model’s parameters after
wrapping up your model with DistributedDataParallel. In other words,
when wrapping up your model with DistributedDataParallel, the
constructor of DistributedDataParallel will register the additional
gradient reduction functions on all the parameters of the model
itself at the time of construction. If you change the model’s
parameters after the DistributedDataParallel construction, this is
not supported and unexpected behaviors can happen, since some
parameters’ gradient reduction functions might not get called.

.. method:: no_sync()

``smdistributed.dataparallel`` supports the `PyTorch DDP no_sync() <https://pytorch.org/docs/stable/generated/torch.nn.parallel.DistributedDataParallel.html#torch.nn.parallel.DistributedDataParallel.no_sync>`_
context manager. It enables gradient accumulation by skipping AllReduce
during training iterations inside the context.

.. note::

The ``no_sync()`` context manager is available from smdistributed-dataparallel v1.2.2.
To find the release note, see :ref:`sdp_1.2.2_release_note`.

**PyTorch 1.7.1, 1.8.1**
**Example:**

.. code:: python

# Gradients are accumulated while inside no_sync context
with model.no_sync():
...
loss.backward()

# First iteration upon exiting context
# Incoming gradients are added to the accumulated gradients and then synchronized via AllReduce
...
loss.backward()

# Update weights and reset gradients to zero after accumulation is finished
optimizer.step()
optimizer.zero_grad()

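A slightly fuller sketch of the same accumulation pattern (illustrative only; ``model`` is the DDP-wrapped module from the example above, while ``train_loader``, ``loss_fn``, ``optimizer``, and ``accumulation_steps`` are assumed placeholders):

.. code:: python

    accumulation_steps = 4  # illustrative value

    for step, (data, target) in enumerate(train_loader):
        if (step + 1) % accumulation_steps != 0:
            # Accumulate gradients locally; AllReduce is skipped inside no_sync().
            with model.no_sync():
                loss = loss_fn(model(data), target)
                loss.backward()
        else:
            # On the final micro-batch, gradients are synchronized across workers.
            loss = loss_fn(model(data), target)
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()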

.. function:: smdistributed.dataparallel.torch.distributed.is_available()
@@ -409,99 +534,6 @@ PyTorch API
otherwise.


.. class:: smdistributed.dataparallel.torch.parallel.DistributedDataParallel(module, device_ids=None, output_device=None, broadcast_buffers=True, process_group=None, bucket_cap_mb=None)

``smdistributed.dataparallel's`` implementation of distributed data
parallelism for PyTorch. In most cases, wrapping your PyTorch Module
with ``smdistributed.dataparallel's`` ``DistributedDataParallel (DDP)`` is
all you need to do to use ``smdistributed.dataparallel``.

Creation of this DDP class requires ``smdistributed.dataparallel``
already initialized
with ``smdistributed.dataparallel.torch.distributed.init_process_group()``.

This container parallelizes the application of the given module by
splitting the input across the specified devices by chunking in the
batch dimension. The module is replicated on each machine and each
device, and each such replica handles a portion of the input. During the
backwards pass, gradients from each node are averaged.

The batch size should be larger than the number of GPUs used locally.
Example usage
of ``smdistributed.dataparallel.torch.parallel.DistributedDataParallel``:

.. code:: python

import torch
import smdistributed.dataparallel.torch.distributed as dist
from smdistributed.dataparallel.torch.parallel import DistributedDataParallel as DDP

dist.init_process_group()

# Pin GPU to be used to process local rank (one GPU per process)
torch.cuda.set_device(dist.get_local_rank())

# Build model and optimizer
model = ...
optimizer = torch.optim.SGD(model.parameters(),
                            lr=1e-3 * dist.get_world_size())
# Wrap model with smdistributed.dataparallel's DistributedDataParallel
model = DDP(model)

**Parameters:**

- ``module (torch.nn.Module)(required):`` PyTorch NN Module to be
parallelized
- ``device_ids (list[int])(optional):`` CUDA devices. This should only
be provided when the input module resides on a single CUDA device.
For single-device modules,
the ``ith module replica is placed on device_ids[i]``. For
multi-device modules and CPU modules, device_ids must be None or an
empty list, and input data for the forward pass must be placed on the
correct device. Defaults to ``None``.
- ``output_device (int)(optional):`` Device location of output for
single-device CUDA modules. For multi-device modules and CPU modules,
it must be None, and the module itself dictates the output location.
(default: device_ids[0] for single-device modules).  Defaults
to ``None``.
- ``broadcast_buffers (bool)(optional):`` Flag that enables syncing
(broadcasting) buffers of the module at beginning of the forward
function. ``smdistributed.dataparallel`` does not support broadcast
buffer yet. Please set this to ``False``.
- ``process_group(smdistributed.dataparallel.torch.distributed.group)(optional):`` Process
group is not supported in ``smdistributed.dataparallel``. This
parameter exists for API parity with torch.distributed only. Only
supported value is
``smdistributed.dataparallel.torch.distributed.group.WORLD.`` Defaults
to ``None.``
- ``bucket_cap_mb (int)(optional):`` DistributedDataParallel will
bucket parameters into multiple buckets so that gradient reduction of
each bucket can potentially overlap with backward
computation. ``bucket_cap_mb`` controls the bucket size in
MegaBytes (MB) (default: 25).

.. rubric:: Notes

- This module assumes all parameters are registered in the model by the
time it is created. No parameters should be added nor removed later.
- This module assumes all parameters are registered in the model of
each distributed processes are in the same order. The module itself
will conduct gradient all-reduction following the reverse order of
the registered parameters of the model. In other words, it is users’
responsibility to ensure that each distributed process has the exact
same model and thus the exact same parameter registration order.
- You should never change the set of your model’s parameters after
wrapping up your model with DistributedDataParallel. In other words,
when wrapping up your model with DistributedDataParallel, the
constructor of DistributedDataParallel will register the additional
gradient reduction functions on all the parameters of the model
itself at the time of construction. If you change the model’s
parameters after the DistributedDataParallel construction, this is
not supported and unexpected behaviors can happen, since some
parameters’ gradient reduction functions might not get called.


.. class:: smdistributed.dataparallel.torch.distributed.ReduceOp

An enum-like class for supported reduction operations
doc/api/training/sdp_versions/latest/smd_data_parallel_tensorflow.rst
@@ -155,10 +155,6 @@ script you will have for distributed training with the library.
TensorFlow API
==============

.. rubric:: Supported versions

**TensorFlow 2.3.1, 2.4.1, 2.5.0**

.. function:: smdistributed.dataparallel.tensorflow.init()

Initialize ``smdistributed.dataparallel``. Must be called at the
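
For illustration only (not part of this diff): a minimal initialization sketch for the TensorFlow API, assuming a GPU training instance; the GPU-pinning pattern mirrors the PyTorch example above and is an assumption here, not something this diff adds.

.. code:: python

    import tensorflow as tf
    import smdistributed.dataparallel.tensorflow as sdp

    # Initialize the library before any other sdp calls.
    sdp.init()

    # Pin each process to one GPU based on its local rank (illustrative).
    gpus = tf.config.experimental.list_physical_devices("GPU")
    if gpus:
        tf.config.experimental.set_visible_devices(gpus[sdp.local_rank()], "GPU")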