
documentation: the SageMaker distributed data parallel v1.4.0 release #2980


Merged on Mar 10, 2022 (39 commits).
Commits
- 8210375 (xchen909, Jan 10, 2022): fix: fixes unnecessary session call while generating pipeline definit…
- 972a6d2 (yzhu0, Jan 10, 2022): feature: Add models_v2 under lineage context (#2800)
- 7206b9e (mufaddal-rohawala, Jan 10, 2022): feature: enable python 3.9 (#2802)
- 127c964 (shreyapandit, Jan 11, 2022): change: Update CHANGELOG.md (#2842)
- 554d735 (ahsan-z-khan, Jan 11, 2022): fix: update pricing link (#2805)
- 88e4d68 (tuliocasagrande, Jan 12, 2022): doc: Document the available ExecutionVariables (#2807)
- b3c19d8 (yzhu0, Jan 12, 2022): fix: Remove duplicate vertex/edge in query lineage (#2784)
- b591959 (navinsoni, Feb 15, 2022): fix: Update Static Endpoint (#2931)
- 3c5ea3a (navinsoni, Feb 16, 2022): Add exception in test_action (#2938)
- 9d84e2e (mufaddal-rohawala, Feb 16, 2022): change: pin test dependencies (#2929)
- cede5fa (qidewenwhen, Feb 16, 2022): feature: Add FailStep Support for Sagemaker Pipeline (#2872)
- a765512 (HappyAmazonian, Feb 16, 2022): change: use recommended inference image uri from Neo API (#2923)
- 4a1c4df (mchoi8739, Feb 16, 2022): Merge branch 'dev' of https://github.com/aws/sagemaker-python-sdk int…
- 3833310 (mchoi8739, Feb 16, 2022): archive previous doc
- cb9d235 (mchoi8739, Feb 16, 2022): restructure the herring api doc
- 5e4128e (mchoi8739, Feb 17, 2022): update pytorch page
- 5159c85 (mchoi8739, Feb 17, 2022): polish doc style and structure
- 6670e30 (bencrabtree, Feb 24, 2022): fix: jumpstart model table (#2954)
- 9a5e8bc (mchoi8739, Feb 24, 2022): update distributed training doc
- 0026f6b (mchoi8739, Feb 24, 2022): Merge branch 'dev' of https://github.com/aws/sagemaker-python-sdk int…
- f166b60 (navinsoni, Feb 26, 2022): change: update code to get commit_id in codepipeline (#2961)
- 086258d (jeniyat, Feb 28, 2022): feature: Data Serializer (#2956)
- a39b750 (qidewenwhen, Mar 3, 2022): change: reorganize test files for workflow (#2960)
- 28fd737 (Qingzi-Lan, Mar 3, 2022): feature: TensorFlow 2.4 for Neo (#2861)
- 20df3d7 (staubhp, Mar 3, 2022): fix: Remove sagemaker_job_name from hyperparameters in TrainingStep (…
- b9f90dc (jeniyat, Mar 3, 2022): fix: Style update in DataSerializer (#2962)
- 35863e2 (mchoi8739, Mar 3, 2022): documentation: sync upstream
- ee0757d (mchoi8739, Mar 3, 2022): documentation: add ref
- 6db3774 (mchoi8739, Mar 4, 2022): documentation: smddp doc update (#2968)
- d610bfb (shreyapandit, Mar 7, 2022): fix: container env generation for S3 URI and add test for the same (#…
- 169dffd (mchoi8739, Mar 7, 2022): documentation: update sagemaker training compiler docstring (#2969)
- 4325fcd (ahsan-z-khan, Mar 8, 2022): feat: Python 3.9 for readthedocs (#2973)
- b07b869 (mchoi8739, Mar 8, 2022): Merge branch 'dev' of https://github.com/aws/sagemaker-python-sdk int…
- e1b77a7 (mchoi8739, Mar 8, 2022): Merge branch 'dev' of https://github.com/aws/sagemaker-python-sdk int…
- bc86c5a (mchoi8739, Mar 8, 2022): drop the workflow integ test
- ad068b6 (mchoi8739, Mar 9, 2022): Trigger Build
- 43db88b (ahsan-z-khan, Mar 9, 2022): Merge branch 'master' into smddp-1.4.0-doc
- 4e19986 (mchoi8739, Mar 10, 2022): Trigger Build
- b85f772 (mchoi8739, Mar 10, 2022): Merge branch 'smddp-1.4.0-doc' of https://github.com/mchoi8739/sagema…
.readthedocs.yml (1 addition, 1 deletion)

```diff
@@ -5,7 +5,7 @@
 version: 2
 
 python:
-  version: 3.6
+  version: 3.9
   install:
     - method: pip
       path: .
```
doc/api/training/distributed.rst (18 additions, 1 deletion)

```diff
@@ -4,8 +4,25 @@ SageMaker distributed training libraries offer both data parallel and model para
 They combine software and hardware technologies to improve inter-GPU and inter-node communications.
 They extend SageMaker’s training capabilities with built-in options that require only small code changes to your training scripts.
 
+.. _sdp_api_docs_toc:
+
+The SageMaker Distributed Data Parallel Library
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. toctree::
+   :maxdepth: 3
+
+   smd_data_parallel
+   sdp_versions/latest
+   smd_data_parallel_use_sm_pysdk
+   smd_data_parallel_release_notes/smd_data_parallel_change_log
+
+.. _smp_api_docs_toc:
+
+The SageMaker Distributed Model Parallel Library
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
 .. toctree::
    :maxdepth: 3
 
-   smd_data_parallel
    smd_model_parallel
```
doc/api/training/sdp_versions/archives.rst (8 additions, 0 deletions)

```diff
@@ -0,0 +1,8 @@
+.. _smddp-version-archive:
+
+.. toctree::
+   :maxdepth: 1
+
+   v1_2_x.rst
+   v1_1_x.rst
+   v1_0_0.rst
```
doc/api/training/sdp_versions/latest.rst (41 additions, 3 deletions)

```diff
@@ -1,9 +1,47 @@
 .. _sdp_api_docs:
 
-Version 1.2.x (Latest)
 #############################################
+Use the Library to Adapt Your Training Script
+#############################################
+
+This section contains the SageMaker distributed data parallel API documentation.
+If you are a new user of this library, it is recommended you use this guide alongside
+`SageMaker's Distributed Data Parallel Library
+<https://docs.aws.amazon.com/sagemaker/latest/dg/data-parallel.html>`_.
+
+The library provides framework-specific APIs for TensorFlow and PyTorch.
+
+Select the latest or one of the previous versions of the API documentation
+depending on the version of the library you use.
+
+.. important::
+
+   The distributed data parallel library supports training jobs using CUDA 11 or later.
+   When you define a :class:`sagemaker.tensorflow.estimator.TensorFlow` or
+   :class:`sagemaker.pytorch.estimator.PyTorch`
+   estimator with the data parallel library enabled,
+   SageMaker uses CUDA 11. When you extend or customize your own training image,
+   you must use a base image with CUDA 11 or later. See
+   `SageMaker Python SDK's distributed data parallel library APIs
+   <https://docs.aws.amazon.com/sagemaker/latest/dg/data-parallel-use-api.html#data-parallel-use-python-skd-api>`_
+   for more information.
+
+Version 1.4.0 (Latest)
+======================
+
 .. toctree::
    :maxdepth: 1
 
-   latest/smd_data_parallel_pytorch.rst
-   latest/smd_data_parallel_tensorflow.rst
+   latest/smd_data_parallel_pytorch
+   latest/smd_data_parallel_tensorflow
+
+Documentation Archive
+=====================
+
+To find the API documentation for the previous versions of the library,
+choose one of the following:
+
+.. toctree::
+   :maxdepth: 1
+
+   archives
```
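The `important` note added to `latest.rst` says the data parallel library needs CUDA 11 or later, both when you enable it on a TensorFlow or PyTorch estimator and when you build a custom training image. The following is a minimal sketch of both sides of that requirement. The `SMDDP_DISTRIBUTION` dictionary is the shape the SageMaker Python SDK accepts for the estimators' `distribution` parameter. The estimator is only shown in a comment, so the sketch runs without AWS credentials. The two helper functions are hypothetical and not part of the SDK. They also assume that an image tag encodes its CUDA version as `-cuXYZ-` (for example `cu113` for CUDA 11.3). A custom image may not follow that convention.

```python
import re

# `distribution` argument accepted by the SageMaker Python SDK's PyTorch and
# TensorFlow estimators to enable the distributed data parallel library, e.g.
#   PyTorch(..., distribution=SMDDP_DISTRIBUTION)
# The estimator is not constructed here, so no AWS session is needed.
SMDDP_DISTRIBUTION = {"smdistributed": {"dataparallel": {"enabled": True}}}

# Assumed tag convention (hypothetical): the CUDA version appears as "-cuXYZ-",
# e.g. "1.10.2-gpu-py38-cu113-ubuntu20.04-sagemaker" means CUDA 11.3.
_CUDA_TAG = re.compile(r"-cu(\d{2})(\d)(?:-|$)")

# Minimum CUDA version stated in the `important` note.
MIN_CUDA = (11, 0)


def cuda_version_from_tag(tag):
    """Hypothetical helper: return (major, minor) from an image tag, or None."""
    match = _CUDA_TAG.search(tag)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def meets_smddp_cuda_requirement(tag):
    """Hypothetical helper: True only if the tag names CUDA 11 or later."""
    version = cuda_version_from_tag(tag)
    return version is not None and version >= MIN_CUDA


if __name__ == "__main__":
    for tag in ("1.10.2-gpu-py38-cu113-ubuntu20.04-sagemaker",
                "1.6.0-gpu-py36-cu102-ubuntu16.04"):
        print(tag, meets_smddp_cuda_requirement(tag))
```

With this convention, a tag containing `cu113` passes the check and one containing `cu102` does not. A tag with no CUDA marker is treated as not meeting the requirement. This keeps the check conservative.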