fix: support estimator output path parameterization #3108

Closed
wants to merge 117 commits into from
117 commits
f166b60
change: update code to get commit_id in codepipeline (#2961)
navinsoni Feb 26, 2022
086258d
feature: Data Serializer (#2956)
jeniyat Feb 28, 2022
a39b750
change: reorganize test files for workflow (#2960)
qidewenwhen Mar 3, 2022
28fd737
feature: TensorFlow 2.4 for Neo (#2861)
Qingzi-Lan Mar 3, 2022
20df3d7
fix: Remove sagemaker_job_name from hyperparameters in TrainingStep (…
staubhp Mar 3, 2022
b9f90dc
fix: Style update in DataSerializer (#2962)
jeniyat Mar 3, 2022
6db3774
documentation: smddp doc update (#2968)
mchoi8739 Mar 4, 2022
d610bfb
fix: container env generation for S3 URI and add test for the same (#…
shreyapandit Mar 7, 2022
169dffd
documentation: update sagemaker training compiler docstring (#2969)
mchoi8739 Mar 7, 2022
4325fcd
feat: Python 3.9 for readthedocs (#2973)
ahsan-z-khan Mar 8, 2022
3baf513
prepare release v2.78.0
Mar 7, 2022
c0c2b9c
update development version to v2.78.1.dev0
Mar 7, 2022
b74736f
documentation: the SageMaker distributed data parallel v1.4.0 release…
mchoi8739 Mar 10, 2022
be831b4
feat: custom base job name for jumpstart models/estimators (#2970)
evakravi Mar 11, 2022
588dd69
feature: Inferentia Neuron support for HuggingFace (#2976)
jeniyat Mar 15, 2022
80f3054
prepare release v2.79.0
Mar 16, 2022
a57ad9d
update development version to v2.79.1.dev0
Mar 16, 2022
f6689a3
fix: jumpstart docs network isolation (#2989)
evakravi Mar 16, 2022
4440b5b
feature: AutoGluon 0.3.2 and 0.4.0 image_uris (#2991)
gradientsky Mar 16, 2022
eab7723
feature: Support for remote docker host (#2864)
samadwar Mar 16, 2022
c87a887
fix: gpu integs CapacityError - fallback to available compute (#3004)
mufaddal-rohawala Mar 17, 2022
f49061b
feature: Add support for TF 2.6.3 (#3006)
saimidu Mar 18, 2022
a629f2f
feature: TF242 ioc support (#2982)
Qingzi-Lan Mar 18, 2022
f40ace0
feature: Add support for TF 2.8 (#3000)
saimidu Mar 18, 2022
c9539bd
documentation: sagemaker distributed model parallel 1.7.0 doc (#2992)
mchoi8739 Mar 18, 2022
df29ea3
fix: gpu integs CapacityError - fallback to available compute (#3008)
mufaddal-rohawala Mar 18, 2022
4079ea1
feature: Add support for TF2.7 (#2958)
arjkesh Mar 18, 2022
f2a0379
change: Add JumpStart model table build notification (#2997)
IvyBazan Mar 18, 2022
e22e04f
fix: Align max_wait definitions in EstimaorBase and Estimator (#3009)
mohamed-ali Mar 18, 2022
9474bd3
prepare release v2.80.0
Mar 18, 2022
f9fedfe
update development version to v2.80.1.dev0
Mar 18, 2022
0f89c70
documentation: minor fixes for smddp 1.4.0 doc (#2996)
mchoi8739 Mar 19, 2022
d74befa
feature: Hugging Face Transformers 4.17 for PT 1.10 (#3011)
saimidu Mar 22, 2022
61d6a3d
feat: enable EnableInterContainerTrafficEncryption for model monitori…
jerrypeng7773 Mar 24, 2022
d602db5
change: Implement override solution for pipeline variables (#2995)
qidewenwhen Mar 26, 2022
48406d3
fix: temporarily skip tests impacted by data inconsistency (#3020)
danabens Mar 26, 2022
ef50469
fix: remove `new` from serverless (#3018)
philschmid Mar 26, 2022
a4dc2f2
feature: Retrieve data configuration (#3016)
shreyapandit Mar 26, 2022
a851c89
doc: add documentation for image_uri serverless use case (#3022)
bhaoz Mar 26, 2022
691154a
prepare release v2.81.0
Mar 26, 2022
1385707
update development version to v2.81.1.dev0
Mar 26, 2022
3d95d81
Update black-check version, add support for Spark 3.1 Processing (#3034)
shreyapandit Mar 29, 2022
7433bb8
prepare release v2.81.1
Mar 29, 2022
0381441
update development version to v2.81.2.dev0
Mar 29, 2022
d46d1b6
feature: support passing Env Vars to local mode training (#3015)
mufaddal-rohawala Mar 29, 2022
ffd6793
feature: pluggable instance fallback mechanism, add CapacityError (#3…
mufaddal-rohawala Mar 30, 2022
2f59de8
prepare release v2.82.0
Mar 30, 2022
63f68ac
update development version to v2.82.1.dev0
Mar 30, 2022
1c2f92c
more logging info for static pipeline test data setup (#3019)
danabens Mar 30, 2022
21dcf1c
fix: Fix Pipeline variables related customer issues (#2959)
qidewenwhen Mar 30, 2022
2686260
Update Inferentia Image URI Config (#3037)
YYStreet Mar 31, 2022
72cb6c0
prepare release v2.82.1
Mar 31, 2022
cfb0ad1
update development version to v2.82.2.dev0
Mar 31, 2022
eef101f
fix: Refactor repack_model script injection, fixes tar.gz error(#3039)
staubhp Mar 31, 2022
95689bb
Revert "fix: Fix Pipeline variables related customer issues (#2959)" …
staubhp Apr 1, 2022
14ef4bd
prepare release v2.82.2
Apr 1, 2022
cd94897
update development version to v2.82.3.dev0
Apr 1, 2022
d380a6e
feature: Hugging Face Transformers 4.17 for TF 2.6 (#3027)
Qingzi-Lan Apr 1, 2022
6ec40d4
fix: IOC image version select issue (#3021)
Qingzi-Lan Apr 4, 2022
85bb836
prepare release v2.83.0
Apr 4, 2022
c88713d
update development version to v2.83.1.dev0
Apr 4, 2022
ab48fc4
feature: add xgboost framework version 1.5-1 (#3044)
haixiw Apr 5, 2022
bd13900
feature: dependabot integ - move all deps to requirements.txt (#2981)
mufaddal-rohawala Apr 5, 2022
ab86a7f
prepare release v2.84.0
Apr 7, 2022
8804f83
update development version to v2.84.1.dev0
Apr 7, 2022
7a98d73
feature: add serverless inference image_uri retrieve support (#3035)
bhaoz Apr 7, 2022
61d2056
Fix: remove old legacy code for web analytics (#3053)
mchoi8739 Apr 7, 2022
73e623c
fix: Support file URIs in ProcessingStep's code parameter (#3051)
staubhp Apr 7, 2022
70059e7
feat: jumpstart model url (#3036)
evakravi Apr 7, 2022
4f58f92
fix: Add back the Fix for Pipeline variables related customer issues …
qidewenwhen Apr 7, 2022
38fd3cb
feature: update lambda code on pipeline create/update/upsert for Lamb…
nmadan Apr 7, 2022
7f0ffac
prepare release v2.85.0
Apr 11, 2022
acdf01b
update development version to v2.85.1.dev0
Apr 11, 2022
578a95e
feature: Adds Spark Processing Notebook to Notebook Tests (#3058)
shreyapandit Apr 11, 2022
920d33a
prepare release v2.86.0
Apr 12, 2022
a61285f
update development version to v2.86.1.dev0
Apr 12, 2022
b2243d7
fix: xgboost, sklearn network isolation for jumpstart (#3060)
evakravi Apr 12, 2022
8b071e0
documentation: fix minor typo (#3063)
bonellia Apr 12, 2022
92941a8
prepare release v2.86.1
Apr 13, 2022
b29a43f
update development version to v2.86.2.dev0
Apr 13, 2022
d7a942d
#using uuid to randomize, otherwise system timestamp is used (#3046)
Apr 13, 2022
d42765c
prepare release v2.86.2
Apr 14, 2022
835a11c
update development version to v2.86.3.dev0
Apr 14, 2022
05db4bd
feat: add Tensorflow and Pytorch version for SM Training Compiler and…
access2rohit Apr 19, 2022
09e9aaa
fix: Add more logging when unexpected number of artifacts found (#3065)
danabens Apr 19, 2022
02acb53
fix: retry context delete (#2721)
danabens Apr 19, 2022
7965e69
feature: Add Jumpstart example notebooks (#3068)
bencrabtree Apr 19, 2022
da906dc
fix: TrainingStep cache misses due to timestamp based job name (#3070)
nmadan Apr 19, 2022
5de4810
fix: integs for training compiler in non-PDX regions (#3073)
mufaddal-rohawala Apr 19, 2022
18d6b5c
prepare release v2.87.0
Apr 20, 2022
4e2e36a
update development version to v2.87.1.dev0
Apr 20, 2022
c98f03c
feature: jumpstart notebook utils -- list model ids, scripts, tasks, …
evakravi Apr 20, 2022
9455d6e
fix: disable endpoint context test (#3074)
danabens Apr 21, 2022
f1be282
doc: sm model parallel 1.8.0 release notes (#3072)
mchoi8739 Apr 22, 2022
6bd1f4d
fix: local mode printing of credentials during docker login closes #2…
jmahlik Apr 22, 2022
e409141
prepare release v2.88.0
Apr 26, 2022
1ddded9
update development version to v2.88.1.dev0
Apr 26, 2022
cd5974a
fix: Add encryption setting to tar_and_upload_dir method (#3082)
navinsoni Apr 27, 2022
16b5e02
prepare release v2.88.1
Apr 27, 2022
39eb67d
update development version to v2.88.2.dev0
Apr 27, 2022
b744906
Implement subclass compatibility for workflow pipeline job steps (#3040)
jerrypeng7773 Apr 27, 2022
38ce6f7
fix: Automl integ describe job check (#3088)
mufaddal-rohawala Apr 29, 2022
e4ae2ac
prepare release v2.88.2
May 2, 2022
4a6a0bd
update development version to v2.88.3.dev0
May 2, 2022
ae25f59
Feat/jumpstart model table update (#3087)
bencrabtree May 2, 2022
8001ba7
deprecate: Remove deprecated argument s3_data_distribution_type (#3064)
keerthanvasist May 3, 2022
a10efe0
prepare release v2.88.3
May 6, 2022
ee86a54
update development version to v2.88.4.dev0
May 6, 2022
4f4096b
feature: add validation specification (#3075)
BasilBeirouti May 6, 2022
4388782
fix: repack model locally when local_code local mode (#3094)
mufaddal-rohawala May 6, 2022
3694bef
documentation: smdmp 1.8.1 release note (#3085)
mchoi8739 May 6, 2022
16abd93
feature: Add PT 1.11 support (#3097)
saimidu May 10, 2022
44f5d09
prepare release v2.89.0
May 11, 2022
15a78b1
update development version to v2.89.1.dev0
May 11, 2022
500af4c
fix: update setup.py to add minimum python requirement of 3.6 (#3105)
navinsoni May 12, 2022
9014064
feature: Add ModelStep for SageMaker Model Building Pipeline (#3076)
qidewenwhen May 13, 2022
e315711
support estimator output path parameterization
jerrypeng7773 May 13, 2022
4 changes: 2 additions & 2 deletions .githooks/pre-push
@@ -12,5 +12,5 @@ start_time=`date +%s`
tox -e sphinx,doc8 --parallel all
./ci-scripts/displaytime.sh 'sphinx,doc8' $start_time
start_time=`date +%s`
-tox -e py36,py37,py38 --parallel all -- tests/unit
-./ci-scripts/displaytime.sh 'py36,py37,py38 unit' $start_time
+tox -e py36,py37,py38,py39 --parallel all -- tests/unit
+./ci-scripts/displaytime.sh 'py36,py37,py38,py39 unit' $start_time
8 changes: 7 additions & 1 deletion .readthedocs.yml → .readthedocs.yaml
@@ -4,13 +4,19 @@

 version: 2

+build:
+  os: ubuntu-20.04
+  tools:
+    python: "3.9"
+
+
 python:
-  version: 3.6
   install:
     - method: pip
       path: .
     - requirements: doc/requirements.txt

+
 sphinx:
   configuration: doc/conf.py
   fail_on_warning: true # http://www.sphinx-doc.org/en/master/man/sphinx-build.html#id6
220 changes: 219 additions & 1 deletion CHANGELOG.md
@@ -1,5 +1,223 @@
# Changelog

## v2.89.0 (2022-05-11)

### Features

* Add PT 1.11 support
* add validation specification

### Bug Fixes and Other Changes

* repack model locally when local_code local mode

### Documentation Changes

* smdmp 1.8.1 release note

## v2.88.3 (2022-05-06)

### Bug Fixes and Other Changes

* deprecate: Remove deprecated argument s3_data_distribution_type
* Feat/jumpstart model table update

## v2.88.2 (2022-05-02)

### Bug Fixes and Other Changes

* Automl integ describe job check
* Implement subclass compatibility for workflow pipeline job steps

## v2.88.1 (2022-04-27)

### Bug Fixes and Other Changes

* Add encryption setting to tar_and_upload_dir method

## v2.88.0 (2022-04-26)

### Features

* jumpstart notebook utils -- list model ids, scripts, tasks, frameworks

### Bug Fixes and Other Changes

* local mode printing of credentials during docker login closes #2180
* disable endpoint context test

### Documentation Changes

* sm model parallel 1.8.0 release notes

## v2.87.0 (2022-04-20)

### Features

* Add Jumpstart example notebooks
* add Tensorflow and Pytorch version for SM Training Compiler and expand to regular regions

### Bug Fixes and Other Changes

* integs for training compiler in non-PDX regions
* TrainingStep cache misses due to timestamp based job name
* retry context delete
* Add more logging when unexpected number of artifacts found

## v2.86.2 (2022-04-14)

### Bug Fixes and Other Changes

* #using uuid to randomize, otherwise system timestamp is used

## v2.86.1 (2022-04-13)

### Bug Fixes and Other Changes

* xgboost, sklearn network isolation for jumpstart

### Documentation Changes

* fix minor typo

## v2.86.0 (2022-04-12)

### Features

* Adds Spark Processing Notebook to Notebook Tests

## v2.85.0 (2022-04-11)

### Features

* update lambda code on pipeline create/update/upsert for Lamb…
* jumpstart model url
* add serverless inference image_uri retrieve support

### Bug Fixes and Other Changes

* Add back the Fix for Pipeline variables related customer issues
* Support file URIs in ProcessingStep's code parameter

## v2.84.0 (2022-04-07)

### Features

* dependabot integ - move all deps to requirements.txt
* add xgboost framework version 1.5-1

## v2.83.0 (2022-04-04)

### Features

* Hugging Face Transformers 4.17 for TF 2.6

### Bug Fixes and Other Changes

* IOC image version select issue

## v2.82.2 (2022-04-01)

### Bug Fixes and Other Changes

* Revert "fix: Fix Pipeline variables related customer issues (#2959)"
* Refactor repack_model script injection, fixes tar.gz error

## v2.82.1 (2022-03-31)

### Bug Fixes and Other Changes

* Update Inferentia Image URI Config
* Fix Pipeline variables related customer issues
* more logging info for static pipeline test data setup

## v2.82.0 (2022-03-30)

### Features

* pluggable instance fallback mechanism, add CapacityError
* support passing Env Vars to local mode training

## v2.81.1 (2022-03-29)

### Bug Fixes and Other Changes

* Update black-check version, add support for Spark 3.1 Processing

## v2.81.0 (2022-03-26)

### Features

* Retrieve data configuration
* enable EnableInterContainerTrafficEncryption for model monitoring
* Hugging Face Transformers 4.17 for PT 1.10

### Bug Fixes and Other Changes

* remove `new` from serverless
* temporarily skip tests impacted by data inconsistency
* Implement override solution for pipeline variables

### Documentation Changes

* add documentation for image_uri serverless use case
* minor fixes for smddp 1.4.0 doc

## v2.80.0 (2022-03-18)

### Features

* Add support for TF2.7
* Add support for TF 2.8
* TF242 ioc support
* Add support for TF 2.6.3
* Support for remote docker host
* AutoGluon 0.3.2 and 0.4.0 image_uris

### Bug Fixes and Other Changes

* Align max_wait definitions in EstimaorBase and Estimator
* Add JumpStart model table build notification
* gpu integs CapacityError - fallback to available compute
* gpu integs CapacityError - fallback to available compute
* jumpstart docs network isolation

### Documentation Changes

* sagemaker distributed model parallel 1.7.0 doc

## v2.79.0 (2022-03-16)

### Features

* Inferentia Neuron support for HuggingFace
* custom base job name for jumpstart models/estimators
* Python 3.9 for readthedocs

### Bug Fixes and Other Changes

* container env generation for S3 URI and add test for the same

### Documentation Changes

* the SageMaker distributed data parallel v1.4.0 release
* update sagemaker training compiler docstring
* smddp doc update

## v2.78.0 (2022-03-07)

### Features

* TensorFlow 2.4 for Neo
* Data Serializer

### Bug Fixes and Other Changes

* Style update in DataSerializer
* Remove sagemaker_job_name from hyperparameters in TrainingStep
* reorganize test files for workflow
* update code to get commit_id in codepipeline

## v2.77.1 (2022-02-25)

### Bug Fixes and Other Changes
@@ -11,7 +229,7 @@
### Features

* override jumpstart content bucket
-* jumpstart model id suggestions
+* jumpstart model ID suggestions
* adding customer metadata support to registermodel step

### Bug Fixes and Other Changes
1 change: 1 addition & 0 deletions MANIFEST.in
@@ -1,6 +1,7 @@
recursive-include src/sagemaker *.py

include src/sagemaker/image_uri_config/*.json
recursive-include requirements *

include VERSION
include LICENSE.txt
2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
-2.77.2.dev0
+2.89.1.dev0
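
The commit list pairs each "prepare release vX.Y.Z" commit with an "update development version to vX.Y.(Z+1).dev0" commit, and the `VERSION` file above ends on such a development version. As a minimal sketch of that numbering rule only (the helper name `next_dev_version` is hypothetical and is not taken from the project's release tooling):

```python
import re


def next_dev_version(release: str) -> str:
    """Return the development version that follows a release version.

    Hypothetical helper: mirrors the pattern seen in the commit list
    (release 2.89.0 is followed by 2.89.1.dev0), nothing more.
    """
    # Accept an optional leading "v", as used in the commit titles.
    match = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"not a release version: {release!r}")
    major, minor, patch = (int(part) for part in match.groups())
    # Bump the patch component and mark the result as a dev build.
    return f"{major}.{minor}.{patch + 1}.dev0"


print(next_dev_version("2.89.0"))
```

Applied to the latest release in this changelog, `2.89.0`, the rule yields the `2.89.1.dev0` value that the `VERSION` diff introduces.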
2 changes: 0 additions & 2 deletions doc/_static/js/analytics.js

This file was deleted.

4 changes: 4 additions & 0 deletions doc/_static/js/datatable.js
@@ -0,0 +1,4 @@
$(document).ready( function () {
    $('table.datatable').DataTable();
    $('a.external').attr('target', '_blank');
} );
24 changes: 22 additions & 2 deletions doc/api/training/distributed.rst
@@ -4,8 +4,28 @@ SageMaker distributed training libraries offer both data parallel and model para
They combine software and hardware technologies to improve inter-GPU and inter-node communications.
They extend SageMaker’s training capabilities with built-in options that require only small code changes to your training scripts.

.. _sdp_api_docs_toc:

The SageMaker Distributed Data Parallel Library
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. toctree::
:maxdepth: 2

smd_data_parallel
sdp_versions/latest
smd_data_parallel_use_sm_pysdk
smd_data_parallel_release_notes/smd_data_parallel_change_log

.. _smp_api_docs_toc:

The SageMaker Distributed Model Parallel Library
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. toctree::
:maxdepth: 3
:maxdepth: 2

smd_data_parallel
smd_model_parallel
smp_versions/latest
smd_model_parallel_general
smd_model_parallel_release_notes/smd_model_parallel_change_log
8 changes: 8 additions & 0 deletions doc/api/training/sdp_versions/archives.rst
@@ -0,0 +1,8 @@
.. _smddp-version-archive:

.. toctree::
:maxdepth: 1

v1_2_x.rst
v1_1_x.rst
v1_0_0.rst
44 changes: 41 additions & 3 deletions doc/api/training/sdp_versions/latest.rst
@@ -1,9 +1,47 @@
.. _sdp_api_docs:

Version 1.2.x (Latest)
#############################################
Use the Library to Adapt Your Training Script
#############################################

This section contains the SageMaker distributed data parallel API documentation.
If you are a new user of this library, it is recommended you use this guide alongside
`SageMaker's Distributed Data Parallel Library
<https://docs.aws.amazon.com/sagemaker/latest/dg/data-parallel.html>`_.

The library provides framework-specific APIs for TensorFlow and PyTorch.

Select the latest or one of the previous versions of the API documentation
depending on the version of the library you use.

.. important::

The distributed data parallel library supports training jobs using CUDA 11 or later.
When you define a :class:`sagemaker.tensorflow.estimator.TensorFlow` or
:class:`sagemaker.pytorch.estimator.PyTorch`
estimator with the data parallel library enabled,
SageMaker uses CUDA 11. When you extend or customize your own training image,
you must use a base image with CUDA 11 or later. See
`SageMaker Python SDK's distributed data parallel library APIs
<https://docs.aws.amazon.com/sagemaker/latest/dg/data-parallel-use-api.html#data-parallel-use-python-skd-api>`_
for more information.

Version 1.4.0 (Latest)
======================

.. toctree::
:maxdepth: 1

latest/smd_data_parallel_pytorch.rst
latest/smd_data_parallel_tensorflow.rst
latest/smd_data_parallel_pytorch
latest/smd_data_parallel_tensorflow

Documentation Archive
=====================

To find the API documentation for the previous versions of the library,
choose one of the following:

.. toctree::
:maxdepth: 1

archives
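
The important note in `latest.rst` refers to defining a PyTorch or TensorFlow estimator "with the data parallel library enabled". As a hedged sketch of what that typically looks like, the snippet below only builds the keyword arguments. The helper name, entry point, role ARN, instance type, and framework and Python versions are illustrative assumptions, not values from this pull request.

```python
def data_parallel_estimator_kwargs(entry_point, role,
                                   instance_type="ml.p4d.24xlarge",
                                   instance_count=2):
    """Build keyword arguments for an estimator with data parallelism enabled.

    Hypothetical helper for illustration; every default here is an assumption.
    """
    return {
        "entry_point": entry_point,
        "role": role,
        "instance_type": instance_type,    # assumed multi-GPU instance type
        "instance_count": instance_count,
        "framework_version": "1.10.2",     # assumed PyTorch version
        "py_version": "py38",              # assumed Python version
        # Turns on the SageMaker distributed data parallel library.
        "distribution": {"smdistributed": {"dataparallel": {"enabled": True}}},
    }


kwargs = data_parallel_estimator_kwargs(
    "train.py",                                      # placeholder script name
    "arn:aws:iam::111122223333:role/ExampleRole",    # placeholder role ARN
)
print(kwargs["distribution"])
```

With the SageMaker Python SDK installed, these arguments would be passed as `sagemaker.pytorch.PyTorch(**kwargs)`. The snippet stops short of that call so it runs without AWS credentials.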