
Commit 8f547bd

Merge branch 'master' into shreyapandit-upgrade-PyYAML
2 parents: ae6d2ab + 0e9c10e

113 files changed: +7,496 −468 lines


CHANGELOG.md

+56
@@ -1,5 +1,61 @@
 # Changelog
 
+## v2.104.0 (2022-08-17)
+
+### Features
+
+ * local mode executor implementation
+ * Pipelines local mode setup
+ * Add PT 1.12 support
+ * added _AnalysisConfigGenerator for clarify
+
+### Bug Fixes and Other Changes
+
+ * yaml safe_load sagemaker config
+ * pipelines local mode minor bug fixes
+ * add local mode integ tests
+ * implement local JsonGet function
+ * Add Pipeline annotation in model base class and tensorflow estimator
+ * Allow users to customize trial component display names for pipeline launched jobs
+ * Update localmode code to decode urllib response as UTF8
+
+### Documentation Changes
+
+ * New content for Pipelines local mode
+ * Correct documentation error
+
+## v2.103.0 (2022-08-05)
+
+### Features
+
+ * AutoGluon 0.4.3 and 0.5.2 image_uris
+
+### Bug Fixes and Other Changes
+
+ * Revert "change: add a check to prevent launching a modelparallel job on CPU only instances"
+ * Add gpu capability to local
+ * Link PyTorch 1.11 to 1.11.0
+
+## v2.102.0 (2022-08-04)
+
+### Features
+
+ * add warnings for xgboost specific rules in debugger rules
+ * Add PyTorch DDP distribution support
+ * Add test for profiler enablement with debugger_hook false
+
+### Bug Fixes and Other Changes
+
+ * Two letter language code must be supported
+ * add a check to prevent launching a modelparallel job on CPU only instances
+ * Allow StepCollection added in ConditionStep to be depended on
+ * Add PipelineVariable annotation in framework models
+ * skip managed spot training mxnet nb
+
+### Documentation Changes
+
+ * smdistributed libraries currency updates
+
 ## v2.101.1 (2022-07-28)
 
 ### Bug Fixes and Other Changes
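
The "yaml safe_load sagemaker config" fix above is the heart of this PyYAML-upgrade branch. A minimal sketch of the pattern, assuming a config loader along these lines (the function name and call site are illustrative, not the SDK's actual internals):

    import yaml

    def load_sagemaker_config(path):
        """Parse a YAML config without constructing arbitrary Python objects."""
        # yaml.safe_load builds only plain types (dict, list, str, int,
        # float, bool, None); bare yaml.load with an unsafe loader can
        # invoke object constructors embedded in the file.
        with open(path) as f:
            return yaml.safe_load(f)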

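The v2.104.0 Pipelines local mode entries point at running whole pipelines in local containers. A sketch, assuming this release's LocalPipelineSession and that `steps` and `role` are already defined:

    from sagemaker.workflow.pipeline import Pipeline
    from sagemaker.workflow.pipeline_context import LocalPipelineSession

    # Assumptions: `steps` is a list of pipeline steps defined elsewhere,
    # and `role` is an IAM role ARN with SageMaker permissions.
    local_session = LocalPipelineSession()

    pipeline = Pipeline(
        name="MyLocalPipeline",
        steps=steps,
        sagemaker_session=local_session,  # route execution to local Docker
    )
    pipeline.create(role_arn=role)
    execution = pipeline.start()  # steps run as containers on this machine
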
VERSION

+1 −1

@@ -1 +1 @@
-2.101.2.dev0
+2.104.1.dev0

doc/Makefile

+1 −1

@@ -3,7 +3,7 @@
 
 # You can set these variables from the command line.
 SPHINXOPTS = -W
-SPHINXBUILD = python -msphinx
+SPHINXBUILD = python3 -msphinx
 SPHINXPROJ = sagemaker
 SOURCEDIR = .
 BUILDDIR = _build

doc/algorithms/index.rst

+7 −12

@@ -1,20 +1,15 @@
 ######################
-First-Party Algorithms
+Built-in Algorithms
 ######################
 
 Amazon SageMaker provides implementations of some common machine learning algorithms optimized for GPU architecture and massive datasets.
 
 .. toctree::
    :maxdepth: 2
 
-   sagemaker.amazon.amazon_estimator
-   factorization_machines
-   ipinsights
-   kmeans
-   knn
-   lda
-   linear_learner
-   ntm
-   object2vec
-   pca
-   randomcutforest
+   tabular/index
+   text/index
+   time_series/index
+   unsupervised/index
+   vision/index
+   other/index

doc/algorithms/other/index.rst

+10

@@ -0,0 +1,10 @@
+######################
+Other
+######################
+
+:ref:`All Pre-trained Models <all-pretrained-models>`
+
+.. toctree::
+   :maxdepth: 2
+
+   sagemaker.amazon.amazon_estimator

doc/algorithms/tabular/autogluon.rst

+28

@@ -0,0 +1,28 @@
+############
+AutoGluon
+############
+
+`AutoGluon-Tabular <https://auto.gluon.ai/stable/index.html>`__ is a popular open-source AutoML framework that trains highly accurate machine learning models on an unprocessed tabular dataset.
+Unlike existing AutoML frameworks that primarily focus on model and hyperparameter selection, AutoGluon-Tabular succeeds by ensembling multiple models and stacking them in multiple layers.
+
+
+The following table outlines a variety of sample notebooks that address different use cases of the Amazon SageMaker AutoGluon-Tabular algorithm.
+
+.. list-table::
+   :widths: 25 25
+   :header-rows: 1
+
+   * - Notebook Title
+     - Description
+   * - `Tabular classification with Amazon SageMaker AutoGluon-Tabular algorithm <https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/autogluon_tabular/Amazon_Tabular_Classification_AutoGluon.ipynb>`__
+     - This notebook demonstrates the use of the Amazon SageMaker AutoGluon-Tabular algorithm to train and host a tabular classification model.
+   * - `Tabular regression with Amazon SageMaker AutoGluon-Tabular algorithm <https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/autogluon_tabular/Amazon_Tabular_Regression_AutoGluon.ipynb>`__
+     - This notebook demonstrates the use of the Amazon SageMaker AutoGluon-Tabular algorithm to train and host a tabular regression model.
+
+
+For instructions on how to create and access Jupyter notebook instances that you can use to run the example in SageMaker, see
+`Use Amazon SageMaker Notebook Instances <https://docs.aws.amazon.com/sagemaker/latest/dg/nbi.html>`__. After you have created a notebook
+instance and opened it, choose the SageMaker Examples tab to see a list of all of the SageMaker samples. To open a notebook, choose its
+Use tab and choose Create copy.
+
+For detailed documentation, see the `SageMaker AutoGluon-Tabular Algorithm <https://docs.aws.amazon.com/sagemaker/latest/dg/autogluon-tabular.html>`__.
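
Since v2.103.0 above added AutoGluon 0.4.3 and 0.5.2 image URIs, resolving a training container could look like this sketch (the region, version, py_version, and instance type are illustrative; supported combinations vary by region and SDK version):

    from sagemaker import image_uris

    # Illustrative values; check image_uris.retrieve for the
    # combinations your region and SDK version actually support.
    uri = image_uris.retrieve(
        framework="autogluon",
        region="us-west-2",
        version="0.4.3",
        py_version="py38",
        image_scope="training",
        instance_type="ml.m5.xlarge",
    )
    print(uri)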

doc/algorithms/tabular/catboost.rst

+37

@@ -0,0 +1,37 @@
+############
+CatBoost
+############
+
+
+`CatBoost <https://catboost.ai/>`__ is a popular and high-performance open-source implementation of the Gradient Boosting Decision Tree (GBDT)
+algorithm. GBDT is a supervised learning algorithm that attempts to accurately predict a target variable by combining an ensemble of
+estimates from a set of simpler and weaker models.
+
+CatBoost introduces two critical algorithmic advances to GBDT:
+
+* The implementation of ordered boosting, a permutation-driven alternative to the classic algorithm
+
+* An innovative algorithm for processing categorical features
+
+Both techniques were created to fight a prediction shift caused by a special kind of target leakage present in all currently existing
+implementations of gradient boosting algorithms.
+
+The following table outlines a variety of sample notebooks that address different use cases of the Amazon SageMaker CatBoost algorithm.
+
+.. list-table::
+   :widths: 25 25
+   :header-rows: 1
+
+   * - Notebook Title
+     - Description
+   * - `Tabular classification with Amazon SageMaker LightGBM and CatBoost algorithm <https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/lightgbm_catboost_tabular/Amazon_Tabular_Classification_LightGBM_CatBoost.ipynb>`__
+     - This notebook demonstrates the use of the Amazon SageMaker CatBoost algorithm to train and host a tabular classification model.
+   * - `Tabular regression with Amazon SageMaker LightGBM and CatBoost algorithm <https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/lightgbm_catboost_tabular/Amazon_Tabular_Regression_LightGBM_CatBoost.ipynb>`__
+     - This notebook demonstrates the use of the Amazon SageMaker CatBoost algorithm to train and host a tabular regression model.
+
+For instructions on how to create and access Jupyter notebook instances that you can use to run the example in SageMaker, see
+`Use Amazon SageMaker Notebook Instances <https://docs.aws.amazon.com/sagemaker/latest/dg/nbi.html>`__. After you have created a notebook
+instance and opened it, choose the SageMaker Examples tab to see a list of all of the SageMaker samples. To open a notebook, choose its
+Use tab and choose Create copy.
+
+For detailed documentation, see the `SageMaker CatBoost Algorithm <https://docs.aws.amazon.com/sagemaker/latest/dg/catboost.html>`__.
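
CatBoost on SageMaker is typically consumed through the JumpStart artifact-retrieval helpers. A sketch under the assumption that the catalog model_id is "catboost-classification-model" (consult the JumpStart catalog for the exact identifier):

    from sagemaker import image_uris, model_uris, script_uris

    # Assumed JumpStart-style identifier and region.
    model_id, model_version = "catboost-classification-model", "*"

    train_image = image_uris.retrieve(
        framework=None,
        region="us-west-2",
        model_id=model_id,
        model_version=model_version,
        image_scope="training",
        instance_type="ml.m5.xlarge",
    )
    train_script = script_uris.retrieve(
        model_id=model_id, model_version=model_version, script_scope="training"
    )
    train_model = model_uris.retrieve(
        model_id=model_id, model_version=model_version, model_scope="training"
    )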

doc/algorithms/factorization_machines.rst renamed to doc/algorithms/tabular/factorization_machines.rst

+1 −1

@@ -1,4 +1,4 @@
-FactorizationMachines
+Factorization Machines
 -------------------------
 
 The Amazon SageMaker Factorization Machines algorithm.
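
For context on the renamed page, the first-party estimator lives in the SDK itself. A minimal sketch, assuming `role` is an IAM role ARN (hyperparameter values are illustrative, not recommendations):

    from sagemaker.amazon.factorization_machines import FactorizationMachines

    # Assumed: `role` is an IAM role ARN with SageMaker permissions.
    fm = FactorizationMachines(
        role=role,
        instance_count=1,
        instance_type="ml.c5.xlarge",
        num_factors=64,
        predictor_type="binary_classifier",
    )
    # fm.fit(fm.record_set(train_features, train_labels))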

doc/algorithms/tabular/index.rst

+18

@@ -0,0 +1,18 @@
+######################
+Tabular
+######################
+
+Amazon SageMaker provides built-in algorithms that are tailored to the analysis of tabular data. The built-in SageMaker algorithms for tabular data can be used for either classification or regression problems.
+
+.. toctree::
+   :maxdepth: 2
+
+   autogluon
+   catboost
+   factorization_machines
+   knn
+   lightgbm
+   linear_learner
+   tabtransformer
+   xgboost
+   object2vec
File renamed without changes.

doc/algorithms/tabular/lightgbm.rst

+28

@@ -0,0 +1,28 @@
+############
+LightGBM
+############
+
+`LightGBM <https://lightgbm.readthedocs.io/en/latest/>`__ is a popular and efficient open-source implementation of the Gradient Boosting
+Decision Tree (GBDT) algorithm. GBDT is a supervised learning algorithm that attempts to accurately predict a target variable by
+combining an ensemble of estimates from a set of simpler and weaker models. LightGBM uses additional techniques to significantly improve
+the efficiency and scalability of conventional GBDT.
+
+The following table outlines a variety of sample notebooks that address different use cases of the Amazon SageMaker LightGBM algorithm.
+
+.. list-table::
+   :widths: 25 25
+   :header-rows: 1
+
+   * - Notebook Title
+     - Description
+   * - `Tabular classification with Amazon SageMaker LightGBM and CatBoost algorithm <https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/lightgbm_catboost_tabular/Amazon_Tabular_Classification_LightGBM_CatBoost.ipynb>`__
+     - This notebook demonstrates the use of the Amazon SageMaker LightGBM algorithm to train and host a tabular classification model.
+   * - `Tabular regression with Amazon SageMaker LightGBM and CatBoost algorithm <https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/lightgbm_catboost_tabular/Amazon_Tabular_Regression_LightGBM_CatBoost.ipynb>`__
+     - This notebook demonstrates the use of the Amazon SageMaker LightGBM algorithm to train and host a tabular regression model.
+
+For instructions on how to create and access Jupyter notebook instances that you can use to run the example in SageMaker, see
+`Use Amazon SageMaker Notebook Instances <https://docs.aws.amazon.com/sagemaker/latest/dg/nbi.html>`__. After you have created a notebook
+instance and opened it, choose the SageMaker Examples tab to see a list of all of the SageMaker samples. To open a notebook, choose its
+Use tab and choose Create copy.
+
+For detailed documentation, see the `SageMaker LightGBM Algorithm <https://docs.aws.amazon.com/sagemaker/latest/dg/lightgbm.html>`__.
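
Launching a training job from the retrieved artifacts follows the same pattern as the CatBoost sketch above, here assuming a hypothetical "lightgbm-classification-model" model_id produced the `train_image`, `train_script`, and `train_model` values (the entry-point name and S3 path are also assumptions):

    from sagemaker.estimator import Estimator

    # Assumed: train_image, train_script, train_model, and role are
    # defined as in the CatBoost retrieval sketch above.
    estimator = Estimator(
        image_uri=train_image,
        source_dir=train_script,
        model_uri=train_model,
        entry_point="transfer_learning.py",  # assumed script name
        role=role,
        instance_count=1,
        instance_type="ml.m5.xlarge",
    )
    estimator.fit({"training": "s3://my-bucket/lightgbm/train/"})  # assumed path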
File renamed without changes.
doc/algorithms/tabular/tabtransformer.rst

+28

@@ -0,0 +1,28 @@
+###############
+TabTransformer
+###############
+
+`TabTransformer <https://arxiv.org/abs/2012.06678>`__ is a novel deep tabular data modeling architecture for supervised learning. The TabTransformer architecture is built on self-attention-based Transformers.
+The Transformer layers transform the embeddings of categorical features into robust contextual embeddings to achieve higher prediction accuracy. Furthermore, the contextual embeddings learned from TabTransformer
+are highly robust against both missing and noisy data features, and provide better interpretability.
+
+
+The following table outlines a variety of sample notebooks that address different use cases of the Amazon SageMaker TabTransformer algorithm.
+
+.. list-table::
+   :widths: 25 25
+   :header-rows: 1
+
+   * - Notebook Title
+     - Description
+   * - `Tabular classification with Amazon SageMaker TabTransformer algorithm <https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/tabtransformer_tabular/Amazon_Tabular_Classification_TabTransformer.ipynb>`__
+     - This notebook demonstrates the use of the Amazon SageMaker TabTransformer algorithm to train and host a tabular classification model.
+   * - `Tabular regression with Amazon SageMaker TabTransformer algorithm <https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/tabtransformer_tabular/Amazon_Tabular_Regression_TabTransformer.ipynb>`__
+     - This notebook demonstrates the use of the Amazon SageMaker TabTransformer algorithm to train and host a tabular regression model.
+
+For instructions on how to create and access Jupyter notebook instances that you can use to run the example in SageMaker, see
+`Use Amazon SageMaker Notebook Instances <https://docs.aws.amazon.com/sagemaker/latest/dg/nbi.html>`__. After you have created a notebook
+instance and opened it, choose the SageMaker Examples tab to see a list of all of the SageMaker samples. To open a notebook, choose its
+Use tab and choose Create copy.
+
+For detailed documentation, see the `SageMaker TabTransformer Algorithm <https://docs.aws.amazon.com/sagemaker/latest/dg/tabtransformer.html>`__.
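
JumpStart-backed algorithms like TabTransformer also expose default hyperparameters through the SDK. A sketch with a hypothetical model_id (the real identifier and hyperparameter keys come from the JumpStart catalog):

    from sagemaker import hyperparameters

    # Hypothetical JumpStart-style model_id; check the catalog for
    # the real identifier string.
    hps = hyperparameters.retrieve_default(
        model_id="pytorch-tabtransformerclassification-model",
        model_version="*",
    )
    hps["n_epochs"] = "5"  # illustrative override; valid keys vary by model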

doc/algorithms/tabular/xgboost.rst

+40

@@ -0,0 +1,40 @@
+############
+XGBoost
+############
+
+`XGBoost <https://github.com/dmlc/xgboost>`__ (eXtreme Gradient Boosting) is a popular and efficient open-source implementation of the gradient boosted trees algorithm. Gradient boosting is a supervised learning algorithm that attempts to accurately predict a target variable
+by combining an ensemble of estimates from a set of simpler and weaker models. The XGBoost algorithm performs well in machine learning competitions because of its robust handling of a variety of data types, relationships, distributions, and the variety of hyperparameters that you can
+fine-tune. You can use XGBoost for regression, classification (binary and multiclass), and ranking problems.
+
+You can use the new release of the XGBoost algorithm either as an Amazon SageMaker built-in algorithm or as a framework to run training scripts in your local environments. This implementation has a smaller memory footprint, better logging, improved hyperparameter validation, and
+a broader set of metrics than the original versions. It provides an XGBoost estimator that executes a training script in a managed XGBoost environment. The current release of SageMaker XGBoost is based on the original XGBoost versions 1.0, 1.2, 1.3, and 1.5.
+
+The following table outlines a variety of sample notebooks that address different use cases of the Amazon SageMaker XGBoost algorithm.
+
+.. list-table::
+   :widths: 25 25
+   :header-rows: 1
+
+   * - Notebook Title
+     - Description
+   * - `How to Create a Custom XGBoost container? <https://sagemaker-examples.readthedocs.io/en/latest/aws_sagemaker_studio/sagemaker_studio_image_build/xgboost_bring_your_own/Batch_Transform_BYO_XGB.html>`__
+     - This notebook shows you how to build a custom XGBoost container with Amazon SageMaker Batch Transform.
+   * - `Regression with XGBoost using Parquet <https://sagemaker-examples.readthedocs.io/en/latest/introduction_to_amazon_algorithms/xgboost_abalone/xgboost_parquet_input_training.html>`__
+     - This notebook shows you how to use the Abalone dataset in Parquet to train an XGBoost model.
+   * - `How to Train and Host a Multiclass Classification Model? <https://sagemaker-examples.readthedocs.io/en/latest/introduction_to_amazon_algorithms/xgboost_mnist/xgboost_mnist.html>`__
+     - This notebook shows how to use the MNIST dataset to train and host a multiclass classification model.
+   * - `How to train a Model for Customer Churn Prediction? <https://sagemaker-examples.readthedocs.io/en/latest/introduction_to_applying_machine_learning/xgboost_customer_churn/xgboost_customer_churn.html>`__
+     - This notebook shows you how to train a model to predict mobile customer departure in an effort to identify unhappy customers.
+   * - `An Introduction to Amazon SageMaker Managed Spot infrastructure for XGBoost Training <https://sagemaker-examples.readthedocs.io/en/latest/introduction_to_amazon_algorithms/xgboost_abalone/xgboost_managed_spot_training.html>`__
+     - This notebook shows you how to use Spot Instances for training with an XGBoost container.
+   * - `How to use Amazon SageMaker Debugger to debug XGBoost Training Jobs? <https://sagemaker-examples.readthedocs.io/en/latest/sagemaker-debugger/xgboost_builtin_rules/xgboost-regression-debugger-rules.html>`__
+     - This notebook shows you how to use Amazon SageMaker Debugger to monitor training jobs and detect inconsistencies.
+   * - `How to use Amazon SageMaker Debugger to debug XGBoost Training Jobs in Real-Time? <https://sagemaker-examples.readthedocs.io/en/latest/sagemaker-debugger/xgboost_realtime_analysis/xgboost-realtime-analysis.html>`__
+     - This notebook shows you how to use the MNIST dataset and Amazon SageMaker Debugger to perform real-time analysis of XGBoost training jobs while they are running.
+
+For instructions on how to create and access Jupyter notebook instances that you can use to run the example in SageMaker, see
+`Use Amazon SageMaker Notebook Instances <https://docs.aws.amazon.com/sagemaker/latest/dg/nbi.html>`__. After you have created a notebook
+instance and opened it, choose the SageMaker Examples tab to see a list of all of the SageMaker samples. To open a notebook, choose its
+Use tab and choose Create copy.
+
+For detailed documentation, see the `SageMaker XGBoost Algorithm <https://docs.aws.amazon.com/sagemaker/latest/dg/xgboost.html>`__.
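
The framework mode described above runs your own training script in the managed XGBoost container. A minimal sketch, assuming `role` is an IAM role ARN and train.py is a local script that reads the SageMaker channel directories (the S3 path is illustrative):

    from sagemaker.xgboost.estimator import XGBoost

    # Assumed: `role` and a local train.py that parses SageMaker channels.
    estimator = XGBoost(
        entry_point="train.py",
        framework_version="1.5-1",  # one of the supported 1.x releases
        role=role,
        instance_count=1,
        instance_type="ml.m5.xlarge",
    )
    estimator.fit({"train": "s3://my-bucket/xgboost/train/"})  # assumed path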

doc/algorithms/text/blazing_text.rst

+27

@@ -0,0 +1,27 @@
+#############
+Blazing Text
+#############
+
+
+The Amazon SageMaker BlazingText algorithm provides highly optimized implementations of the Word2vec and text classification algorithms. The Word2vec algorithm is useful for many downstream natural language processing (NLP)
+tasks, such as sentiment analysis, named entity recognition, and machine translation. Text classification is an important task for applications that perform web searches, information retrieval, ranking, and document classification.
+
+The Word2vec algorithm maps words to high-quality distributed vectors. The resulting vector representation of a word is called a word embedding. Words that are semantically similar correspond to vectors that are close together.
+That way, word embeddings capture the semantic relationships between words.
+
+Many natural language processing (NLP) applications learn word embeddings by training on large collections of documents. These pretrained vector representations provide information about semantics and word distributions that
+typically improves the generalizability of other models that are later trained on a more limited amount of data. Most implementations of the Word2vec algorithm are not optimized for multi-core CPU architectures. This makes it
+difficult to scale to large datasets.
+
+With the BlazingText algorithm, you can scale to large datasets easily. Similar to Word2vec, it provides the Skip-gram and continuous bag-of-words (CBOW) training architectures. BlazingText's implementation of the supervised
+multi-class, multi-label text classification algorithm extends the fastText text classifier to use GPU acceleration with custom `CUDA <https://docs.nvidia.com/cuda/index.html>`__
+kernels. You can train a model on more than a billion words in a couple of minutes using a multi-core CPU or a GPU, and you achieve performance on par with the state-of-the-art deep learning text classification algorithms.
+
+The BlazingText algorithm is not parallelizable. For more information on parameters related to training, see `Docker Registry Paths for SageMaker Built-in Algorithms <https://docs.aws.amazon.com/en_us/sagemaker/latest/dg/sagemaker-algo-docker-registry-paths.html>`__.
+
+For a sample notebook that uses the SageMaker BlazingText algorithm to train and deploy supervised binary and multiclass classification models, see
+`Blazing Text classification on the DBPedia dataset <https://sagemaker-examples.readthedocs.io/en/latest/introduction_to_amazon_algorithms/blazingtext_text_classification_dbpedia/blazingtext_text_classification_dbpedia.html>`__.
+For instructions for creating and accessing Jupyter notebook instances that you can use to run the example in SageMaker, see `Use Amazon SageMaker Notebook Instances <https://docs.aws.amazon.com/sagemaker/latest/dg/nbi.html>`__.
+After creating and opening a notebook instance, choose the SageMaker Examples tab to see a list of all the SageMaker examples. The example notebooks that use BlazingText are located in the Introduction to Amazon
+algorithms section. To open a notebook, choose its Use tab, then choose Create copy.
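
As a built-in algorithm, BlazingText is driven through the generic Estimator plus a retrieved container image. A sketch with assumed region, role, and S3 paths ("supervised" selects the text classification mode rather than Word2vec):

    from sagemaker import image_uris
    from sagemaker.estimator import Estimator

    # Assumed region, role, and S3 locations; version "1" is the
    # algorithm image tag this sketch presumes.
    container = image_uris.retrieve("blazingtext", region="us-west-2", version="1")
    bt = Estimator(
        image_uri=container,
        role=role,
        instance_count=1,
        instance_type="ml.c5.2xlarge",
    )
    bt.set_hyperparameters(mode="supervised", epochs=10)
    bt.fit({"train": "s3://my-bucket/dbpedia/train"})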
