Commit ea36e5e

Merge branch 'master' into atqy-unified-search
2 parents ac50737 + 0e9c10e commit ea36e5e

108 files changed

+7321
-373
lines changed

CHANGELOG.md

+24
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,29 @@
11
# Changelog
22

3+
## v2.104.0 (2022-08-17)
4+
5+
### Features
6+
7+
* local mode executor implementation
8+
* Pipelines local mode setup
9+
* Add PT 1.12 support
10+
* added _AnalysisConfigGenerator for clarify
11+
12+
### Bug Fixes and Other Changes
13+
14+
* yaml safe_load sagemaker config
15+
* pipelines local mode minor bug fixes
16+
* add local mode integ tests
17+
* implement local JsonGet function
18+
* Add Pipeline annotation in model base class and tensorflow estimator
19+
* Allow users to customize trial component display names for pipeline launched jobs
20+
* Update localmode code to decode urllib response as UTF8
21+
22+
### Documentation Changes
23+
24+
* New content for Pipelines local mode
25+
* Correct documentation error
26+
327
## v2.103.0 (2022-08-05)
428

529
### Features

VERSION

+1-1
@@ -1 +1 @@
1-
2.103.1.dev0
1+
2.104.1.dev0

doc/Makefile

+1-1
@@ -3,7 +3,7 @@
33

44
# You can set these variables from the command line.
55
SPHINXOPTS = -W
6-
SPHINXBUILD = python -msphinx
6+
SPHINXBUILD = python3 -msphinx
77
SPHINXPROJ = sagemaker
88
SOURCEDIR = .
99
BUILDDIR = _build

doc/algorithms/index.rst

+7-12
@@ -1,20 +1,15 @@
11
######################
2-
First-Party Algorithms
2+
Built-in Algorithms
33
######################
44

55
Amazon SageMaker provides implementations of some common machine learning algorithms optimized for GPU architecture and massive datasets.
66

77
.. toctree::
88
:maxdepth: 2
99

10-
sagemaker.amazon.amazon_estimator
11-
factorization_machines
12-
ipinsights
13-
kmeans
14-
knn
15-
lda
16-
linear_learner
17-
ntm
18-
object2vec
19-
pca
20-
randomcutforest
10+
tabular/index
11+
text/index
12+
time_series/index
13+
unsupervised/index
14+
vision/index
15+
other/index

doc/algorithms/other/index.rst

+10
@@ -0,0 +1,10 @@
1+
######################
2+
Other
3+
######################
4+
5+
:ref:`All Pre-trained Models <all-pretrained-models>`
6+
7+
.. toctree::
8+
:maxdepth: 2
9+
10+
sagemaker.amazon.amazon_estimator

doc/algorithms/tabular/autogluon.rst

+28
@@ -0,0 +1,28 @@
1+
############
2+
AutoGluon
3+
############
4+
5+
`AutoGluon-Tabular <https://auto.gluon.ai/stable/index.html>`__ is a popular open-source AutoML framework that trains highly accurate machine learning models on an unprocessed tabular dataset.
6+
Unlike existing AutoML frameworks that primarily focus on model and hyperparameter selection, AutoGluon-Tabular succeeds by ensembling multiple models and stacking them in multiple layers.
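The ensembling-and-stacking idea can be illustrated with a minimal sketch. This is not AutoGluon's implementation: the two base models, the helper names (`fit_mean`, `fit_line`, `fit_stack`), and the toy data are all hypothetical, and only a single stacking layer is shown. The meta-model here is just a convex blend weight chosen on held-out data.

```python
from statistics import mean

def fit_mean(xs, ys):
    """Base model 1 (hypothetical): always predict the training mean."""
    m = mean(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Base model 2 (hypothetical): one-variable least-squares line."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)

def mse(model, xs, ys):
    """Mean squared error of a model on (xs, ys)."""
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))

def fit_stack(f1, f2, xs_hold, ys_hold, steps=10):
    """Stacking layer: choose the blend weight w in [0, 1] that minimizes
    held-out error of w * f1(x) + (1 - w) * f2(x)."""
    best = None
    for i in range(steps + 1):
        w = i / steps
        blend = lambda x, w=w: w * f1(x) + (1 - w) * f2(x)
        err = mse(blend, xs_hold, ys_hold)
        if best is None or err < best[0]:
            best = (err, w, blend)
    return best

# Toy data (made up for illustration): train the base models on one split,
# then fit the stacking weight on a separate held-out split.
xs_train, ys_train = [0, 1, 2, 3], [0.1, 0.9, 2.2, 2.8]
xs_hold, ys_hold = [4, 5], [4.1, 4.9]

mean_model = fit_mean(xs_train, ys_train)
line_model = fit_line(xs_train, ys_train)
err, weight, stacked = fit_stack(mean_model, line_model, xs_hold, ys_hold)
```

Because the candidate weights include 0 and 1, the stacked model's held-out error is never worse than that of the better base model on the same split.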
7+
8+
9+
The following table outlines a variety of sample notebooks that address different use cases of the Amazon SageMaker AutoGluon-Tabular algorithm.
10+
11+
.. list-table::
12+
:widths: 25 25
13+
:header-rows: 1
14+
15+
* - Notebook Title
16+
- Description
17+
* - `Tabular classification with Amazon SageMaker AutoGluon-Tabular algorithm <https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/autogluon_tabular/Amazon_Tabular_Classification_AutoGluon.ipynb>`__
18+
- This notebook demonstrates the use of the Amazon SageMaker AutoGluon-Tabular algorithm to train and host a tabular classification model.
19+
* - `Tabular regression with Amazon SageMaker AutoGluon-Tabular algorithm <https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/autogluon_tabular/Amazon_Tabular_Regression_AutoGluon.ipynb>`__
20+
- This notebook demonstrates the use of the Amazon SageMaker AutoGluon-Tabular algorithm to train and host a tabular regression model.
21+
22+
23+
For instructions on how to create and access Jupyter notebook instances that you can use to run the example in SageMaker, see
24+
`Use Amazon SageMaker Notebook Instances <https://docs.aws.amazon.com/sagemaker/latest/dg/nbi.html>`__. After you have created a notebook
25+
instance and opened it, choose the SageMaker Examples tab to see a list of all of the SageMaker samples. To open a notebook, choose its
26+
Use tab and choose Create copy.
27+
28+
For detailed documentation, please refer to the `SageMaker AutoGluon-Tabular Algorithm <https://docs.aws.amazon.com/sagemaker/latest/dg/autogluon-tabular.html>`__.

doc/algorithms/tabular/catboost.rst

+37
@@ -0,0 +1,37 @@
1+
############
2+
CatBoost
3+
############
4+
5+
6+
`CatBoost <https://catboost.ai/>`__ is a popular and high-performance open-source implementation of the Gradient Boosting Decision Tree (GBDT)
7+
algorithm. GBDT is a supervised learning algorithm that attempts to accurately predict a target variable by combining an ensemble of
8+
estimates from a set of simpler and weaker models.
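As a rough illustration of how an ensemble of weak models is combined, here is a minimal gradient boosting sketch for squared error on a single numeric feature. It is not how CatBoost is implemented: the weak learner is a one-split decision stump, and the function names, learning rate, and toy data are assumptions made for this example.

```python
def fit_stump(xs, residuals):
    """Weak learner: the single threshold split that minimizes squared error."""
    best = None
    # Exclude the largest value so the right-hand side is never empty.
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def fit_gbdt(xs, ys, n_rounds=20, learning_rate=0.5):
    """Each round fits a stump to the current residuals and adds it to the ensemble."""
    base = sum(ys) / len(ys)  # start from the mean prediction
    stumps = []

    def predict(x):
        return base + learning_rate * sum(s(x) for s in stumps)

    for _ in range(n_rounds):
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return predict

# Toy data (made up): a noisy three-level step function.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 3.1, 3.0, 2.9, 5.2, 5.0]
model = fit_gbdt(xs, ys)
```

For squared error the residuals are the negative gradient of the loss, which is why fitting each new stump to the residuals counts as a gradient step.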
9+
10+
CatBoost introduces two critical algorithmic advances to GBDT:
11+
12+
* The implementation of ordered boosting, a permutation-driven alternative to the classic algorithm
13+
14+
* An innovative algorithm for processing categorical features
15+
16+
Both techniques were created to fight a prediction shift caused by a special kind of target leakage present in all currently existing
17+
implementations of gradient boosting algorithms.
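One way to see how target leakage can be avoided when encoding categorical features is an "ordered" target statistic: each row is encoded using only the targets of rows that come before it in some ordering. The sketch below follows that idea as described in the CatBoost paper; it is not the library's code, and the function name and the `prior` and `weight` parameters are hypothetical.

```python
from collections import defaultdict

def ordered_target_stats(categories, targets, prior, weight=1.0):
    """Encode each row's category from the targets of earlier rows only,
    so a row's own target never leaks into its own feature value."""
    sums = defaultdict(float)   # running sum of targets per category
    counts = defaultdict(int)   # running count of rows per category
    encoded = []
    for cat, y in zip(categories, targets):
        # Smoothed mean of the preceding targets for this category.
        encoded.append((sums[cat] + weight * prior) / (counts[cat] + weight))
        sums[cat] += y
        counts[cat] += 1
    return encoded

# Toy data (made up); rows are taken in the given order, whereas in practice
# a random permutation of the rows would be used.
cats = ["a", "b", "a", "a", "b"]
ys = [1, 0, 0, 1, 1]
encoded = ordered_target_stats(cats, ys, prior=0.5)
```

A plain per-category target mean would instead use every row's target, including the row being encoded, which is the kind of leakage the paragraph above refers to.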
18+
19+
The following table outlines a variety of sample notebooks that address different use cases of the Amazon SageMaker CatBoost algorithm.
20+
21+
.. list-table::
22+
:widths: 25 25
23+
:header-rows: 1
24+
25+
* - Notebook Title
26+
- Description
27+
* - `Tabular classification with Amazon SageMaker LightGBM and CatBoost algorithm <https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/lightgbm_catboost_tabular/Amazon_Tabular_Classification_LightGBM_CatBoost.ipynb>`__
28+
- This notebook demonstrates the use of the Amazon SageMaker CatBoost algorithm to train and host a tabular classification model.
29+
* - `Tabular regression with Amazon SageMaker LightGBM and CatBoost algorithm <https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/lightgbm_catboost_tabular/Amazon_Tabular_Regression_LightGBM_CatBoost.ipynb>`__
30+
- This notebook demonstrates the use of the Amazon SageMaker CatBoost algorithm to train and host a tabular regression model.
31+
32+
For instructions on how to create and access Jupyter notebook instances that you can use to run the example in SageMaker, see
33+
`Use Amazon SageMaker Notebook Instances <https://docs.aws.amazon.com/sagemaker/latest/dg/nbi.html>`__. After you have created a notebook
34+
instance and opened it, choose the SageMaker Examples tab to see a list of all of the SageMaker samples. To open a notebook, choose its
35+
Use tab and choose Create copy.
36+
37+
For detailed documentation, please refer to the `SageMaker CatBoost Algorithm <https://docs.aws.amazon.com/sagemaker/latest/dg/catboost.html>`__.

doc/algorithms/factorization_machines.rst renamed to doc/algorithms/tabular/factorization_machines.rst

+1-1
@@ -1,4 +1,4 @@
1-
FactorizationMachines
1+
Factorization Machines
22
-------------------------
33

44
The Amazon SageMaker Factorization Machines algorithm.

doc/algorithms/tabular/index.rst

+18
@@ -0,0 +1,18 @@
1+
######################
2+
Tabular
3+
######################
4+
5+
Amazon SageMaker provides built-in algorithms that are tailored to the analysis of tabular data. The built-in SageMaker algorithms for tabular data can be used for either classification or regression problems.
6+
7+
.. toctree::
8+
:maxdepth: 2
9+
10+
autogluon
11+
catboost
12+
factorization_machines
13+
knn
14+
lightgbm
15+
linear_learner
16+
tabtransformer
17+
xgboost
18+
object2vec
File renamed without changes.

doc/algorithms/tabular/lightgbm.rst

+28
@@ -0,0 +1,28 @@
1+
############
2+
LightGBM
3+
############
4+
5+
`LightGBM <https://lightgbm.readthedocs.io/en/latest/>`__ is a popular and efficient open-source implementation of the Gradient Boosting
6+
Decision Tree (GBDT) algorithm. GBDT is a supervised learning algorithm that attempts to accurately predict a target variable by
7+
combining an ensemble of estimates from a set of simpler and weaker models. LightGBM uses additional techniques to significantly improve
8+
the efficiency and scalability of conventional GBDT.
9+
10+
The following table outlines a variety of sample notebooks that address different use cases of the Amazon SageMaker LightGBM algorithm.
11+
12+
.. list-table::
13+
:widths: 25 25
14+
:header-rows: 1
15+
16+
* - Notebook Title
17+
- Description
18+
* - `Tabular classification with Amazon SageMaker LightGBM and CatBoost algorithm <https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/lightgbm_catboost_tabular/Amazon_Tabular_Classification_LightGBM_CatBoost.ipynb>`__
19+
- This notebook demonstrates the use of the Amazon SageMaker LightGBM algorithm to train and host a tabular classification model.
20+
* - `Tabular regression with Amazon SageMaker LightGBM and CatBoost algorithm <https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/lightgbm_catboost_tabular/Amazon_Tabular_Regression_LightGBM_CatBoost.ipynb>`__
21+
- This notebook demonstrates the use of the Amazon SageMaker LightGBM algorithm to train and host a tabular regression model.
22+
23+
For instructions on how to create and access Jupyter notebook instances that you can use to run the example in SageMaker, see
24+
`Use Amazon SageMaker Notebook Instances <https://docs.aws.amazon.com/sagemaker/latest/dg/nbi.html>`__. After you have created a notebook
25+
instance and opened it, choose the SageMaker Examples tab to see a list of all of the SageMaker samples. To open a notebook, choose its
26+
Use tab and choose Create copy.
27+
28+
For detailed documentation, please refer to the `SageMaker LightGBM Algorithm <https://docs.aws.amazon.com/sagemaker/latest/dg/lightgbm.html>`__.
File renamed without changes.
+28
@@ -0,0 +1,28 @@
1+
###############
2+
TabTransformer
3+
###############
4+
5+
`TabTransformer <https://arxiv.org/abs/2012.06678>`__ is a novel deep tabular data modeling architecture for supervised learning. The TabTransformer architecture is built on self-attention-based Transformers.
6+
The Transformer layers transform the embeddings of categorical features into robust contextual embeddings to achieve higher prediction accuracy. Furthermore, the contextual embeddings learned from TabTransformer
7+
are highly robust against both missing and noisy data features, and provide better interpretability.
8+
9+
10+
The following table outlines a variety of sample notebooks that address different use cases of the Amazon SageMaker TabTransformer algorithm.
11+
12+
.. list-table::
13+
:widths: 25 25
14+
:header-rows: 1
15+
16+
* - Notebook Title
17+
- Description
18+
* - `Tabular classification with Amazon SageMaker TabTransformer algorithm <https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/tabtransformer_tabular/Amazon_Tabular_Classification_TabTransformer.ipynb>`__
19+
- This notebook demonstrates the use of the Amazon SageMaker TabTransformer algorithm to train and host a tabular classification model.
20+
* - `Tabular regression with Amazon SageMaker TabTransformer algorithm <https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/tabtransformer_tabular/Amazon_Tabular_Regression_TabTransformer.ipynb>`__
21+
- This notebook demonstrates the use of the Amazon SageMaker TabTransformer algorithm to train and host a tabular regression model.
22+
23+
For instructions on how to create and access Jupyter notebook instances that you can use to run the example in SageMaker, see
24+
`Use Amazon SageMaker Notebook Instances <https://docs.aws.amazon.com/sagemaker/latest/dg/nbi.html>`__. After you have created a notebook
25+
instance and opened it, choose the SageMaker Examples tab to see a list of all of the SageMaker samples. To open a notebook, choose its
26+
Use tab and choose Create copy.
27+
28+
For detailed documentation, please refer to the `SageMaker TabTransformer Algorithm <https://docs.aws.amazon.com/sagemaker/latest/dg/tabtransformer.html>`__.

doc/algorithms/tabular/xgboost.rst

+40
@@ -0,0 +1,40 @@
1+
############
2+
XGBoost
3+
############
4+
5+
`XGBoost <https://github.com/dmlc/xgboost>`__ (eXtreme Gradient Boosting) is a popular and efficient open-source implementation of the gradient boosted trees algorithm. Gradient boosting is a supervised learning algorithm that attempts to accurately predict a target variable
6+
by combining an ensemble of estimates from a set of simpler and weaker models. The XGBoost algorithm performs well in machine learning competitions because of its robust handling of a variety of data types, relationships, distributions, and the variety of hyperparameters that you can
7+
fine-tune. You can use XGBoost for regression, classification (binary and multiclass), and ranking problems.
8+
9+
You can use the new release of the XGBoost algorithm either as an Amazon SageMaker built-in algorithm or as a framework to run training scripts in your local environments. This implementation has a smaller memory footprint, better logging, improved hyperparameter validation, and
10+
a larger set of metrics than the original versions. It provides an XGBoost estimator that executes a training script in a managed XGBoost environment. The current release of SageMaker XGBoost is based on the original XGBoost versions 1.0, 1.2, 1.3, and 1.5.
11+
12+
The following table outlines a variety of sample notebooks that address different use cases of the Amazon SageMaker XGBoost algorithm.
13+
14+
.. list-table::
15+
:widths: 25 25
16+
:header-rows: 1
17+
18+
* - Notebook Title
19+
- Description
20+
* - `How to Create a Custom XGBoost container? <https://sagemaker-examples.readthedocs.io/en/latest/aws_sagemaker_studio/sagemaker_studio_image_build/xgboost_bring_your_own/Batch_Transform_BYO_XGB.html>`__
21+
- This notebook shows you how to build a custom XGBoost Container with Amazon SageMaker Batch Transform.
22+
* - `Regression with XGBoost using Parquet <https://sagemaker-examples.readthedocs.io/en/latest/introduction_to_amazon_algorithms/xgboost_abalone/xgboost_parquet_input_training.html>`__
23+
- This notebook shows you how to use the Abalone dataset in Parquet to train an XGBoost model.
24+
* - `How to Train and Host a Multiclass Classification Model? <https://sagemaker-examples.readthedocs.io/en/latest/introduction_to_amazon_algorithms/xgboost_mnist/xgboost_mnist.html>`__
25+
- This notebook shows how to use the MNIST dataset to train and host a multiclass classification model.
26+
* - `How to train a Model for Customer Churn Prediction? <https://sagemaker-examples.readthedocs.io/en/latest/introduction_to_applying_machine_learning/xgboost_customer_churn/xgboost_customer_churn.html>`__
27+
- This notebook shows you how to train a model to predict mobile customer departure in an effort to identify unhappy customers.
28+
* - `An Introduction to Amazon SageMaker Managed Spot infrastructure for XGBoost Training <https://sagemaker-examples.readthedocs.io/en/latest/introduction_to_amazon_algorithms/xgboost_abalone/xgboost_managed_spot_training.html>`__
29+
- This notebook shows you how to use Spot Instances for training with an XGBoost container.
30+
* - `How to use Amazon SageMaker Debugger to debug XGBoost Training Jobs? <https://sagemaker-examples.readthedocs.io/en/latest/sagemaker-debugger/xgboost_builtin_rules/xgboost-regression-debugger-rules.html>`__
31+
- This notebook shows you how to use Amazon SageMaker Debugger to monitor training jobs to detect inconsistencies.
32+
* - `How to use Amazon SageMaker Debugger to debug XGBoost Training Jobs in Real-Time? <https://sagemaker-examples.readthedocs.io/en/latest/sagemaker-debugger/xgboost_realtime_analysis/xgboost-realtime-analysis.html>`__
33+
- This notebook shows you how to use the MNIST dataset and Amazon SageMaker Debugger to perform real-time analysis of XGBoost training jobs while training jobs are running.
34+
35+
For instructions on how to create and access Jupyter notebook instances that you can use to run the example in SageMaker, see
36+
`Use Amazon SageMaker Notebook Instances <https://docs.aws.amazon.com/sagemaker/latest/dg/nbi.html>`__. After you have created a notebook
37+
instance and opened it, choose the SageMaker Examples tab to see a list of all of the SageMaker samples. To open a notebook, choose its
38+
Use tab and choose Create copy.
39+
40+
For detailed documentation, please refer to the `SageMaker XGBoost Algorithm <https://docs.aws.amazon.com/sagemaker/latest/dg/xgboost.html>`__.

doc/algorithms/text/blazing_text.rst

+27
@@ -0,0 +1,27 @@
1+
#############
2+
Blazing Text
3+
#############
4+
5+
6+
The Amazon SageMaker BlazingText algorithm provides highly optimized implementations of the Word2vec and text classification algorithms. The Word2vec algorithm is useful for many downstream natural language processing (NLP)
7+
tasks, such as sentiment analysis, named entity recognition, and machine translation. Text classification is an important task for applications that perform web searches, information retrieval, ranking, and document classification.
8+
9+
The Word2vec algorithm maps words to high-quality distributed vectors. The resulting vector representation of a word is called a word embedding. Words that are semantically similar correspond to vectors that are close together.
10+
That way, word embeddings capture the semantic relationships between words.
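"Close together" is commonly measured with cosine similarity. The sketch below uses tiny hand-written 3-dimensional vectors purely for illustration; they are not the output of BlazingText or any trained model, and real embeddings typically have far more dimensions.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical embeddings, made up for this example.
embeddings = {
    "king": [0.90, 0.80, 0.10],
    "queen": [0.85, 0.90, 0.15],
    "banana": [0.10, 0.05, 0.95],
}

sim_related = cosine_similarity(embeddings["king"], embeddings["queen"])
sim_unrelated = cosine_similarity(embeddings["king"], embeddings["banana"])
```

With these vectors, `sim_related` is larger than `sim_unrelated`, which is the relationship a trained embedding is expected to show for semantically similar versus dissimilar words.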
11+
12+
Many natural language processing (NLP) applications learn word embeddings by training on large collections of documents. These pretrained vector representations provide information about semantics and word distributions that
13+
typically improves the generalizability of other models that are later trained on a more limited amount of data. Most implementations of the Word2vec algorithm are not optimized for multi-core CPU architectures. This makes it
14+
difficult to scale to large datasets.
15+
16+
With the BlazingText algorithm, you can scale to large datasets easily. Similar to Word2vec, it provides the Skip-gram and continuous bag-of-words (CBOW) training architectures. BlazingText's implementation of the supervised
17+
multi-class, multi-label text classification algorithm extends the fastText text classifier to use GPU acceleration with custom `CUDA <https://docs.nvidia.com/cuda/index.html>`__
18+
kernels. You can train a model on more than a billion words in a couple of minutes using a multi-core CPU or a GPU, and you achieve performance on par with the state-of-the-art deep learning text classification algorithms.
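The difference between the Skip-gram and CBOW architectures mentioned above lies mainly in how training examples are formed from a sliding window over the text. The following sketch shows only that example-generation step, under the usual Word2vec definitions; the function names and the `window` parameter are hypothetical, and no model training is performed.

```python
def skipgram_pairs(tokens, window=2):
    """Skip-gram: predict each context word from the center word,
    so every (center, context) pair is one training example."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        pairs.extend((center, tokens[j]) for j in range(lo, hi) if j != i)
    return pairs

def cbow_examples(tokens, window=2):
    """CBOW: predict the center word from all of its context words,
    so every (context_list, center) pair is one training example."""
    examples = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        examples.append((context, center))
    return examples

tokens = ["the", "cat", "sat"]
sg = skipgram_pairs(tokens, window=1)
cb = cbow_examples(tokens, window=1)
```

For the three-token sentence above, Skip-gram yields four single-word pairs while CBOW yields one example per position, each with its neighbors grouped as the input.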
20+
21+
The BlazingText algorithm is not parallelizable. For more information on parameters related to training, see `Docker Registry Paths for SageMaker Built-in Algorithms <https://docs.aws.amazon.com/en_us/sagemaker/latest/dg/sagemaker-algo-docker-registry-paths.html>`__.
22+
23+
For a sample notebook that uses the SageMaker BlazingText algorithm to train and deploy supervised binary and multiclass classification models, see
24+
`Blazing Text classification on the DBPedia dataset <https://sagemaker-examples.readthedocs.io/en/latest/introduction_to_amazon_algorithms/blazingtext_text_classification_dbpedia/blazingtext_text_classification_dbpedia.html>`__.
25+
For instructions for creating and accessing Jupyter notebook instances that you can use to run the example in SageMaker, see `Use Amazon SageMaker Notebook Instances <https://docs.aws.amazon.com/sagemaker/latest/dg/nbi.html>`__.
26+
After creating and opening a notebook instance, choose the SageMaker Examples tab to see a list of all the SageMaker examples. The example notebooks that use BlazingText are located in the Introduction to Amazon
27+
algorithms section. To open a notebook, choose its Use tab, then choose Create copy.

doc/algorithms/text/index.rst

+22
@@ -0,0 +1,22 @@
1+
######################
2+
Text
3+
######################
4+
5+
Amazon SageMaker provides algorithms that are tailored to the analysis of textual documents used in natural language processing, document classification or summarization, topic modeling or classification, and language transcription or translation.
6+
7+
.. toctree::
8+
:maxdepth: 2
9+
10+
blazing_text
11+
lda
12+
ntm
13+
sequence_to_sequence
14+
text_classification_tensorflow
15+
sentence_pair_classification_tensorflow
16+
sentence_pair_classification_hugging_face
17+
question_answering_pytorch
18+
named_entity_recognition_hugging_face
19+
text_summarization_hugging_face
20+
text_generation_hugging_face
21+
machine_translation_hugging_face
22+
text_embedding_tensorflow_mxnet
File renamed without changes.
@@ -0,0 +1,10 @@
1+
#####################################
2+
Machine Translation - HuggingFace
3+
#####################################
4+
5+
6+
This is a supervised machine translation algorithm which supports many pre-trained models available in Hugging Face. The following
7+
`sample notebook <https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/jumpstart_machine_translation/Amazon_JumpStart_Machine_Translation.ipynb>`__
8+
demonstrates how to use the SageMaker Python SDK to run these machine translation algorithms.
9+
10+
For detailed documentation, please refer to :ref:`Use Built-in Algorithms with Pre-trained Models in SageMaker Python SDK <built-in-algos>`.
