documentation: update first-party algorithms and structural updates #3300

Merged
merged 3 commits into from
Aug 17, 2022
19 changes: 7 additions & 12 deletions doc/algorithms/index.rst
Original file line number Diff line number Diff line change
@@ -1,20 +1,15 @@
######################
-First-Party Algorithms
+Built-in Algorithms
######################
Collaborator

I'm pretty sure you need the #'s to be the same length as the text. That is probably one of the reasons why your codebuild is failing.

Contributor Author

@ragdhall ragdhall Aug 16, 2022

I think it needs to be the same length or longer. Both `make html` and `tox -e twine,sphinx` pass locally. The build errors seem unrelated to the PR changes.

Collaborator

@bencrabtree bencrabtree Aug 16, 2022

sounds good to me, thanks for the clarification
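
The adornment-length rule discussed in this thread can be checked before pushing. The following is a minimal sketch, not part of the PR: `check_heading` is a hypothetical helper, and it assumes plain ASCII titles (docutils measures display width, which can differ for wide characters). It flags an underline that is shorter than the title and an overline that does not match the underline.

```python
def check_heading(lines):
    """Return a list of problems for one reStructuredText heading.

    `lines` is either [title, underline] or [overline, title, underline].
    Hypothetical helper for illustration; it is not a docutils API.
    """
    problems = []
    if len(lines) == 3:
        overline, title, underline = lines
        # When an overline is present, it has to match the underline.
        if overline != underline:
            problems.append("overline and underline differ")
    else:
        title, underline = lines
    # The underline may be longer than the title, but not shorter.
    if len(underline) < len(title.strip()):
        problems.append("underline shorter than title")
    return problems


if __name__ == "__main__":
    heading = ["######################", "Built-in Algorithms", "######################"]
    print(check_heading(heading))
```

Under this check the renamed heading in this diff is fine: the 22-character adornment is longer than the 19-character title.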


Amazon SageMaker provides implementations of some common machine learning algorithms optimized for GPU architecture and massive datasets.

.. toctree::
   :maxdepth: 2

-   sagemaker.amazon.amazon_estimator
-   factorization_machines
-   ipinsights
-   kmeans
-   knn
-   lda
-   linear_learner
-   ntm
-   object2vec
-   pca
-   randomcutforest
+   tabular/index
+   text/index
+   time_series/index
+   unsupervised/index
+   vision/index
+   other/index
10 changes: 10 additions & 0 deletions doc/algorithms/other/index.rst
@@ -0,0 +1,10 @@
######################
Other
######################

:ref:`All Pre-trained Models <all-pretrained-models>`

.. toctree::
   :maxdepth: 2

   sagemaker.amazon.amazon_estimator
28 changes: 28 additions & 0 deletions doc/algorithms/tabular/autogluon.rst
@@ -0,0 +1,28 @@
############
AutoGluon
############

`AutoGluon-Tabular <https://auto.gluon.ai/stable/index.html>`__ is a popular open-source AutoML framework that trains highly accurate machine learning models on an unprocessed tabular dataset.
Unlike existing AutoML frameworks that primarily focus on model and hyperparameter selection, AutoGluon-Tabular succeeds by ensembling multiple models and stacking them in multiple layers.


The following table lists sample notebooks that cover different use cases of the Amazon SageMaker AutoGluon-Tabular algorithm.

.. list-table::
   :widths: 25 25
   :header-rows: 1

   * - Notebook Title
     - Description
   * - `Tabular classification with Amazon SageMaker AutoGluon-Tabular algorithm <https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/autogluon_tabular/Amazon_Tabular_Classification_AutoGluon.ipynb>`__
     - This notebook demonstrates the use of the Amazon SageMaker AutoGluon-Tabular algorithm to train and host a tabular classification model.
   * - `Tabular regression with Amazon SageMaker AutoGluon-Tabular algorithm <https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/autogluon_tabular/Amazon_Tabular_Regression_AutoGluon.ipynb>`__
     - This notebook demonstrates the use of the Amazon SageMaker AutoGluon-Tabular algorithm to train and host a tabular regression model.


For instructions on how to create and access Jupyter notebook instances that you can use to run the example in SageMaker, see
`Use Amazon SageMaker Notebook Instances <https://docs.aws.amazon.com/sagemaker/latest/dg/nbi.html>`__. After you have created a notebook
instance and opened it, choose the SageMaker Examples tab to see a list of all of the SageMaker samples. To open a notebook, choose its
Use tab and choose Create copy.

For detailed documentation, please refer to the `SageMaker AutoGluon-Tabular Algorithm <https://docs.aws.amazon.com/sagemaker/latest/dg/autogluon-tabular.html>`__.
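
As a rough illustration of how the linked notebooks start, the sketch below looks up the training container, training script, and pre-trained artifact for the algorithm with the SageMaker Python SDK. It is not taken from this PR. The model ID `autogluon-classification-ensemble`, the instance type, and the helper name `retrieve_training_artifacts` are assumptions, and the calls need configured AWS credentials and a default region.

```python
MODEL_ID = "autogluon-classification-ensemble"  # assumed JumpStart model ID
MODEL_VERSION = "*"  # "*" asks for the latest available version
TRAINING_INSTANCE_TYPE = "ml.m5.xlarge"  # placeholder instance type


def retrieve_training_artifacts(instance_type=TRAINING_INSTANCE_TYPE):
    """Look up the image, script, and model URIs needed to start training."""
    # Imported lazily so this module loads even without the SDK installed.
    from sagemaker import image_uris, model_uris, script_uris

    image_uri = image_uris.retrieve(
        region=None,  # falls back to the region of the default session
        framework=None,
        model_id=MODEL_ID,
        model_version=MODEL_VERSION,
        image_scope="training",
        instance_type=instance_type,
    )
    script_uri = script_uris.retrieve(
        model_id=MODEL_ID, model_version=MODEL_VERSION, script_scope="training"
    )
    model_uri = model_uris.retrieve(
        model_id=MODEL_ID, model_version=MODEL_VERSION, model_scope="training"
    )
    return {"image_uri": image_uri, "script_uri": script_uri, "model_uri": model_uri}
```

The returned URIs would typically be passed to a `sagemaker.estimator.Estimator`; the notebooks in the table above show the complete flow.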
37 changes: 37 additions & 0 deletions doc/algorithms/tabular/catboost.rst
@@ -0,0 +1,37 @@
############
CatBoost
############


`CatBoost <https://catboost.ai/>`__ is a popular and high-performance open-source implementation of the Gradient Boosting Decision Tree (GBDT)
algorithm. GBDT is a supervised learning algorithm that attempts to accurately predict a target variable by combining an ensemble of
estimates from a set of simpler and weaker models.

CatBoost introduces two critical algorithmic advances to GBDT:

* The implementation of ordered boosting, a permutation-driven alternative to the classic algorithm

* An innovative algorithm for processing categorical features

Both techniques were created to fight a prediction shift caused by a special kind of target leakage present in all currently existing
implementations of gradient boosting algorithms.
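
To make the categorical-feature idea more concrete, here is a minimal sketch of an ordered target statistic. It is an illustration rather than CatBoost's actual implementation: each row's category is encoded using only the targets of rows that appear earlier, so a row never sees its own label. The function name, the `prior` and `weight` parameters, and the use of the given row order (instead of random permutations) are simplifying assumptions.

```python
def ordered_target_statistics(categories, targets, prior, weight=1.0):
    """Encode each category value using only the targets of earlier rows."""
    sums = {}    # running sum of targets seen so far, per category
    counts = {}  # running number of rows seen so far, per category
    encoded = []
    for category, target in zip(categories, targets):
        seen_sum = sums.get(category, 0.0)
        seen_count = counts.get(category, 0)
        # Smoothed mean of the earlier targets; equals the prior for unseen values.
        encoded.append((seen_sum + weight * prior) / (seen_count + weight))
        # Update the running statistics only after the row has been encoded.
        sums[category] = seen_sum + target
        counts[category] = seen_count + 1
    return encoded


if __name__ == "__main__":
    print(ordered_target_statistics(["a", "b", "a", "a"], [1, 0, 0, 1], prior=0.5))
```

Because the encoding of a row depends only on earlier rows, the target of that row cannot leak into its own feature value, which is the kind of leakage the paragraph above refers to.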

The following table lists sample notebooks that cover different use cases of the Amazon SageMaker CatBoost algorithm.

.. list-table::
   :widths: 25 25
   :header-rows: 1

   * - Notebook Title
     - Description
   * - `Tabular classification with Amazon SageMaker LightGBM and CatBoost algorithm <https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/lightgbm_catboost_tabular/Amazon_Tabular_Classification_LightGBM_CatBoost.ipynb>`__
     - This notebook demonstrates the use of the Amazon SageMaker CatBoost algorithm to train and host a tabular classification model.
   * - `Tabular regression with Amazon SageMaker LightGBM and CatBoost algorithm <https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/lightgbm_catboost_tabular/Amazon_Tabular_Regression_LightGBM_CatBoost.ipynb>`__
     - This notebook demonstrates the use of the Amazon SageMaker CatBoost algorithm to train and host a tabular regression model.

For instructions on how to create and access Jupyter notebook instances that you can use to run the example in SageMaker, see
`Use Amazon SageMaker Notebook Instances <https://docs.aws.amazon.com/sagemaker/latest/dg/nbi.html>`__. After you have created a notebook
instance and opened it, choose the SageMaker Examples tab to see a list of all of the SageMaker samples. To open a notebook, choose its
Use tab and choose Create copy.

For detailed documentation, please refer to the `SageMaker CatBoost Algorithm <https://docs.aws.amazon.com/sagemaker/latest/dg/catboost.html>`__.
@@ -1,4 +1,4 @@
-FactorizationMachines
+Factorization Machines
-------------------------

The Amazon SageMaker Factorization Machines algorithm.
18 changes: 18 additions & 0 deletions doc/algorithms/tabular/index.rst
@@ -0,0 +1,18 @@
######################
Tabular
######################

Amazon SageMaker provides built-in algorithms that are tailored to the analysis of tabular data. The built-in SageMaker algorithms for tabular data can be used for either classification or regression problems.

.. toctree::
   :maxdepth: 2

   autogluon
   catboost
   factorization_machines
   knn
   lightgbm
   linear_learner
   tabtransformer
   xgboost
   object2vec
File renamed without changes.
28 changes: 28 additions & 0 deletions doc/algorithms/tabular/lightgbm.rst
@@ -0,0 +1,28 @@
############
LightGBM
############

`LightGBM <https://lightgbm.readthedocs.io/en/latest/>`__ is a popular and efficient open-source implementation of the Gradient Boosting
Decision Tree (GBDT) algorithm. GBDT is a supervised learning algorithm that attempts to accurately predict a target variable by
combining an ensemble of estimates from a set of simpler and weaker models. LightGBM uses additional techniques to significantly improve
the efficiency and scalability of conventional GBDT.
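
The idea of combining many weak models can be sketched in a few lines of plain Python. The example below is a toy gradient boosting loop for squared error on a single numeric feature, with one-split decision stumps as the weak learners. It is only an illustration of conventional GBDT, not LightGBM's algorithm (which adds techniques such as histogram-based splitting); all function names and default values are assumptions.

```python
def fit_stump(xs, residuals):
    """Find the single threshold split that minimizes squared error."""
    best = None
    # Skip the largest value so the right-hand side is never empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = sum((r - left_mean) ** 2 for r in left)
        error += sum((r - right_mean) ** 2 for r in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1], best[2], best[3]


def fit_gbdt(xs, ys, n_rounds=20, learning_rate=0.5):
    """Fit an additive model: a constant plus a sequence of scaled stumps."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For squared error, the negative gradient is the residual.
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_value, right_value = fit_stump(xs, residuals)
        stumps.append((threshold, left_value, right_value))
        predictions = [
            p + learning_rate * (left_value if x <= threshold else right_value)
            for x, p in zip(xs, predictions)
        ]
    return base, learning_rate, stumps


def predict(model, x):
    """Sum the base value and the scaled contribution of every stump."""
    base, learning_rate, stumps = model
    total = base
    for threshold, left_value, right_value in stumps:
        total += learning_rate * (left_value if x <= threshold else right_value)
    return total


if __name__ == "__main__":
    model = fit_gbdt([1, 2, 3, 4], [1.0, 1.0, 3.0, 3.0])
    print(predict(model, 1), predict(model, 4))
```

Each round fits a stump to what the current ensemble still gets wrong, so the summed prediction moves closer to the targets with every added weak model.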

The following table lists sample notebooks that cover different use cases of the Amazon SageMaker LightGBM algorithm.

.. list-table::
   :widths: 25 25
   :header-rows: 1

   * - Notebook Title
     - Description
   * - `Tabular classification with Amazon SageMaker LightGBM and CatBoost algorithm <https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/lightgbm_catboost_tabular/Amazon_Tabular_Classification_LightGBM_CatBoost.ipynb>`__
     - This notebook demonstrates the use of the Amazon SageMaker LightGBM algorithm to train and host a tabular classification model.
   * - `Tabular regression with Amazon SageMaker LightGBM and CatBoost algorithm <https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/lightgbm_catboost_tabular/Amazon_Tabular_Regression_LightGBM_CatBoost.ipynb>`__
     - This notebook demonstrates the use of the Amazon SageMaker LightGBM algorithm to train and host a tabular regression model.

For instructions on how to create and access Jupyter notebook instances that you can use to run the example in SageMaker, see
`Use Amazon SageMaker Notebook Instances <https://docs.aws.amazon.com/sagemaker/latest/dg/nbi.html>`__. After you have created a notebook
instance and opened it, choose the SageMaker Examples tab to see a list of all of the SageMaker samples. To open a notebook, choose its
Use tab and choose Create copy.

For detailed documentation, please refer to the `SageMaker LightGBM Algorithm <https://docs.aws.amazon.com/sagemaker/latest/dg/lightgbm.html>`__.
File renamed without changes.
28 changes: 28 additions & 0 deletions doc/algorithms/tabular/tabtransformer.rst
@@ -0,0 +1,28 @@
###############
TabTransformer
###############

`TabTransformer <https://arxiv.org/abs/2012.06678>`__ is a novel deep tabular data modeling architecture for supervised learning. The TabTransformer architecture is built on self-attention-based Transformers.
The Transformer layers transform the embeddings of categorical features into robust contextual embeddings to achieve higher prediction accuracy. Furthermore, the contextual embeddings learned from TabTransformer
are highly robust against both missing and noisy data features, and provide better interpretability.
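
The "contextual embedding" step can be illustrated with a bare-bones self-attention pass over the embeddings of one row's categorical features. This is a simplified sketch under stated assumptions, not the TabTransformer implementation: it uses a single head, no learned query/key/value projections, and no feed-forward or normalization layers, and the function name is hypothetical.

```python
import math


def self_attention(embeddings):
    """Scaled dot-product self-attention with identity Q/K/V projections.

    `embeddings` holds one vector per categorical feature of a single row.
    Each output vector is a weighted average of all input vectors, so it
    reflects the other features in the row (its "context").
    """
    dim = len(embeddings[0])
    outputs = []
    for query in embeddings:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in embeddings
        ]
        # Numerically stable softmax over the scores.
        max_score = max(scores)
        weights = [math.exp(s - max_score) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        outputs.append(
            [sum(w * value[j] for w, value in zip(weights, embeddings)) for j in range(dim)]
        )
    return outputs


if __name__ == "__main__":
    row = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three categorical features
    for vector in self_attention(row):
        print(vector)
```

In a real Transformer layer the queries, keys, and values come from learned linear projections; identity projections are used here only to keep the sketch short.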


The following table lists sample notebooks that cover different use cases of the Amazon SageMaker TabTransformer algorithm.

.. list-table::
   :widths: 25 25
   :header-rows: 1

   * - Notebook Title
     - Description
   * - `Tabular classification with Amazon SageMaker TabTransformer algorithm <https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/tabtransformer_tabular/Amazon_Tabular_Classification_TabTransformer.ipynb>`__
     - This notebook demonstrates the use of the Amazon SageMaker TabTransformer algorithm to train and host a tabular classification model.
   * - `Tabular regression with Amazon SageMaker TabTransformer algorithm <https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/tabtransformer_tabular/Amazon_Tabular_Regression_TabTransformer.ipynb>`__
     - This notebook demonstrates the use of the Amazon SageMaker TabTransformer algorithm to train and host a tabular regression model.

For instructions on how to create and access Jupyter notebook instances that you can use to run the example in SageMaker, see
`Use Amazon SageMaker Notebook Instances <https://docs.aws.amazon.com/sagemaker/latest/dg/nbi.html>`__. After you have created a notebook
instance and opened it, choose the SageMaker Examples tab to see a list of all of the SageMaker samples. To open a notebook, choose its
Use tab and choose Create copy.

For detailed documentation, please refer to the `SageMaker TabTransformer Algorithm <https://docs.aws.amazon.com/sagemaker/latest/dg/tabtransformer.html>`__.
40 changes: 40 additions & 0 deletions doc/algorithms/tabular/xgboost.rst
@@ -0,0 +1,40 @@
############
XGBoost
############

`XGBoost <https://github.com/dmlc/xgboost>`__ (eXtreme Gradient Boosting) is a popular and efficient open-source implementation of the gradient boosted trees algorithm. Gradient boosting is a supervised learning algorithm that attempts to accurately predict a target variable
by combining an ensemble of estimates from a set of simpler and weaker models. The XGBoost algorithm performs well in machine learning competitions because of its robust handling of a variety of data types, relationships, and distributions, and because of the variety of hyperparameters that you can
fine-tune. You can use XGBoost for regression, classification (binary and multiclass), and ranking problems.

You can use the new release of the XGBoost algorithm either as an Amazon SageMaker built-in algorithm or as a framework to run training scripts in your local environments. Compared with the original versions, this implementation has a smaller memory footprint, better logging, improved hyperparameter validation, and
an expanded set of metrics. It provides an XGBoost estimator that executes a training script in a managed XGBoost environment. The current release of SageMaker XGBoost is based on the original XGBoost versions 1.0, 1.2, 1.3, and 1.5.
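
As a hedged illustration of the framework mode described above, the sketch below builds an XGBoost estimator with the SageMaker Python SDK. It is not part of this PR. The script name `train.py`, the hyperparameter values, the instance type, the framework version string `1.5-1`, and the helper name `build_estimator` are assumptions; the training script is expected to parse the hyperparameters itself.

```python
# Example hyperparameters; in framework mode they are forwarded to the
# training script as command-line arguments.
HYPERPARAMETERS = {
    "objective": "reg:squarederror",
    "num_round": 50,
    "max_depth": 5,
    "eta": 0.2,
}


def build_estimator(role, entry_point="train.py", framework_version="1.5-1"):
    """Create an estimator that runs `entry_point` in a managed XGBoost container."""
    # Imported lazily so this module loads even without the SDK installed.
    from sagemaker.xgboost.estimator import XGBoost

    return XGBoost(
        entry_point=entry_point,  # hypothetical user training script
        framework_version=framework_version,
        hyperparameters=HYPERPARAMETERS,
        role=role,  # IAM role ARN supplied by the caller
        instance_count=1,
        instance_type="ml.m5.xlarge",  # placeholder instance type
    )
```

Training would then be started with something like `build_estimator(role).fit({"train": train_s3_uri})`, where `train_s3_uri` is a placeholder for the S3 location of your training data.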

The following table lists sample notebooks that cover different use cases of the Amazon SageMaker XGBoost algorithm.

.. list-table::
   :widths: 25 25
   :header-rows: 1

   * - Notebook Title
     - Description
   * - `How to Create a Custom XGBoost container? <https://sagemaker-examples.readthedocs.io/en/latest/aws_sagemaker_studio/sagemaker_studio_image_build/xgboost_bring_your_own/Batch_Transform_BYO_XGB.html>`__
     - This notebook shows you how to build a custom XGBoost container with Amazon SageMaker Batch Transform.
   * - `Regression with XGBoost using Parquet <https://sagemaker-examples.readthedocs.io/en/latest/introduction_to_amazon_algorithms/xgboost_abalone/xgboost_parquet_input_training.html>`__
     - This notebook shows you how to use the Abalone dataset in Parquet format to train an XGBoost model.
   * - `How to Train and Host a Multiclass Classification Model? <https://sagemaker-examples.readthedocs.io/en/latest/introduction_to_amazon_algorithms/xgboost_mnist/xgboost_mnist.html>`__
     - This notebook shows you how to use the MNIST dataset to train and host a multiclass classification model.
   * - `How to train a Model for Customer Churn Prediction? <https://sagemaker-examples.readthedocs.io/en/latest/introduction_to_applying_machine_learning/xgboost_customer_churn/xgboost_customer_churn.html>`__
     - This notebook shows you how to train a model to predict mobile customer departure in an effort to identify unhappy customers.
   * - `An Introduction to Amazon SageMaker Managed Spot infrastructure for XGBoost Training <https://sagemaker-examples.readthedocs.io/en/latest/introduction_to_amazon_algorithms/xgboost_abalone/xgboost_managed_spot_training.html>`__
     - This notebook shows you how to use Spot Instances for training with an XGBoost container.
   * - `How to use Amazon SageMaker Debugger to debug XGBoost Training Jobs? <https://sagemaker-examples.readthedocs.io/en/latest/sagemaker-debugger/xgboost_builtin_rules/xgboost-regression-debugger-rules.html>`__
     - This notebook shows you how to use Amazon SageMaker Debugger to monitor training jobs to detect inconsistencies.
   * - `How to use Amazon SageMaker Debugger to debug XGBoost Training Jobs in Real-Time? <https://sagemaker-examples.readthedocs.io/en/latest/sagemaker-debugger/xgboost_realtime_analysis/xgboost-realtime-analysis.html>`__
     - This notebook shows you how to use the MNIST dataset and Amazon SageMaker Debugger to perform real-time analysis of XGBoost training jobs while they are running.

For instructions on how to create and access Jupyter notebook instances that you can use to run the example in SageMaker, see
`Use Amazon SageMaker Notebook Instances <https://docs.aws.amazon.com/sagemaker/latest/dg/nbi.html>`__. After you have created a notebook
instance and opened it, choose the SageMaker Examples tab to see a list of all of the SageMaker samples. To open a notebook, choose its
Use tab and choose Create copy.

For detailed documentation, please refer to the `SageMaker XGBoost Algorithm <https://docs.aws.amazon.com/sagemaker/latest/dg/xgboost.html>`__.
27 changes: 27 additions & 0 deletions doc/algorithms/text/blazing_text.rst
@@ -0,0 +1,27 @@
#############
Blazing Text
#############


The Amazon SageMaker BlazingText algorithm provides highly optimized implementations of the Word2vec and text classification algorithms. The Word2vec algorithm is useful for many downstream natural language processing (NLP)
tasks, such as sentiment analysis, named entity recognition, machine translation, etc. Text classification is an important task for applications that perform web searches, information retrieval, ranking, and document classification.

The Word2vec algorithm maps words to high-quality distributed vectors. The resulting vector representation of a word is called a word embedding. Words that are semantically similar correspond to vectors that are close together.
That way, word embeddings capture the semantic relationships between words.

Many natural language processing (NLP) applications learn word embeddings by training on large collections of documents. These pretrained vector representations provide information about semantics and word distributions that
typically improves the generalizability of other models that are later trained on a more limited amount of data. Most implementations of the Word2vec algorithm are not optimized for multi-core CPU architectures. This makes it
difficult to scale to large datasets.

With the BlazingText algorithm, you can scale to large datasets easily. Similar to Word2vec, it provides the Skip-gram and continuous bag-of-words (CBOW) training architectures. BlazingText's implementation of the supervised
multi-class, multi-label text classification algorithm extends the fastText text classifier to use GPU acceleration with custom `CUDA <https://docs.nvidia.com/cuda/index.html>`__
kernels. You can train a model on more than a billion words in a couple of minutes using a multi-core CPU or a GPU, and you achieve performance on par with the state-of-the-art deep learning text classification algorithms.
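
The difference between the Skip-gram and CBOW architectures mentioned above lies in how training examples are drawn from a sentence. The following is a minimal sketch in plain Python, not BlazingText code; the function names and the `window` parameter are illustrative assumptions. Skip-gram predicts each context word from the center word, while CBOW predicts the center word from its surrounding context.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs: the center word predicts each neighbor."""
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_examples(tokens, window=2):
    """Return (context_words, center) examples: the context predicts the center."""
    examples = []
    for i, center in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + window + 1]
        if context:
            examples.append((context, center))
    return examples


if __name__ == "__main__":
    sentence = ["the", "quick", "fox"]
    print(skipgram_pairs(sentence, window=1))
    print(cbow_examples(sentence, window=1))
```

Skip-gram produces one example per (center, neighbor) pair, whereas CBOW produces one example per position, which is one reason the two modes differ in training cost.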

The BlazingText algorithm is not parallelizable. For more information on parameters related to training, see `Docker Registry Paths for SageMaker Built-in Algorithms <https://docs.aws.amazon.com/en_us/sagemaker/latest/dg/sagemaker-algo-docker-registry-paths.html>`__.

For a sample notebook that uses the SageMaker BlazingText algorithm to train and deploy supervised binary and multiclass classification models, see
`Blazing Text classification on the DBPedia dataset <https://sagemaker-examples.readthedocs.io/en/latest/introduction_to_amazon_algorithms/blazingtext_text_classification_dbpedia/blazingtext_text_classification_dbpedia.html>`__.
For instructions for creating and accessing Jupyter notebook instances that you can use to run the example in SageMaker, see `Use Amazon SageMaker Notebook Instances <https://docs.aws.amazon.com/sagemaker/latest/dg/nbi.html>`__.
After creating and opening a notebook instance, choose the SageMaker Examples tab to see a list of all the SageMaker examples. The example notebooks that use BlazingText are located in the Introduction to Amazon
algorithms section. To open a notebook, choose its Use tab, then choose Create copy.
22 changes: 22 additions & 0 deletions doc/algorithms/text/index.rst
@@ -0,0 +1,22 @@
######################
Text
######################

Amazon SageMaker provides algorithms that are tailored to the analysis of textual documents used in natural language processing, document classification or summarization, topic modeling or classification, and language transcription or translation.

.. toctree::
   :maxdepth: 2

   blazing_text
   lda
   ntm
   sequence_to_sequence
   text_classification_tensorflow
   sentence_pair_classification_tensorflow
   sentence_pair_classification_hugging_face
   question_answering_pytorch
   named_entity_recognition_hugging_face
   text_summarization_hugging_face
   text_generation_hugging_face
   machine_translation_hugging_face
   text_embedding_tensorflow_mxnet
File renamed without changes.
10 changes: 10 additions & 0 deletions doc/algorithms/text/machine_translation_hugging_face.rst
@@ -0,0 +1,10 @@
#####################################
Machine Translation - HuggingFace
#####################################


This is a supervised machine translation algorithm that supports many pre-trained models available in Hugging Face. The following
`sample notebook <https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/jumpstart_machine_translation/Amazon_JumpStart_Machine_Translation.ipynb>`__
demonstrates how to use the SageMaker Python SDK to run machine translation with these algorithms.

For detailed documentation, please refer to :ref:`Use Built-in Algorithms with Pre-trained Models in SageMaker Python SDK <built-in-algos>`.
10 changes: 10 additions & 0 deletions doc/algorithms/text/named_entity_recognition_hugging_face.rst
@@ -0,0 +1,10 @@
########################################
Named Entity Recognition - HuggingFace
########################################

This is a supervised named entity recognition algorithm that supports fine-tuning of many pre-trained models available in Hugging Face. The following
`sample notebook <https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/jumpstart_named_entity_recognition/Amazon_JumpStart_Named_Entity_Recognition.ipynb>`__
demonstrates how to use the SageMaker Python SDK to run named entity recognition with these algorithms.

For detailed documentation, please refer to `Use Built-in Algorithms with Pre-trained Models in SageMaker Python SDK <https://sagemaker.readthedocs.io/en/stable/overview.html#use-built-in-algorithms-with-pre-trained-models-in-sagemaker-python-sdk>`__.

File renamed without changes.