aws · BasilBeirouti · Aug 17, 2022 · Aug 15, 2022 · Aug 16, 2022 · Aug 16, 2022
@@ -1,20 +1,15 @@
 ######################
-First-Party Algorithms
+Built-in Algorithms
 ######################
 
 Amazon SageMaker provides implementations of some common machine learning algorithms optimized for GPU architecture and massive datasets.
 
 .. toctree::
     :maxdepth: 2
 
-    sagemaker.amazon.amazon_estimator
-    factorization_machines
-    ipinsights
-    kmeans
-    knn
-    lda
-    linear_learner
-    ntm
-    object2vec
-    pca
-    randomcutforest
+    tabular/index
+    text/index
+    time_series/index
+    unsupervised/index
+    vision/index
+    other/index
@@ -0,0 +1,10 @@
+######################
+Other
+######################
+
+:ref:`All Pre-trained Models <all-pretrained-models>`
+
+.. toctree::
+    :maxdepth: 2
+
+    sagemaker.amazon.amazon_estimator
@@ -0,0 +1,28 @@
+############
+AutoGluon
+############
+
+`AutoGluon-Tabular <https://auto.gluon.ai/stable/index.html>`__ is a popular open-source AutoML framework that trains highly accurate machine learning models on an unprocessed tabular dataset.
+Unlike existing AutoML frameworks that primarily focus on model and hyperparameter selection, AutoGluon-Tabular succeeds by ensembling multiple models and stacking them in multiple layers.
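
The stacking idea can be sketched in plain Python as a two-layer stack. This is a simplified illustration under our own assumptions, not AutoGluon's implementation. The two base models (`mean_model` and `nearest_neighbor_model`) are hypothetical stand-ins. Layer 1 produces out-of-fold predictions with k-fold splitting, so layer 2 never trains on predictions made from the same rows' labels. Layer 2 learns least-squares blend weights. AutoGluon-Tabular itself trains many model types and can use more layers.

```python
from typing import Callable, List, Sequence, Tuple

# A model here is a function (train_xs, train_ys, x) -> prediction.
Model = Callable[[Sequence[float], Sequence[float], float], float]


def mean_model(xs: Sequence[float], ys: Sequence[float], x: float) -> float:
    # Hypothetical base model 1: ignores x and predicts the training mean.
    return sum(ys) / len(ys)


def nearest_neighbor_model(xs: Sequence[float], ys: Sequence[float], x: float) -> float:
    # Hypothetical base model 2: 1-nearest-neighbor regression.
    i = min(range(len(xs)), key=lambda j: abs(xs[j] - x))
    return ys[i]


def out_of_fold(model: Model, xs: Sequence[float], ys: Sequence[float], k: int) -> List[float]:
    """Predict every row with a model fit on the other k-1 folds, so layer 2 never sees leaked labels."""
    preds = [0.0] * len(xs)
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        train_xs = [xs[i] for i in train]
        train_ys = [ys[i] for i in train]
        for i in range(len(xs)):
            if i % k == fold:
                preds[i] = model(train_xs, train_ys, xs[i])
    return preds


def fit_blend(p1: Sequence[float], p2: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Layer 2: least-squares weights (w1, w2) for y ~ w1*p1 + w2*p2, via the 2x2 normal equations."""
    a = sum(u * u for u in p1)
    b = sum(u * v for u, v in zip(p1, p2))
    c = sum(v * v for v in p2)
    d1 = sum(u * y for u, y in zip(p1, ys))
    d2 = sum(v * y for v, y in zip(p2, ys))
    det = a * c - b * b
    return (c * d1 - b * d2) / det, (a * d2 - b * d1) / det


def mse(preds: Sequence[float], ys: Sequence[float]) -> float:
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)


xs = [float(i) for i in range(10)]
ys = [2.0 * x + 1.0 for x in xs]

# Layer 1: out-of-fold predictions from each base model.
oof_mean = out_of_fold(mean_model, xs, ys, k=5)
oof_nn = out_of_fold(nearest_neighbor_model, xs, ys, k=5)

# Layer 2: learn how to combine the layer-1 predictions.
w1, w2 = fit_blend(oof_mean, oof_nn, ys)
blended = [w1 * u + w2 * v for u, v in zip(oof_mean, oof_nn)]
print(mse(oof_mean, ys), mse(oof_nn, ys), mse(blended, ys))
```

Because the blend weights are chosen by least squares, the weights (1, 0) and (0, 1) reproduce either base model exactly. This means the blended out-of-fold error can never exceed the better base model's error.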
+
+
+The following table outlines a variety of sample notebooks that address different use cases of the Amazon SageMaker AutoGluon-Tabular algorithm.
+
+.. list-table::
+   :widths: 25 25
+   :header-rows: 1
+
+   * - Notebook Title
+     - Description
+   * - `Tabular classification with Amazon SageMaker AutoGluon-Tabular algorithm <https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/autogluon_tabular/Amazon_Tabular_Classification_AutoGluon.ipynb>`__
+     - This notebook demonstrates the use of the Amazon SageMaker AutoGluon-Tabular algorithm to train and host a tabular classification model.
+   * - `Tabular regression with Amazon SageMaker AutoGluon-Tabular algorithm <https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/autogluon_tabular/Amazon_Tabular_Regression_AutoGluon.ipynb>`__
+     - This notebook demonstrates the use of the Amazon SageMaker AutoGluon-Tabular algorithm to train and host a tabular regression model.
+
+
+For instructions on how to create and access Jupyter notebook instances that you can use to run the examples in SageMaker, see
+`Use Amazon SageMaker Notebook Instances <https://docs.aws.amazon.com/sagemaker/latest/dg/nbi.html>`__. After you have created a notebook
+instance and opened it, choose the SageMaker Examples tab to see a list of all of the SageMaker samples. To open a notebook, choose its
+Use tab and choose Create copy.
+
+For detailed documentation, please refer to the `SageMaker AutoGluon-Tabular Algorithm <https://docs.aws.amazon.com/sagemaker/latest/dg/autogluon-tabular.html>`__.
@@ -0,0 +1,37 @@
+############
+CatBoost
+############
+
+
+`CatBoost <https://catboost.ai/>`__ is a popular and high-performance open-source implementation of the Gradient Boosting Decision Tree (GBDT)
+algorithm. GBDT is a supervised learning algorithm that attempts to accurately predict a target variable by combining an ensemble of
+estimates from a set of simpler and weaker models.
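
The following minimal sketch shows what "combining an ensemble of estimates from a set of simpler and weaker models" means for squared-error regression. It assumes one numeric feature, decision stumps as the weak learners, and a fixed learning rate. All names are illustrative. This is plain gradient boosting, not CatBoost's ordered boosting or its tree structure.

```python
from typing import List, Sequence, Tuple

# A stump splits one numeric feature: (threshold, value if x <= threshold, value otherwise).
Stump = Tuple[float, float, float]


def fit_stump(xs: Sequence[float], targets: Sequence[float]) -> Stump:
    """Choose the threshold whose two-sided mean fit has the lowest squared error."""
    best_err, best = float("inf"), None
    for t in sorted(set(xs))[:-1]:  # splitting at the max value would leave one side empty
        left = [r for x, r in zip(xs, targets) if x <= t]
        right = [r for x, r in zip(xs, targets) if x > t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if err < best_err:
            best_err, best = err, (t, lv, rv)
    if best is None:  # all xs identical: fall back to a constant
        mean = sum(targets) / len(targets)
        best = (xs[0], mean, mean)
    return best


def stump_predict(stump: Stump, x: float) -> float:
    t, lv, rv = stump
    return lv if x <= t else rv


def gradient_boost(xs: Sequence[float], ys: Sequence[float], n_rounds: int = 50, learning_rate: float = 0.1):
    base = sum(ys) / len(ys)  # start from a constant prediction
    preds = [base] * len(ys)
    stumps: List[Stump] = []
    for _ in range(n_rounds):
        # For squared error, the negative gradient of the loss is the residual y - F(x).
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)  # a weak learner fit to the residuals
        stumps.append(stump)
        # Add a shrunken copy of the weak learner to the ensemble.
        preds = [p + learning_rate * stump_predict(stump, x) for p, x in zip(preds, xs)]
    return base, learning_rate, stumps


def boosted_predict(model, x: float) -> float:
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, x) for s in stumps)


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.0, 1.2, 0.9, 5.0, 5.1, 4.8]
model = gradient_boost(xs, ys)
print([round(boosted_predict(model, x), 2) for x in xs])
```

Each stump on its own is a poor model. The shrunken sum of 50 stumps, each fit to what the previous ones got wrong, tracks the two groups of targets closely.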
+
+CatBoost introduces two critical algorithmic advances to GBDT:
+
+* The implementation of ordered boosting, a permutation-driven alternative to the classic algorithm
+
+* An innovative algorithm for processing categorical features
+
+Both techniques were designed to combat a prediction shift caused by a special kind of target leakage that affects standard
+implementations of gradient boosting algorithms.
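
Both ideas rely on processing examples in a random order. For categorical features, the CatBoost paper replaces each category value with an ordered target statistic. This is the average target of the examples that precede it in a random permutation and share its category, smoothed toward a prior `p` with weight `a`. The sketch below implements that formula. The function name, the single permutation, and the `permutation` argument (added so the result can be checked) are our simplifications. The library itself uses several permutations and offers other encoding options.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Optional, Sequence


def ordered_target_statistics(
    categories: Sequence[Hashable],
    targets: Sequence[float],
    prior: float,
    a: float = 1.0,
    permutation: Optional[Sequence[int]] = None,
    seed: int = 0,
) -> List[float]:
    """Encode each category value using only targets of examples earlier in a permutation."""
    order = list(permutation) if permutation is not None else list(range(len(categories)))
    if permutation is None:
        random.Random(seed).shuffle(order)  # the random permutation sigma
    sums: Dict[Hashable, float] = defaultdict(float)
    counts: Dict[Hashable, int] = defaultdict(int)
    encoded = [0.0] * len(categories)
    for i in order:
        c = categories[i]
        # Example i sees only the history accumulated so far, never its own target.
        encoded[i] = (sums[c] + a * prior) / (counts[c] + a)
        sums[c] += targets[i]
        counts[c] += 1
    return encoded


cats = ["red", "blue", "red", "red"]
ys = [1.0, 0.0, 0.0, 1.0]
print(ordered_target_statistics(cats, ys, prior=0.5, permutation=[0, 1, 2, 3]))
# → [0.5, 0.5, 0.75, 0.5]
```

Example 2 sees only the `red` target from example 0, giving (1 + 0.5) / (1 + 1) = 0.75. No example ever sees its own label, which is the leakage the ordered scheme is meant to avoid.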
+
+The following table outlines a variety of sample notebooks that address different use cases of the Amazon SageMaker CatBoost algorithm.
+
+.. list-table::
+   :widths: 25 25
+   :header-rows: 1
+
+   * - Notebook Title
+     - Description
+   * - `Tabular classification with Amazon SageMaker LightGBM and CatBoost algorithm <https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/lightgbm_catboost_tabular/Amazon_Tabular_Classification_LightGBM_CatBoost.ipynb>`__
+     - This notebook demonstrates the use of the Amazon SageMaker CatBoost algorithm to train and host a tabular classification model.
+   * - `Tabular regression with Amazon SageMaker LightGBM and CatBoost algorithm <https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/lightgbm_catboost_tabular/Amazon_Tabular_Regression_LightGBM_CatBoost.ipynb>`__
+     - This notebook demonstrates the use of the Amazon SageMaker CatBoost algorithm to train and host a tabular regression model.
+
+For instructions on how to create and access Jupyter notebook instances that you can use to run the examples in SageMaker, see
+`Use Amazon SageMaker Notebook Instances <https://docs.aws.amazon.com/sagemaker/latest/dg/nbi.html>`__. After you have created a notebook
+instance and opened it, choose the SageMaker Examples tab to see a list of all of the SageMaker samples. To open a notebook, choose its
+Use tab and choose Create copy.
+
+For detailed documentation, please refer to the `SageMaker CatBoost Algorithm <https://docs.aws.amazon.com/sagemaker/latest/dg/catboost.html>`__.
@@ -1,4 +1,4 @@
-FactorizationMachines
+Factorization Machines
 -------------------------
 
 The Amazon SageMaker Factorization Machines algorithm.

@@ -0,0 +1,18 @@
+######################
+Tabular
+######################
+
+Amazon SageMaker provides built-in algorithms that are tailored to the analysis of tabular data. The built-in SageMaker algorithms for tabular data can be used for either classification or regression problems.
+
+.. toctree::
+    :maxdepth: 2
+
+    autogluon
+    catboost
+    factorization_machines
+    knn
+    lightgbm
+    linear_learner
+    tabtransformer
+    xgboost
+    object2vec
@@ -0,0 +1,28 @@
+############
+LightGBM
+############
+
+`LightGBM <https://lightgbm.readthedocs.io/en/latest/>`__ is a popular and efficient open-source implementation of the Gradient Boosting
+Decision Tree (GBDT) algorithm. GBDT is a supervised learning algorithm that attempts to accurately predict a target variable by
+combining an ensemble of estimates from a set of simpler and weaker models. LightGBM uses additional techniques, such as Gradient-based One-Side
+Sampling (GOSS) and Exclusive Feature Bundling (EFB), to significantly improve the efficiency and scalability of conventional GBDT.
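
Gradient-based One-Side Sampling (GOSS), one of the techniques described in the LightGBM paper, works in three steps:

* It keeps the rows with the largest gradients.
* It randomly samples a fraction of the remaining rows.
* It up-weights the sampled rows by (1 - a) / b when evaluating splits.

The sketch below covers only that sampling step. The helper name `goss_sample` is hypothetical, and `a` and `b` are the kept and sampled fractions, which roughly correspond to LightGBM's `top_rate` and `other_rate` parameters. This is not LightGBM's internal code.

```python
import random
from typing import List, Sequence, Tuple


def goss_sample(gradients: Sequence[float], a: float, b: float, seed: int = 0) -> Tuple[List[int], List[float]]:
    """Hypothetical helper: return (row indices, row weights) chosen by GOSS."""
    n = len(gradients)
    # Rank rows by gradient magnitude; a large |gradient| means the row is still poorly fit.
    order = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    top_n, rand_n = round(a * n), round(b * n)
    top, rest = order[:top_n], order[top_n:]
    # Keep a random b-fraction of the remaining small-gradient rows
    # (random.sample raises ValueError if fewer than rand_n rows remain).
    sampled = random.Random(seed).sample(rest, rand_n)
    # Up-weight the sampled rows to compensate for the small-gradient rows that were dropped.
    amplify = (1.0 - a) / b
    return top + sampled, [1.0] * len(top) + [amplify] * len(sampled)


grads = [0.9, -0.8, 0.1, 0.05, -0.02, 0.3, 0.01, -0.6, 0.2, 0.04]
indices, weights = goss_sample(grads, a=0.2, b=0.3)
print(indices[:2], len(indices))  # [0, 1] 5: the two largest |gradient| rows are always kept
```

Only half of the rows reach the split search here. The retained small-gradient rows carry a weight of about 2.67, so they still stand in for the rows that were skipped.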
+
+The following table outlines a variety of sample notebooks that address different use cases of the Amazon SageMaker LightGBM algorithm.
+
+.. list-table::
+   :widths: 25 25
+   :header-rows: 1
+
+   * - Notebook Title
+     - Description
+   * - `Tabular classification with Amazon SageMaker LightGBM and CatBoost algorithm <https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/lightgbm_catboost_tabular/Amazon_Tabular_Classification_LightGBM_CatBoost.ipynb>`__
+     - This notebook demonstrates the use of the Amazon SageMaker LightGBM algorithm to train and host a tabular classification model.
+   * - `Tabular regression with Amazon SageMaker LightGBM and CatBoost algorithm <https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/lightgbm_catboost_tabular/Amazon_Tabular_Regression_LightGBM_CatBoost.ipynb>`__
+     - This notebook demonstrates the use of the Amazon SageMaker LightGBM algorithm to train and host a tabular regression model.
+
+For instructions on how to create and access Jupyter notebook instances that you can use to run the examples in SageMaker, see
+`Use Amazon SageMaker Notebook Instances <https://docs.aws.amazon.com/sagemaker/latest/dg/nbi.html>`__. After you have created a notebook
+instance and opened it, choose the SageMaker Examples tab to see a list of all of the SageMaker samples. To open a notebook, choose its
+Use tab and choose Create copy.
+
+For detailed documentation, please refer to the `SageMaker LightGBM Algorithm <https://docs.aws.amazon.com/sagemaker/latest/dg/lightgbm.html>`__.
@@ -0,0 +1,28 @@
+###############
+TabTransformer
+###############
+
+`TabTransformer <https://arxiv.org/abs/2012.06678>`__ is a novel deep tabular data modeling architecture for supervised learning. The TabTransformer architecture is built on self-attention-based Transformers.
+The Transformer layers transform the embeddings of categorical features into robust contextual embeddings to achieve higher prediction accuracy. Furthermore, the contextual embeddings learned from TabTransformer
+are highly robust against both missing and noisy data features, and provide better interpretability.
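
The core operation, self-attention over the embeddings of a row's categorical features, can be sketched in plain Python. For brevity, this assumes a single head with identity query, key, and value projections and hand-picked 2-d embeddings, all of which are hypothetical. The actual TabTransformer learns per-column embeddings and projection matrices and stacks several multi-head Transformer layers. It then concatenates the result with the continuous features before a final MLP.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(embeddings: List[Vector]) -> List[Vector]:
    """Single-head scaled dot-product self-attention with identity Q/K/V projections."""
    d = len(embeddings[0])
    contextual = []
    for query in embeddings:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in embeddings]
        weights = softmax(scores)
        # Each output is a weighted average of all columns' embeddings.
        contextual.append([sum(w * v[j] for w, v in zip(weights, embeddings)) for j in range(d)])
    return contextual


# Hypothetical embeddings for one row with three categorical columns.
row = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(self_attention(row))
```

The output for the first column depends on the other columns' embeddings. Changing the third column's value changes the first column's contextual embedding, which is what "contextual" refers to.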
+
+
+The following table outlines a variety of sample notebooks that address different use cases of the Amazon SageMaker TabTransformer algorithm.
+
+.. list-table::
+   :widths: 25 25
+   :header-rows: 1
+
+   * - Notebook Title
+     - Description
+   * - `Tabular classification with Amazon SageMaker TabTransformer algorithm <https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/tabtransformer_tabular/Amazon_Tabular_Classification_TabTransformer.ipynb>`__
+     - This notebook demonstrates the use of the Amazon SageMaker TabTransformer algorithm to train and host a tabular classification model.
+   * - `Tabular regression with Amazon SageMaker TabTransformer algorithm <https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/tabtransformer_tabular/Amazon_Tabular_Regression_TabTransformer.ipynb>`__
+     - This notebook demonstrates the use of the Amazon SageMaker TabTransformer algorithm to train and host a tabular regression model.
+
+For instructions on how to create and access Jupyter notebook instances that you can use to run the examples in SageMaker, see
+`Use Amazon SageMaker Notebook Instances <https://docs.aws.amazon.com/sagemaker/latest/dg/nbi.html>`__. After you have created a notebook
+instance and opened it, choose the SageMaker Examples tab to see a list of all of the SageMaker samples. To open a notebook, choose its
+Use tab and choose Create copy.
+
+For detailed documentation, please refer to the `SageMaker TabTransformer Algorithm <https://docs.aws.amazon.com/sagemaker/latest/dg/tabtransformer.html>`__.
@@ -0,0 +1,40 @@
+############
+XGBoost
+############
+
+`XGBoost <https://github.com/dmlc/xgboost>`__ (eXtreme Gradient Boosting) is a popular and efficient open-source implementation of the gradient boosted trees algorithm. Gradient boosting is a supervised learning algorithm that attempts to accurately predict a target variable
+by combining an ensemble of estimates from a set of simpler and weaker models. The XGBoost algorithm performs well in machine learning competitions because it robustly handles a variety of data types, relationships, and distributions, and because it offers many hyperparameters that you can
+fine-tune. You can use XGBoost for regression, classification (binary and multiclass), and ranking problems.
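
The XGBoost paper shows how tunable hyperparameters enter the algorithm. It scores a candidate split with a regularized gain built from the sums of first-order gradients (G) and second-order derivatives (H) of the loss in each child. In that formula, `lambda` is the L2 penalty on leaf weights and `gamma` is the minimum loss reduction required to make a split. The sketch below evaluates the published formula for squared-error loss. It is an illustration, not XGBoost's internal code. The parameter is named `lam` because `lambda` is a reserved word in Python.

```python
from typing import Sequence


def leaf_weight(G: float, H: float, lam: float) -> float:
    """Optimal leaf value -G / (H + lambda) from the second-order objective in the XGBoost paper."""
    return -G / (H + lam)


def split_gain(g_left: Sequence[float], h_left: Sequence[float],
               g_right: Sequence[float], h_right: Sequence[float],
               lam: float = 1.0, gamma: float = 0.0) -> float:
    """Regularized loss reduction of a split; the split is worth making only if this is positive."""
    GL, HL, GR, HR = sum(g_left), sum(h_left), sum(g_right), sum(h_right)

    def score(G: float, H: float) -> float:
        return G * G / (H + lam)

    return 0.5 * (score(GL, HL) + score(GR, HR) - score(GL + GR, HL + HR)) - gamma


# Squared-error loss l = (p - y)^2 / 2 has gradient g = p - y and hessian h = 1.
ys = [1.0, 1.0, 5.0, 5.0]
preds = [3.0] * 4
g = [p - y for p, y in zip(preds, ys)]  # [2.0, 2.0, -2.0, -2.0]
h = [1.0] * 4
print(split_gain(g[:2], h[:2], g[2:], h[2:], lam=1.0))             # ≈ 5.333
print(split_gain(g[:2], h[:2], g[2:], h[2:], lam=1.0, gamma=6.0))  # negative, so no split
print(leaf_weight(sum(g[:2]), sum(h[:2]), lam=1.0))                # ≈ -1.333
```

With `gamma=6.0`, the same split has negative gain, so a tree regularized this strongly would not make it. The left leaf weight of about -1.333 moves those rows from 3.0 toward their target of 1.0, but only part of the way because of `lambda`.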
+
+You can use the new release of the XGBoost algorithm either as an Amazon SageMaker built-in algorithm or as a framework to run training scripts in your local environments. Compared with the original versions, this implementation has a smaller memory footprint, better logging, improved hyperparameter validation, and
+an expanded set of metrics. It provides an XGBoost estimator that executes a training script in a managed XGBoost environment. The current release of SageMaker XGBoost is based on the original XGBoost versions 1.0, 1.2, 1.3, and 1.5.
+
+The following table outlines a variety of sample notebooks that address different use cases of the Amazon SageMaker XGBoost algorithm.
+
+.. list-table::
+   :widths: 25 25
+   :header-rows: 1
+
+   * - Notebook Title
+     - Description
+   * - `How to Create a Custom XGBoost container? <https://sagemaker-examples.readthedocs.io/en/latest/aws_sagemaker_studio/sagemaker_studio_image_build/xgboost_bring_your_own/Batch_Transform_BYO_XGB.html>`__
+     - This notebook shows you how to build a custom XGBoost container with Amazon SageMaker Batch Transform.
+   * - `Regression with XGBoost using Parquet <https://sagemaker-examples.readthedocs.io/en/latest/introduction_to_amazon_algorithms/xgboost_abalone/xgboost_parquet_input_training.html>`__
+     - This notebook shows you how to use the Abalone dataset in Parquet format to train an XGBoost model.
+   * - `How to Train and Host a Multiclass Classification Model? <https://sagemaker-examples.readthedocs.io/en/latest/introduction_to_amazon_algorithms/xgboost_mnist/xgboost_mnist.html>`__
+     - This notebook shows how to use the MNIST dataset to train and host a multiclass classification model.
+   * - `How to train a Model for Customer Churn Prediction? <https://sagemaker-examples.readthedocs.io/en/latest/introduction_to_applying_machine_learning/xgboost_customer_churn/xgboost_customer_churn.html>`__
+     - This notebook shows you how to train a model to predict mobile customer departure in an effort to identify unhappy customers.
+   * - `An Introduction to Amazon SageMaker Managed Spot infrastructure for XGBoost Training <https://sagemaker-examples.readthedocs.io/en/latest/introduction_to_amazon_algorithms/xgboost_abalone/xgboost_managed_spot_training.html>`__
+     - This notebook shows you how to use Spot Instances for training with an XGBoost container.
+   * - `How to use Amazon SageMaker Debugger to debug XGBoost Training Jobs? <https://sagemaker-examples.readthedocs.io/en/latest/sagemaker-debugger/xgboost_builtin_rules/xgboost-regression-debugger-rules.html>`__
+     - This notebook shows you how to use Amazon SageMaker Debugger to monitor training jobs to detect inconsistencies.
+   * - `How to use Amazon SageMaker Debugger to debug XGBoost Training Jobs in Real-Time? <https://sagemaker-examples.readthedocs.io/en/latest/sagemaker-debugger/xgboost_realtime_analysis/xgboost-realtime-analysis.html>`__
+     - This notebook shows you how to use the MNIST dataset and Amazon SageMaker Debugger to perform real-time analysis of XGBoost training jobs while training jobs are running.
+
+For instructions on how to create and access Jupyter notebook instances that you can use to run the examples in SageMaker, see
+`Use Amazon SageMaker Notebook Instances <https://docs.aws.amazon.com/sagemaker/latest/dg/nbi.html>`__. After you have created a notebook
+instance and opened it, choose the SageMaker Examples tab to see a list of all of the SageMaker samples. To open a notebook, choose its
+Use tab and choose Create copy.
+
+For detailed documentation, please refer to the `SageMaker XGBoost Algorithm <https://docs.aws.amazon.com/sagemaker/latest/dg/xgboost.html>`__.
@@ -0,0 +1,27 @@
+#############
+BlazingText
+#############
+
+
+The Amazon SageMaker BlazingText algorithm provides highly optimized implementations of the Word2vec and text classification algorithms. The Word2vec algorithm is useful for many downstream natural language processing (NLP)
+tasks, such as sentiment analysis, named entity recognition, and machine translation. Text classification is an important task for applications that perform web searches, information retrieval, ranking, and document classification.
+
+The Word2vec algorithm maps words to high-quality distributed vectors. The resulting vector representation of a word is called a word embedding. Words that are semantically similar correspond to vectors that are close together.
+That way, word embeddings capture the semantic relationships between words.
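
Closeness between word embeddings is commonly measured with cosine similarity. The 3-dimensional vectors below are made up for illustration only and are not BlazingText output. Real embeddings are learned from a corpus and typically have many more dimensions.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means they point in the same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def most_similar(word: str, embeddings: Dict[str, List[float]]) -> str:
    """Return the other vocabulary word whose embedding is closest by cosine similarity."""
    others = [w for w in embeddings if w != word]
    return max(others, key=lambda w: cosine_similarity(embeddings[word], embeddings[w]))


# Made-up toy embeddings for illustration only.
embeddings = {
    "king": [0.8, 0.6, 0.1],
    "queen": [0.75, 0.65, 0.15],
    "banana": [0.1, 0.2, 0.95],
}
print(most_similar("king", embeddings))  # queen
```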
+
+Many natural language processing (NLP) applications learn word embeddings by training on large collections of documents. These pretrained vector representations provide information about semantics and word distributions that
+typically improves the generalizability of other models that are later trained on a more limited amount of data. Most implementations of the Word2vec algorithm are not optimized for multi-core CPU architectures. This makes it
+difficult to scale to large datasets.
+
+With the BlazingText algorithm, you can scale to large datasets easily. Similar to Word2vec, it provides the Skip-gram and continuous bag-of-words (CBOW) training architectures. BlazingText's implementation of the supervised
+multi-class, multi-label text classification algorithm extends the fastText text classifier to use GPU acceleration with custom `CUDA <https://docs.nvidia.com/cuda/index.html>`__
+kernels. You can train a model on more than a billion words in a couple of minutes using a multi-core CPU or a GPU, and achieve performance on par with state-of-the-art deep learning text classification algorithms.
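
The difference between the two Word2vec training architectures is easiest to see in the training examples each one derives from a sentence. This sketch assumes whitespace tokenization and a symmetric context window of `window` words. The helper names are our own. BlazingText's preprocessing and training details (subsampling, negative sampling, and so on) are not shown.

```python
from typing import List, Sequence, Tuple


def context_indices(n: int, i: int, window: int) -> List[int]:
    """Positions within `window` words of position i, excluding i itself."""
    return [j for j in range(max(0, i - window), min(n, i + window + 1)) if j != i]


def skipgram_pairs(tokens: Sequence[str], window: int) -> List[Tuple[str, str]]:
    """Skip-gram: one (center, context) example per neighbor; the model predicts context from center."""
    return [(tokens[i], tokens[j]) for i in range(len(tokens)) for j in context_indices(len(tokens), i, window)]


def cbow_examples(tokens: Sequence[str], window: int) -> List[Tuple[List[str], str]]:
    """CBOW: one (context words, center) example per position; the model predicts center from its context."""
    return [([tokens[j] for j in context_indices(len(tokens), i, window)], tokens[i]) for i in range(len(tokens))]


tokens = "the cat sat on".split()
print(skipgram_pairs(tokens, window=1))
print(cbow_examples(tokens, window=1))
```

Skip-gram produces one example per (word, neighbor) pair. CBOW produces one example per word, so it makes fewer, denser updates.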
+
+The BlazingText algorithm is not parallelizable. For more information on parameters related to training, see `Docker Registry Paths for SageMaker Built-in Algorithms <https://docs.aws.amazon.com/en_us/sagemaker/latest/dg/sagemaker-algo-docker-registry-paths.html>`__.
+
+For a sample notebook that uses the SageMaker BlazingText algorithm to train and deploy supervised binary and multiclass classification models, see
+`Blazing Text classification on the DBPedia dataset <https://sagemaker-examples.readthedocs.io/en/latest/introduction_to_amazon_algorithms/blazingtext_text_classification_dbpedia/blazingtext_text_classification_dbpedia.html>`__.
+For instructions for creating and accessing Jupyter notebook instances that you can use to run the example in SageMaker, see `Use Amazon SageMaker Notebook Instances <https://docs.aws.amazon.com/sagemaker/latest/dg/nbi.html>`__.
+After creating and opening a notebook instance, choose the SageMaker Examples tab to see a list of all the SageMaker examples. The example notebooks that use BlazingText are located in the Introduction to Amazon
+algorithms section. To open a notebook, choose its Use tab, then choose Create copy.
@@ -0,0 +1,22 @@
+######################
+Text
+######################
+
+Amazon SageMaker provides algorithms that are tailored to the analysis of textual documents used in natural language processing, document classification or summarization, topic modeling or classification, and language transcription or translation.
+
+.. toctree::
+    :maxdepth: 2
+
+    blazing_text
+    lda
+    ntm
+    sequence_to_sequence
+    text_classification_tensorflow
+    sentence_pair_classification_tensorflow
+    sentence_pair_classification_hugging_face
+    question_answering_pytorch
+    named_entity_recognition_hugging_face
+    text_summarization_hugging_face
+    text_generation_hugging_face
+    machine_translation_hugging_face
+    text_embedding_tensorflow_mxnet
@@ -0,0 +1,10 @@
+#####################################
+Machine Translation - HuggingFace
+#####################################
+
+
+This is a supervised machine translation algorithm that supports many pre-trained models available in Hugging Face. The following
+`sample notebook <https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/jumpstart_machine_translation/Amazon_JumpStart_Machine_Translation.ipynb>`__
+demonstrates how to use the SageMaker Python SDK to perform machine translation with these algorithms.
+
+For detailed documentation, please refer to :ref:`Use Built-in Algorithms with Pre-trained Models in SageMaker Python SDK <built-in-algos>`.
@@ -0,0 +1,10 @@
+########################################
+Named Entity Recognition - HuggingFace
+########################################
+
+This is a supervised named entity recognition algorithm that supports fine-tuning many pre-trained models available in Hugging Face. The following
+`sample notebook <https://github.com/aws/amazon-sagemaker-examples/blob/main/introduction_to_amazon_algorithms/jumpstart_named_entity_recognition/Amazon_JumpStart_Named_Entity_Recognition.ipynb>`__
+demonstrates how to use the SageMaker Python SDK to perform named entity recognition with these algorithms.
+
+For detailed documentation, please refer to `Use Built-in Algorithms with Pre-trained Models in SageMaker Python SDK <https://sagemaker.readthedocs.io/en/stable/overview.html#use-built-in-algorithms-with-pre-trained-models-in-sagemaker-python-sdk>`__.
+