diff --git a/doc/doc_utils/jumpstart_doc_utils.py b/doc/doc_utils/jumpstart_doc_utils.py index d2658dca30..94096fbf1d 100644 --- a/doc/doc_utils/jumpstart_doc_utils.py +++ b/doc/doc_utils/jumpstart_doc_utils.py @@ -143,20 +143,26 @@ def create_jumpstart_model_table(): file_content.append(".. |external-link| raw:: html\n\n") file_content.append(' \n\n') - file_content.append("==================================\n") - file_content.append("JumpStart Available Model Table\n") - file_content.append("==================================\n") + file_content.append("================================================\n") + file_content.append("Built-in Algorithms with pre-trained Model Table\n") + file_content.append("================================================\n") file_content.append( """ - JumpStart for the SageMaker Python SDK uses model IDs and model versions to access the necessary - utilities. This table serves to provide the core material plus some extra information that can be useful - in selecting the correct model ID and corresponding parameters.\n""" + The SageMaker Python SDK uses model IDs and model versions to access the necessary + utilities for pre-trained models. This table serves to provide the core material plus + some extra information that can be useful in selecting the correct model ID and + corresponding parameters.\n""" ) file_content.append( """ If you want to automatically use the latest version of the model, use "*" for the `model_version` attribute. We highly suggest pinning an exact model version however.\n""" ) + file_content.append( + """ + These models are also available through the + `JumpStart UI in SageMaker Studio `__\n""" + ) file_content.append("\n") file_content.append(".. 
list-table:: Available Models\n") file_content.append(" :widths: 50 20 20 20 30 20\n") @@ -183,5 +189,6 @@ def create_jumpstart_model_table(): " - `{} <{}>`__ |external-link|\n".format(model_source, model_spec["url"]) ) - f = open("doc_utils/jumpstart.rst", "w") + f = open("doc_utils/pretrainedmodels.rst", "w") f.writelines(file_content) + f.close() diff --git a/doc/doc_utils/jumpstart.rst b/doc/doc_utils/pretrainedmodels.rst similarity index 100% rename from doc/doc_utils/jumpstart.rst rename to doc/doc_utils/pretrainedmodels.rst diff --git a/doc/overview.rst b/doc/overview.rst index 52c942b47b..14b7d47cda 100644 --- a/doc/overview.rst +++ b/doc/overview.rst @@ -573,24 +573,31 @@ Here is an example: # When you are done using your endpoint model.sagemaker_session.delete_endpoint('my-endpoint') -********************************************************* -Use SageMaker JumpStart Algorithms with Pretrained Models -********************************************************* +*********************************************************************** +Use Built-in Algorithms with Pre-trained Models in SageMaker Python SDK +*********************************************************************** + +The SageMaker Python SDK provides built-in algorithms with pre-trained models from popular open source model +hubs, such as TensorFlow Hub, PyTorch Hub, and Hugging Face. Customers can deploy these pre-trained models +as-is, or first fine-tune them on a custom dataset and then deploy them to a SageMaker endpoint for inference. + + +Built-in algorithms in the SageMaker Python SDK let customers access pre-trained models using model IDs and model +versions. The pre-trained model table below lists the available models, with information useful in +selecting the correct model ID and corresponding parameters. These models are also available through +the `JumpStart UI in SageMaker Studio `__. -JumpStart for the SageMaker Python SDK uses model ids and model versions to access the necessary -utilities.
This table serves to provide the core material plus some extra information that can be useful -in selecting the correct model id and corresponding parameters. .. toctree:: :maxdepth: 2 - doc_utils/jumpstart + doc_utils/pretrainedmodels Example notebooks ================= -JumpStart supports 15 different machine learning problem types. Below is a list of all the supported -problem types with a link to a Jupyter notebook that provides example usage. +SageMaker built-in algorithms with pre-trained models support 15 different machine learning problem types. +Below is a list of all the supported problem types with a link to a Jupyter notebook that provides example usage. Vision - `Image Classification `__ @@ -610,25 +617,15 @@ Text - `Text Embedding `__ Tabular - - `Tabular Classification (LightGBM & Catboost) `__ - - `Tabular Classification (XGBoost & Linear Learner) `__ - - `Tabular Regression (LightGBM & Catboost) `__ - - `Tabular Regression (XGBoost & Linear Learner) `__ - - -`Amazon SageMaker JumpStart `__ is a -SageMaker feature that helps users bring machine learning (ML) -applications to market using prebuilt solutions for common use cases, -example notebooks, open source models from model zoos, and built-in -algorithms. - -A JumpStart model enables you to quickly start a machine learning -workflow. JumpStart takes models from popular open source model hubs, -such as TensorFlow and HuggingFace, and pre-trains them on an open -source dataset. Using the SageMaker Python SDK, you can select a -prebuilt model from the model zoo to train on custom data or deploy -to a SageMaker endpoint for inference without signing up for -SageMaker Studio. 
+ - `Tabular Classification (LightGBM & CatBoost) `__ + - `Tabular Classification (XGBoost & Scikit-learn Linear Learner) `__ + - `Tabular Classification (AutoGluon) `__ + - `Tabular Classification (TabTransformer) `__ + - `Tabular Regression (LightGBM & CatBoost) `__ + - `Tabular Regression (XGBoost & Scikit-learn Linear Learner) `__ + - `Tabular Regression (AutoGluon) `__ + - `Tabular Regression (TabTransformer) `__ + The following topic give you information about JumpStart components, as well as how to use the SageMaker Python SDK for these workflows. @@ -644,24 +641,22 @@ Prerequisites Amazon S3. For more information about IAM role permissions, see `Policies and permissions in IAM `__. -JumpStart Components -==================== +Built-in Components +=================== -The following sections give information about the main JumpStart +The following sections give information about the main built-in components and their function. -JumpStart models ----------------- +Pre-trained models +------------------ -JumpStart maintains a model zoo of over 300 models pre-trained on -open source datasets. You can use the SageMaker Python SDK -to fine-tune a model on your own dataset or deploy it directly to a -SageMaker endpoint for inference. +SageMaker maintains a model zoo of over 300 models from popular open source model hubs, such as +TensorFlow Hub, PyTorch Hub, and Hugging Face. You can use the SageMaker Python SDK to fine-tune +a model on your own dataset or deploy it directly to a SageMaker endpoint for inference. -JumpStart model artifacts are stored as tarballs in the JumpStart S3 -bucket. Each model is versioned and contains a unique ID which can be -used to retrieve the model URI. The following information describes -the ``model_id`` and ``model_version`` needed to retrieve the URI. +Model artifacts are stored as tarballs in an S3 bucket. Each model is versioned and has a +unique ID that can be used to retrieve the model URI.
The following information describes the +``model_id`` and ``model_version`` needed to retrieve the URI. .. container:: @@ -671,7 +666,7 @@ the ``model_id`` and ``model_version`` needed to retrieve the URI. required parameter. To retrieve a model, first select a ``model ID`` and ``version`` from -the :doc:`available models <./doc_utils/jumpstart>`. +the :doc:`available models <./doc_utils/pretrainedmodels>`. .. code:: python @@ -688,15 +683,13 @@ Then use those values to retrieve the model as follows.     model_id=model_id, model_version=model_version, model_scope=scope ) -JumpStart scripts ----------------- +Model scripts +------------- -To adapt JumpStart models for SageMaker, a custom -script is needed to perform training or inference. JumpStart -maintains a suite of scripts used for each of the models in the -JumpStart S3 bucket, which can be accessed using the SageMaker Python -SDK. Use the ``model_id`` and ``version`` of the corresponding model -to retrieve the related script as follows. +To adapt pre-trained models for SageMaker, a custom script is needed to perform training +or inference. SageMaker maintains a suite of scripts used for each of the models in the +S3 bucket, which can be accessed using the SageMaker Python SDK. Use the ``model_id`` and +``version`` of the corresponding model to retrieve the related script as follows. .. code:: python @@ -706,11 +699,11 @@ to retrieve the related script as follows.     model_id=model_id, model_version=model_version, script_scope=scope ) -JumpStart images ----------------- +Model images +------------ A Docker image is required to perform training or inference on all -SageMaker models. JumpStart relies on Docker images from the +SageMaker models. SageMaker relies on Docker images from the following repos https://github.com/aws/deep-learning-containers, https://github.com/aws/sagemaker-xgboost-container, and https://github.com/aws/sagemaker-scikit-learn-container.
Use @@ -733,16 +726,16 @@ retrieve the related image as follows. Deploy a  Pre-Trained Model Directly to a SageMaker Endpoint ============================================================ -In this section, you learn how to take a pre-trained JumpStart model -and deploy it directly to a SageMaker Endpoint. This is the fastest -way to start machine learning with a JumpStart model. The following +In this section, you learn how to take a pre-trained model and deploy +it directly to a SageMaker Endpoint. This is the fastest way to start +machine learning with a pre-trained model. The following assumes familiarity with `SageMaker models `__ and their deploy functions. -To begin, select a ``model_id`` and ``version`` from the JumpStart +To begin, select a ``model_id`` and ``version`` from the pre-trained models table, as well as a model scope of either “inference” or -“training”. For this example, you use a pre-trained JumpStart model, +“training”. For this example, you use a pre-trained model, so select “inference”  for your model scope. Use the utility functions to retrieve the URI of each of the three components you need to continue. @@ -772,7 +765,7 @@ need to continue. Next, pass the URIs and other key parameters as part of a new SageMaker Model class. The ``entry_point`` is a JumpStart script -named ``inference.py``. JumpStart handles the implementation of this +named ``inference.py``. SageMaker handles the implementation of this script. You must use this value for model inference to be successful. For more information about the Model class and its parameters, see `Model `__. @@ -811,7 +804,7 @@ Deployment may take about 5 minutes. Because the model and script URIs are distributed by SageMaker JumpStart, the endpoint, endpoint config and model resources will be prefixed with ``sagemaker-jumpstart``. Refer to the model ``Tags`` to inspect the -JumpStart artifacts involved in the model creation. +model artifacts involved in the model creation. 
Perform Inference ----------------- @@ -829,17 +822,16 @@ the Fine-tune a Model and Deploy to a SageMaker Endpoint ==================================================== -In this section, you initiate a training job to further train one of -the pretrained JumpStart models for your use case, then deploy it to -a SageMaker Endpoint for inference. This lets you fine tune the model -for your use case with your custom dataset. The following assumes +In this section, you initiate a training job to further train one of the pre-trained models +for your use case, then deploy it to a SageMaker Endpoint for inference. This lets you +fine-tune the model for your use case with your custom dataset. The following assumes familiarity with `SageMaker training jobs and their architecture `__. -Fine-tune a JumpStart Model on a Custom Dataset ------------------------------------------------ +Fine-tune a Pre-trained Model on a Custom Dataset +------------------------------------------------- -To begin, select a ``model_id`` and ``version`` from the JumpStart +To begin, select a ``model_id`` and ``version`` from the pre-trained models table, as well as a model scope. In this case, you begin by using “training” as the model scope. Use the utility functions to retrieve the URI of each of the three components you need to @@ -875,10 +867,10 @@ Table `__ and selec     instance_type=training_instance_type, ) -Next, use the JumpStart resource URIs to create an ``Estimator`` and +Next, use the model resource URIs to create an ``Estimator`` and train it on a custom training dataset. You must specify the S3 path of your custom training dataset. The Estimator class requires -an ``entry_point`` parameter. In this case, JumpStart uses +an ``entry_point`` parameter. In this case, SageMaker uses “transfer_learning.py”. The training job fails to execute if this value is not set.