diff --git a/doc/overview.rst b/doc/overview.rst
index 6a7f61ac58..c9d7ce3c63 100644
--- a/doc/overview.rst
+++ b/doc/overview.rst
@@ -1,5 +1,6 @@
+##############################
 Using the SageMaker Python SDK
-==============================
+##############################
 
 SageMaker Python SDK provides several high-level abstractions for working with Amazon SageMaker. These are:
@@ -8,13 +9,82 @@ SageMaker Python SDK provides several high-level abstractions for working with A
 - **Predictors**: Provide real-time inference and transformation using Python data-types against a SageMaker endpoint.
 - **Session**: Provides a collection of methods for working with SageMaker resources.
 
-``Estimator`` and ``Model`` implementations for MXNet, TensorFlow, Chainer, PyTorch, and Amazon ML algorithms are included.
+``Estimator`` and ``Model`` implementations for MXNet, TensorFlow, Chainer, PyTorch, scikit-learn, Amazon SageMaker built-in algorithms, and Reinforcement Learning are included.
 There's also an ``Estimator`` that runs SageMaker compatible custom Docker containers, enabling you to run your own ML algorithms by using the SageMaker Python SDK.
 
 .. contents::
+   :depth: 2
+
+*******************************************
+Train a Model with the SageMaker Python SDK
+*******************************************
+
+To train a model by using the SageMaker Python SDK, you:
+
+1. Prepare a training script
+2. Create an estimator
+3. Call the ``fit`` method of the estimator
+
+After you train a model, you can save it, and then serve the model as an endpoint to get real-time inferences or get inferences for an entire dataset by using batch transform.
+
+Prepare a Training Script
+=========================
+
+Your training script must be a Python 2.7 or 3.6 compatible source file.
+
+The training script is very similar to a training script you might run outside of SageMaker, but you can access useful properties about the training environment through various environment variables, including the following:
+
+* ``SM_MODEL_DIR``: A string that represents the path where the training job writes the model artifacts.
+  After training, artifacts in this directory are uploaded to S3 for model hosting.
+* ``SM_NUM_GPUS``: An integer representing the number of GPUs available to the host.
+* ``SM_CHANNEL_XXXX``: A string that represents the path to the directory that contains the input data for the specified channel.
+  For example, if you specify two input channels in the MXNet estimator's ``fit`` call, named 'train' and 'test', the environment variables ``SM_CHANNEL_TRAIN`` and ``SM_CHANNEL_TEST`` are set.
+* ``SM_HPS``: A JSON dump of the hyperparameters that preserves JSON types (boolean, integer, etc.).
+
+For the exhaustive list of available environment variables, see the `SageMaker Containers documentation `__.
+
+A typical training script loads data from the input channels, configures training with hyperparameters, trains a model, and saves a model to ``model_dir`` so that it can be deployed for inference later.
+Hyperparameters are passed to your script as arguments and can be retrieved with an ``argparse.ArgumentParser`` instance.
+For example, a training script might start with the following:
+
+.. code:: python
+
+    import argparse
+    import os
+    import json
+
+    if __name__ == '__main__':
+
+        parser = argparse.ArgumentParser()
+
+        # hyperparameters sent by the client are passed as command-line arguments to the script.
+        parser.add_argument('--epochs', type=int, default=10)
+        parser.add_argument('--batch-size', type=int, default=100)
+        parser.add_argument('--learning-rate', type=float, default=0.1)
+
+        # an alternative way to load hyperparameters via SM_HPS environment variable.
+        parser.add_argument('--sm-hps', type=json.loads, default=os.environ['SM_HPS'])
+
+        # input data and model directories
+        parser.add_argument('--model-dir', type=str, default=os.environ['SM_MODEL_DIR'])
+        parser.add_argument('--train', type=str, default=os.environ['SM_CHANNEL_TRAIN'])
+        parser.add_argument('--test', type=str, default=os.environ['SM_CHANNEL_TEST'])
+
+        args, _ = parser.parse_known_args()
+
+        # ... load from args.train and args.test, train a model, write model to args.model_dir.
+
+Because SageMaker imports your training script, you should put your training code in a main guard (``if __name__ == '__main__':``) if you are using the same script to host your model,
+so that SageMaker does not inadvertently run your training code at the wrong point in execution.
+
+Note that SageMaker doesn't support argparse actions.
+If you want to use, for example, boolean hyperparameters, provide an explicit ``True`` or ``False`` value for this hyperparameter when you create your estimator, and convert the string value in your script.
+Specifying ``type=bool`` alone is not enough, because Python's ``bool('False')`` evaluates to ``True``.
+
+For more on training environment variables, see `SageMaker Containers `_.
+
 Using Estimators
-----------------
+================
 
 Here is an end to end example of how to use a SageMaker Estimator:
 
@@ -84,8 +154,37 @@ For more `information `__ for more details about built-in metrics of each Amazon SageMaker algorithm.
 
-Local Mode
-~~~~~~~~~~
-
-The SageMaker Python SDK supports local mode, which allows you to create estimators and deploy them to your local environment.
-This is a great way to test your deep learning scripts before running them in SageMaker's managed training or hosting environments.
-Local Mode is supported for frameworks images (TensorFlow, MXNet, Chainer, PyTorch, and Scikit-Learn) and images you supply yourself.
-
-We can take the example in `Using Estimators <#using-estimators>`__ , and use either ``local`` or ``local_gpu`` as the instance type.
-
-.. code:: python
-
-    from sagemaker.mxnet import MXNet
-
-    # Configure an MXNet Estimator (no training happens yet)
-    mxnet_estimator = MXNet('train.py',
-                            role='SageMakerRole',
-                            train_instance_type='local',
-                            train_instance_count=1,
-                            framework_version='1.2.1')
-
-    # In Local Mode, fit will pull the MXNet container Docker image and run it locally
-    mxnet_estimator.fit('s3://my_bucket/my_training_data/')
-
-    # Alternatively, you can train using data in your local file system. This is only supported in Local mode.
-    mxnet_estimator.fit('file:///tmp/my_training_data')
-
-    # Deploys the model that was generated by fit() to local endpoint in a container
-    mxnet_predictor = mxnet_estimator.deploy(initial_instance_count=1, instance_type='local')
-
-    # Serializes data and makes a prediction request to the local endpoint
-    response = mxnet_predictor.predict(data)
-
-    # Tears down the endpoint container and deletes the corresponding endpoint configuration
-    mxnet_predictor.delete_endpoint()
-
-    # Deletes the model
-    mxnet_predictor.delete_model()
-
-
-If you have an existing model and want to deploy it locally, don't specify a sagemaker_session argument to the ``MXNetModel`` constructor.
-The correct session is generated when you call ``model.deploy()``.
-
-Here is an end-to-end example:
-
-.. code:: python
-
-    import numpy
-    from sagemaker.mxnet import MXNetModel
-
-    model_location = 's3://mybucket/my_model.tar.gz'
-    code_location = 's3://mybucket/sourcedir.tar.gz'
-    s3_model = MXNetModel(model_data=model_location, role='SageMakerRole',
-                          entry_point='mnist.py', source_dir=code_location)
-
-    predictor = s3_model.deploy(initial_instance_count=1, instance_type='local')
-    data = numpy.zeros(shape=(1, 1, 28, 28))
-    predictor.predict(data)
-
-    # Tear down the endpoint container and delete the corresponding endpoint configuration
-    predictor.delete_endpoint()
-
-    # Deletes the model
-    predictor.delete_model()
-
-
-If you don't want to deploy your model locally, you can also choose to perform a Local Batch Transform Job. This is
-useful if you want to test your container before creating a Sagemaker Batch Transform Job. Note that the performance
-will not match Batch Transform Jobs hosted on SageMaker but it is still a useful tool to ensure you have everything
-right or if you are not dealing with huge amounts of data.
-
-Here is an end-to-end example:
-
-.. code:: python
-
-    from sagemaker.mxnet import MXNet
-
-    mxnet_estimator = MXNet('train.py',
-                            role='SageMakerRole',
-                            train_instance_type='local',
-                            train_instance_count=1,
-                            framework_version='1.2.1')
+BYO Docker Containers with SageMaker Estimators
+-----------------------------------------------
 
-    mxnet_estimator.fit('file:///tmp/my_training_data')
-    transformer = mxnet_estimator.transformer(1, 'local', assemble_with='Line', max_payload=1)
-    transformer.transform('s3://my/transform/data, content_type='text/csv', split_type='Line')
-    transformer.wait()
+If you want to train with a Docker image that you created, the easiest way is to use the dedicated ``Estimator`` class.
+You can create an instance of the ``Estimator`` class with the desired Docker image and use it as described in previous sections.
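Here is a rough sketch of what the training entry point inside such a custom image might do. It assumes the standard ``/opt/ml`` layout that SageMaker documents for custom training containers: hyperparameters are in ``/opt/ml/input/config/hyperparameters.json``, where every value arrives as a string; each input channel gets its own directory under ``/opt/ml/input/data/``; and model artifacts are written to ``/opt/ml/model``. The helper names ``load_training_config`` and ``str2bool`` are hypothetical. The ``prefix`` parameter exists only so that the function can be tried outside a container.

```python
import json
import os


def str2bool(value):
    """Convert a hyperparameter string such as 'True' or 'false' to a bool.

    Hypothetical helper: bool('False') is True in Python, so string values
    must be parsed explicitly.
    """
    if isinstance(value, bool):
        return value
    lowered = str(value).strip().lower()
    if lowered in ('true', '1', 'yes'):
        return True
    if lowered in ('false', '0', 'no'):
        return False
    raise ValueError('not a boolean value: %r' % (value,))


def load_training_config(prefix='/opt/ml'):
    """Return (hyperparameters, channels, model_dir) for a custom training image.

    Assumes the documented container layout:
      <prefix>/input/config/hyperparameters.json  (all values are strings)
      <prefix>/input/data/<channel_name>/          (one directory per channel)
      <prefix>/model                               (where artifacts are saved)
    """
    hp_path = os.path.join(prefix, 'input', 'config', 'hyperparameters.json')
    hyperparameters = {}
    if os.path.isfile(hp_path):
        with open(hp_path) as f:
            hyperparameters = json.load(f)  # e.g. {'epochs': '10'}

    data_dir = os.path.join(prefix, 'input', 'data')
    channels = {}
    if os.path.isdir(data_dir):
        for name in sorted(os.listdir(data_dir)):
            path = os.path.join(data_dir, name)
            if os.path.isdir(path):
                channels[name] = path  # e.g. {'train': '/opt/ml/input/data/train'}

    return hyperparameters, channels, os.path.join(prefix, 'model')
```

Because every value is a string, numeric and boolean hyperparameters need an explicit conversion. For example, use ``int(hps.get('epochs', '10'))``, or ``str2bool(hps.get('use-gpu', 'False'))`` for a hypothetical ``use-gpu`` hyperparameter.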
 
-    # Deletes the SageMaker model
-    transformer.delete_model()
+Please refer to the full example in the examples repo:
 
+::
 
-For detailed examples of running Docker in local mode, see:
+    git clone https://github.com/awslabs/amazon-sagemaker-examples.git
 
-- `TensorFlow local mode example notebook `__.
-- `MXNet local mode example notebook `__.
 
-A few important notes:
+The example notebook is located here:
+``advanced_functionality/scikit_bring_your_own/scikit_bring_your_own.ipynb``
 
-- Only one local mode endpoint can be running at a time.
-- If you are using S3 data as input, it is pulled from S3 to your local environment. Ensure you have sufficient space to store the data locally.
-- If you run into problems it often due to different Docker containers conflicting. Killing these containers and re-running often solves your problems.
-- Local Mode requires Docker Compose and `nvidia-docker2 `__ for ``local_gpu``.
-- Distributed training is not yet supported for ``local_gpu``.
+You can also find this notebook in the **Advanced Functionality** folder of the **SageMaker Examples** section in a notebook instance.
+For information about using sample notebooks in a SageMaker notebook instance, see `Use Example Notebooks `__
+in the AWS documentation.
 
 Incremental Training
-~~~~~~~~~~~~~~~~~~~~
+====================
 
 Incremental training allows you to bring a pre-trained model into a SageMaker training job and use it as a starting point for a new model.
 There are several situations where you might want to do this:
@@ -327,39 +342,49 @@ Currently, the following algorithms support incremental training:
 - Object Detection
 - Semantic Segmentation
 
+************************************************
+Using Models Trained Outside of Amazon SageMaker
+************************************************
 
-Using SageMaker AlgorithmEstimators
------------------------------------
+You can use models that you train outside of Amazon SageMaker, and model packages that you create or subscribe to in the AWS Marketplace, to get inferences.
 
-With the SageMaker Algorithm entities, you can create training jobs with just an ``algorithm_arn`` instead of
-a training image. There is a dedicated ``AlgorithmEstimator`` class that accepts ``algorithm_arn`` as a
-parameter, the rest of the arguments are similar to the other Estimator classes. This class also allows you to
-consume algorithms that you have subscribed to in the AWS Marketplace. The AlgorithmEstimator performs
-client-side validation on your inputs based on the algorithm's properties.
+BYO Model
+=========
 
-Here is an example:
+You can create an endpoint from an existing model that you trained outside of Amazon SageMaker.
+That is, you can bring your own model:
+
+First, package the files for the trained model into a ``.tar.gz`` file, and upload the archive to S3.
+
+Next, create a ``Model`` object that corresponds to the framework that you are using: `MXNetModel `__ or `TensorFlowModel `__.
+
+Example code using ``MXNetModel``:
 
 .. code:: python
 
-    import sagemaker
+    from sagemaker.mxnet.model import MXNetModel
 
-    algo = sagemaker.AlgorithmEstimator(
-        algorithm_arn='arn:aws:sagemaker:us-west-2:1234567:algorithm/some-algorithm',
-        role='SageMakerRole',
-        train_instance_count=1,
-        train_instance_type='ml.c4.xlarge')
+    sagemaker_model = MXNetModel(model_data='s3://path/to/model.tar.gz',
+                                 role='arn:aws:iam::accid:sagemaker-role',
+                                 entry_point='entry_point.py')
 
-    train_input = algo.sagemaker_session.upload_data(path='/path/to/your/data')
+After that, invoke the ``deploy()`` method on the ``Model``:
 
-    algo.fit({'training': train_input})
-    algo.deploy(1, 'ml.m4.xlarge')
+.. code:: python
 
-    # When you are done using your endpoint
-    algo.delete_endpoint()
+    predictor = sagemaker_model.deploy(initial_instance_count=1,
+                                       instance_type='ml.m4.xlarge')
+
+This returns a predictor the same way an ``Estimator`` does when ``deploy()`` is called. You can now get inferences just like with any other model deployed on Amazon SageMaker.
+A full example is available in the `Amazon SageMaker examples repository `__.
+
+You can also find this notebook in the **Advanced Functionality** section of the **SageMaker Examples** section in a notebook instance.
+For information about using sample notebooks in a SageMaker notebook instance, see `Use Example Notebooks `__
+in the AWS documentation.
 
 Consuming SageMaker Model Packages
-----------------------------------
+==================================
 
 SageMaker Model Packages are a way to specify and share information for how to create SageMaker Models.
 With a SageMaker Model Package that you have created or subscribed to in the AWS Marketplace,
@@ -381,26 +406,9 @@ Here is an example:
     # When you are done using your endpoint
     model.sagemaker_session.delete_endpoint('my-endpoint')
 
-
-BYO Docker Containers with SageMaker Estimators
------------------------------------------------
-
-To use a Docker image that you created and use the SageMaker SDK for training, the easiest way is to use the dedicated ``Estimator`` class.
-You can create an instance of the ``Estimator`` class with desired Docker image and use it as described in previous sections.
-
-Please refer to the full example in the examples repo:
-
-::
-
-    git clone https://github.com/awslabs/amazon-sagemaker-examples.git
-
-
-The example notebook is located here:
-``advanced_functionality/scikit_bring_your_own/scikit_bring_your_own.ipynb``
-
-
+********************************
 SageMaker Automatic Model Tuning
---------------------------------
+********************************
 
 All of the estimators can be used with SageMaker Automatic Model Tuning, which performs hyperparameter tuning jobs.
 A hyperparameter tuning job finds the best version of a model by running many training jobs on your dataset using the algorithm with different values of hyperparameters within ranges
@@ -494,17 +502,21 @@ For more detailed examples of running hyperparameter tuning jobs, see:
 - `Bringing your own estimator for hyperparameter tuning `__
 - `Analyzing results `__
 
+You can also find these notebooks in the **Hyperparameter Tuning** section of the **SageMaker Examples** section in a notebook instance.
+For information about using sample notebooks in a SageMaker notebook instance, see `Use Example Notebooks `__
+in the AWS documentation.
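A hyperparameter tuning job runs many training jobs with hyperparameter values drawn from ranges. The following plain-Python sketch makes that idea concrete using one simple search strategy, random search. Here a cheap ``objective`` function stands in for a full training job and its objective metric. This is only an illustration: the range tuples and ``random_search`` are hypothetical. SageMaker's tuner launches real training jobs and chooses values with its own strategy, not with this loop.

```python
import random


def sample_value(spec, rng):
    """Draw one value from a hypothetical range spec.

    spec is ('continuous', low, high), ('integer', low, high)
    or ('categorical', [choice, ...]).
    """
    kind = spec[0]
    if kind == 'continuous':
        return rng.uniform(spec[1], spec[2])
    if kind == 'integer':
        return rng.randint(spec[1], spec[2])  # inclusive of both ends
    if kind == 'categorical':
        return rng.choice(spec[1])
    raise ValueError('unknown range type: %s' % kind)


def random_search(objective, ranges, max_jobs, seed=0):
    """Evaluate `objective` on `max_jobs` random configurations.

    Returns (best_params, best_score), treating a higher score as better,
    like a tuning job whose objective metric is maximized.
    """
    rng = random.Random(seed)
    best_params, best_score = None, float('-inf')
    for _ in range(max_jobs):
        # One simulated "training job": sample every hyperparameter from its range.
        params = {name: sample_value(spec, rng)
                  for name, spec in sorted(ranges.items())}
        score = objective(params)  # stands in for the job's objective metric
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


if __name__ == '__main__':
    ranges = {
        'learning_rate': ('continuous', 0.01, 0.2),
        'epochs': ('integer', 1, 20),
        'optimizer': ('categorical', ['sgd', 'adam']),
    }
    best, score = random_search(lambda p: -abs(p['learning_rate'] - 0.1),
                                ranges, max_jobs=20)
    print(best, score)
```

In the SDK, the ``max_jobs`` argument of ``HyperparameterTuner`` bounds the total number of training jobs in a similar way, and ``max_parallel_jobs`` controls how many of them run at once.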
+
 For more detailed explanations of the classes that this library provides for automatic model tuning, see:
 
 - `API docs for HyperparameterTuner and parameter range classes `__
 - `API docs for analytics classes `__
 
-
+*************************
 SageMaker Batch Transform
--------------------------
+*************************
 
 After you train a model, you can use Amazon SageMaker Batch Transform to perform inferences with the model.
-Batch Transform manages all necessary compute resources, including launching instances to deploy endpoints and deleting them afterward.
+Batch transform manages all necessary compute resources, including launching instances to deploy endpoints and deleting them afterward.
 You can read more about SageMaker Batch Transform in the `AWS documentation `__.
 
 If you trained the model using a SageMaker Python SDK estimator,
@@ -533,9 +545,118 @@ You can also specify other attributes of your data, such as the content type.
 For more details about what can be specified here, see `API docs `__.
 
+**********
+Local Mode
+**********
+
+The SageMaker Python SDK supports local mode, which allows you to create estimators and deploy them to your local environment.
+This is a great way to test your deep learning scripts before running them in SageMaker's managed training or hosting environments.
+Local Mode is supported for framework images (TensorFlow, MXNet, Chainer, PyTorch, and Scikit-Learn) and images you supply yourself.
+
+You can take the example in `Using Estimators <#using-estimators>`__ and use either ``local`` or ``local_gpu`` as the instance type.
+
+.. code:: python
+
+    from sagemaker.mxnet import MXNet
+
+    # Configure an MXNet Estimator (no training happens yet)
+    mxnet_estimator = MXNet('train.py',
+                            role='SageMakerRole',
+                            train_instance_type='local',
+                            train_instance_count=1,
+                            framework_version='1.2.1')
+
+    # In Local Mode, fit will pull the MXNet container Docker image and run it locally
+    mxnet_estimator.fit('s3://my_bucket/my_training_data/')
+
+    # Alternatively, you can train using data in your local file system. This is only supported in Local mode.
+    mxnet_estimator.fit('file:///tmp/my_training_data')
+
+    # Deploys the model that was generated by fit() to a local endpoint in a container
+    mxnet_predictor = mxnet_estimator.deploy(initial_instance_count=1, instance_type='local')
+
+    # Serializes data and makes a prediction request to the local endpoint
+    response = mxnet_predictor.predict(data)
+
+    # Tears down the endpoint container and deletes the corresponding endpoint configuration
+    mxnet_predictor.delete_endpoint()
+
+    # Deletes the model
+    mxnet_predictor.delete_model()
+
+
+If you have an existing model and want to deploy it locally, don't specify a ``sagemaker_session`` argument to the ``MXNetModel`` constructor.
+The correct session is generated when you call ``model.deploy()``.
+
+Here is an end-to-end example:
+
+.. code:: python
+
+    import numpy
+    from sagemaker.mxnet import MXNetModel
+
+    model_location = 's3://mybucket/my_model.tar.gz'
+    code_location = 's3://mybucket/sourcedir.tar.gz'
+    s3_model = MXNetModel(model_data=model_location, role='SageMakerRole',
+                          entry_point='mnist.py', source_dir=code_location)
+
+    predictor = s3_model.deploy(initial_instance_count=1, instance_type='local')
+    data = numpy.zeros(shape=(1, 1, 28, 28))
+    predictor.predict(data)
+
+    # Tear down the endpoint container and delete the corresponding endpoint configuration
+    predictor.delete_endpoint()
+
+    # Deletes the model
+    predictor.delete_model()
+
+
+If you don't want to deploy your model locally, you can instead run a local batch transform job. This is
+useful if you want to test your container before creating a SageMaker batch transform job. Note that the performance
+will not match batch transform jobs hosted on SageMaker, but local mode is still a useful way to check that everything
+works, or to process data sets that are not very large.
+
+Here is an end-to-end example:
+
+.. code:: python
+
+    from sagemaker.mxnet import MXNet
+
+    mxnet_estimator = MXNet('train.py',
+                            role='SageMakerRole',
+                            train_instance_type='local',
+                            train_instance_count=1,
+                            framework_version='1.2.1')
+
+    mxnet_estimator.fit('file:///tmp/my_training_data')
+    transformer = mxnet_estimator.transformer(1, 'local', assemble_with='Line', max_payload=1)
+    transformer.transform('s3://my/transform/data', content_type='text/csv', split_type='Line')
+    transformer.wait()
+
+    # Deletes the SageMaker model
+    transformer.delete_model()
+
+
+For detailed examples of running Docker in local mode, see:
+
+- `TensorFlow local mode example notebook `__.
+- `MXNet local mode example notebook `__.
+
+You can also find these notebooks in the **SageMaker Python SDK** section of the **SageMaker Examples** section in a notebook instance.
+For information about using sample notebooks in a SageMaker notebook instance, see `Use Example Notebooks `__
+in the AWS documentation.
+
+A few important notes:
+
+- Only one local mode endpoint can be running at a time.
+- If you are using S3 data as input, it is pulled from S3 to your local environment. Ensure you have sufficient space to store the data locally.
+- If you run into problems, this is often due to different Docker containers conflicting. Killing these containers and re-running often solves your problems.
+- Local Mode requires Docker Compose and `nvidia-docker2 `__ for ``local_gpu``.
+- Distributed training is not yet supported for ``local_gpu``.
+
+**************************************
 Secure Training and Inference with VPC
---------------------------------------
+**************************************
 
 Amazon SageMaker allows you to control network traffic to and from model container instances using Amazon Virtual Private Cloud (VPC).
 You can configure SageMaker to use your own private VPC in order to further protect and monitor traffic.
@@ -619,8 +740,10 @@ Likewise, when you create ``Transformer`` from the ``Estimator`` using ``transfo
     # Transform Job container instances will run in your VPC
     mxnet_vpc_transformer.transform('s3://my-bucket/batch-transform-input')
 
+***********************************************************
 Secure Training with Network Isolation (Internet-Free) Mode
---------------------------------------------------------------------------
+***********************************************************
+
 You can enable network isolation mode when running training and inference on Amazon SageMaker.
 
 For more information about Amazon SageMaker network isolation mode, see the `SageMaker documentation on network isolation or internet-free mode `__.
@@ -646,57 +769,10 @@ A new training job channel, named ``code``, will be added with that S3 URI.  Bef
 Once the training job begins, the training container will look at the offline input ``code`` channel to install dependencies and run the entry script.
 This isolates the training container, so no inbound or outbound network calls can be made.
 
-
-FAQ
----
-
-I want to train a SageMaker Estimator with local data, how do I do this?
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-Upload the data to S3 before training. You can use the AWS Command Line Tool (the aws cli) to achieve this.
-
-If you don't have the aws cli, you can install it using pip:
-
-::
-
-    pip install awscli --upgrade --user
-
-If you don't have pip or want to learn more about installing the aws cli, see the official `Amazon aws cli installation guide `__.
-
-After you install the AWS cli, you can upload a directory of files to S3 with the following command:
-
-::
-
-    aws s3 cp /tmp/foo/ s3://bucket/path
-
-For more information about using the aws cli for manipulating S3 resources, see `AWS cli command reference `__.
-
-
-How do I make predictions against an existing endpoint?
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-Create a ``Predictor`` object and provide it with your endpoint name,
-then call its ``predict()`` method with your input.
-
-You can use either the generic ``RealTimePredictor`` class, which by default does not perform any serialization/deserialization transformations on your input,
-but can be configured to do so through constructor arguments:
-http://sagemaker.readthedocs.io/en/stable/predictors.html
-
-Or you can use the TensorFlow / MXNet specific predictor classes, which have default serialization/deserialization logic:
-http://sagemaker.readthedocs.io/en/stable/sagemaker.tensorflow.html#tensorflow-predictor
-http://sagemaker.readthedocs.io/en/stable/sagemaker.mxnet.html#mxnet-predictor
-
-Example code using the TensorFlow predictor:
-
-::
-
-    from sagemaker.tensorflow import TensorFlowPredictor
-
-    predictor = TensorFlowPredictor('myexistingendpoint')
-    result = predictor.predict(['my request body'])
-
-
+*********
 BYO Model
----------
+*********
+
 You can also create an endpoint from an existing model rather than training one.
 That is, you can bring your own model:
@@ -725,9 +801,14 @@ This returns a predictor the same way an ``Estimator`` does when ``deploy()`` is
 
 A full example is available in the `Amazon SageMaker examples repository `__.
 
+You can also find this notebook in the **Advanced Functionality** section of the **SageMaker Examples** section in a notebook instance.
+For information about using sample notebooks in a SageMaker notebook instance, see `Use Example Notebooks `__
+in the AWS documentation.
 
+*******************
 Inference Pipelines
--------------------
+*******************
+
 You can create a Pipeline for realtime or batch inference comprising of one or multiple model containers.
 This will help you to deploy an ML pipeline behind a single endpoint and you can have one API call perform pre-processing, model-scoring and post-processing on your data before returning it back as the response.
@@ -751,6 +832,10 @@ For more information about how to train an XGBoost model, please refer to the XG
 
 .. _here: https://github.com/awslabs/amazon-sagemaker-examples/blob/master/introduction_to_amazon_algorithms/xgboost_abalone/xgboost_abalone.ipynb
 
+You can also find this notebook in the **Introduction to Amazon Algorithms** section of the **SageMaker Examples** section in a notebook instance.
+For information about using sample notebooks in a SageMaker notebook instance, see `Use Example Notebooks `__
+in the AWS documentation.
+
 .. code:: python
 
     sm_model.deploy(initial_instance_count=1, instance_type='ml.c5.xlarge', endpoint_name=endpoint_name)
@@ -793,11 +878,65 @@ For comprehensive examples on how to use Inference Pipelines please refer to the
 - `inference_pipeline_sparkml_xgboost_abalone.ipynb `__
 - `inference_pipeline_sparkml_blazingtext_dbpedia.ipynb `__
 
+You can also find these notebooks in the **Advanced Functionality** section of the **SageMaker Examples** section in a notebook instance.
+For information about using sample notebooks in a SageMaker notebook instance, see `Use Example Notebooks `__
+in the AWS documentation.
+
+******************
 SageMaker Workflow
-------------------
+******************
 
 You can use Apache Airflow to author, schedule and monitor SageMaker workflow.
 
 For more information, see `SageMaker Workflow in Apache Airflow`_.
 
 .. _SageMaker Workflow in Apache Airflow: https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/workflow/README.rst
+
+***
+FAQ
+***
+
+I want to train a SageMaker Estimator with local data, how do I do this?
+========================================================================
+
+Upload the data to S3 before training. You can use the AWS Command Line Tool (the aws cli) to achieve this.
+
+If you don't have the aws cli, you can install it using pip:
+
+::
+
+    pip install awscli --upgrade --user
+
+If you don't have pip or want to learn more about installing the aws cli, see the official `Amazon aws cli installation guide `__.
+
+After you install the AWS cli, you can upload a directory of files to S3 with the following command (``--recursive`` is required when the source is a directory):
+
+::
+
+    aws s3 cp /tmp/foo/ s3://bucket/path --recursive
+
+For more information about using the aws cli for manipulating S3 resources, see `AWS cli command reference `__.
+
+
+How do I make predictions against an existing endpoint?
+=======================================================
+
+Create a ``Predictor`` object and provide it with your endpoint name,
+then call its ``predict()`` method with your input.
+
+You can use either the generic ``RealTimePredictor`` class, which by default does not perform any serialization/deserialization transformations on your input,
+but can be configured to do so through constructor arguments:
+http://sagemaker.readthedocs.io/en/stable/predictors.html
+
+Or you can use the TensorFlow / MXNet specific predictor classes, which have default serialization/deserialization logic:
+http://sagemaker.readthedocs.io/en/stable/sagemaker.tensorflow.html#tensorflow-predictor
+http://sagemaker.readthedocs.io/en/stable/sagemaker.mxnet.html#mxnet-predictor
+
+Example code using the TensorFlow predictor:
+
+::
+
+    from sagemaker.tensorflow import TensorFlowPredictor
+
+    predictor = TensorFlowPredictor('myexistingendpoint')
+    result = predictor.predict(['my request body'])
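Here is what serialization means in this context. A serializer turns your Python input into the request body sent to the endpoint. A deserializer turns the response body back into Python objects. The following plain-Python sketch shows one such pair, for CSV input and JSON output. The function names ``to_csv_body`` and ``from_json_body`` are hypothetical and are not part of the SDK. To see which serializers the SDK actually provides and how to pass them to ``RealTimePredictor``, check the Predictors API docs linked above.

```python
import csv
import io
import json


def to_csv_body(rows):
    """Hypothetical serializer: turn a list of rows into a text/csv payload."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator='\n')  # one record per line
    for row in rows:
        writer.writerow(row)
    return buf.getvalue()


def from_json_body(body):
    """Hypothetical deserializer: parse a JSON response body (bytes or str)."""
    if isinstance(body, bytes):
        body = body.decode('utf-8')
    return json.loads(body)
```

Keeping the conversion in two small callables means the code that sends the request does not need to know the wire format. This separation is what the constructor arguments of ``RealTimePredictor`` are meant to provide.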