|
| 1 | +========================================================== |
| 2 | +Using Reinforcement Learning with the SageMaker Python SDK |
| 3 | +========================================================== |
| 4 | + |
| 5 | +.. contents:: |
| 6 | + |
| 7 | +With Reinforcement Learning (RL) Estimators, you can train reinforcement learning models on Amazon SageMaker. |
| 8 | + |
| 9 | +Supported versions of Coach: ``0.11.1``, ``0.10.1`` with TensorFlow, ``0.11.0`` with TensorFlow or MXNet. |
| 10 | +For more information about Coach, see https://github.com/NervanaSystems/coach |
| 11 | + |
| 12 | +Supported versions of Ray: ``0.5.3`` with TensorFlow. |
| 13 | +For more information about Ray, see https://github.com/ray-project/ray |
| 14 | + |
| 15 | +RL Training |
| 16 | +----------- |
| 17 | + |
| 18 | +Training RL models using ``RLEstimator`` is a two-step process: |
| 19 | + |
| 20 | +1. Prepare a training script to run on SageMaker |
| 21 | +2. Run this script on SageMaker via an ``RLEstimator``. |
| 22 | + |
| 23 | +You should prepare your script in a source file separate from the notebook, terminal session, or source file you're |
| 24 | +using to submit the script to SageMaker via an ``RLEstimator``. This is discussed in further detail below. |
| 25 | + |
| 26 | +Suppose that you already have a training script called ``coach-train.py``. |
| 27 | +You can then create an ``RLEstimator`` with keyword arguments to point to this script and define how SageMaker runs it: |
| 28 | + |
| 29 | +.. code:: python |
| 30 | +
|
| 31 | + from sagemaker.rl import RLEstimator, RLToolkit, RLFramework |
| 32 | +
|
| 33 | + rl_estimator = RLEstimator(entry_point='coach-train.py', |
| 34 | + toolkit=RLToolkit.COACH, |
| 35 | + toolkit_version='0.11.1', |
| 36 | + framework=RLFramework.TENSORFLOW, |
| 37 | + role='SageMakerRole', |
| 38 | + train_instance_type='ml.p3.2xlarge', |
| 39 | + train_instance_count=1) |
| 40 | +
|
| 41 | +After that, you simply tell the estimator to start a training job: |
| 42 | + |
| 43 | +.. code:: python |
| 44 | +
|
| 45 | + rl_estimator.fit() |
| 46 | +
|
| 47 | +In the following sections, we'll discuss how to prepare a training script for execution on SageMaker |
| 48 | +and how to run that script on SageMaker using ``RLEstimator``. |
| 49 | + |
| 50 | + |
| 51 | +Preparing the RL Training Script |
| 52 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 53 | + |
| 54 | +Your RL training script must be a Python source file compatible with Python 3.5 for the MXNet framework or Python 3.6 for TensorFlow. |
| 55 | + |
| 56 | +The training script is very similar to a training script you might run outside of SageMaker, but you |
| 57 | +can access useful properties about the training environment through various environment variables, such as |
| 58 | + |
| 59 | +* ``SM_MODEL_DIR``: A string representing the path to the directory to write model artifacts to. |
| 60 | + These artifacts are uploaded to S3 for model hosting. |
| 61 | +* ``SM_NUM_GPUS``: An integer representing the number of GPUs available to the host. |
| 62 | +* ``SM_OUTPUT_DATA_DIR``: A string representing the filesystem path to write output artifacts to. Output artifacts may |
| 63 | + include checkpoints, graphs, and other files to save, not including model artifacts. These artifacts are compressed |
| 64 | + and uploaded to S3 to the same S3 prefix as the model artifacts. |
| 65 | + |
| 66 | +For the exhaustive list of available environment variables, see the |
| 67 | +`SageMaker Containers documentation <https://github.com/aws/sagemaker-containers#list-of-provided-environment-variables-by-sagemaker-containers>`__. |
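As a minimal sketch of how a training script might read the variables listed above: the helper name ``read_training_env`` is hypothetical, and the fallback values are assumptions chosen so the script can also run outside SageMaker.

```python
import os


def read_training_env(environ=None):
    """Collect a few SageMaker-provided settings, with local fallbacks."""
    env = os.environ if environ is None else environ
    return {
        # Directory for model artifacts; /opt/ml/model is the path named in this guide.
        "model_dir": env.get("SM_MODEL_DIR", "/opt/ml/model"),
        # Environment variables are strings, so convert the GPU count explicitly.
        "num_gpus": int(env.get("SM_NUM_GPUS", "0")),
        # Directory for non-model outputs; this fallback path is an assumption.
        "output_data_dir": env.get("SM_OUTPUT_DATA_DIR", "/opt/ml/output/data"),
    }
```

Calling ``read_training_env()`` with no arguments reads from the real process environment. Passing a plain dict makes the function easy to exercise locally.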
| 68 | + |
| 69 | + |
| 70 | +RL Estimators |
| 71 | +------------- |
| 72 | + |
| 73 | +The ``RLEstimator`` constructor takes both required and optional arguments. |
| 74 | + |
| 75 | +Required arguments |
| 76 | +~~~~~~~~~~~~~~~~~~ |
| 77 | + |
| 78 | +The following are required arguments to the ``RLEstimator`` constructor. When you create an instance of RLEstimator, you must include |
| 79 | +these in the constructor, either positionally or as keyword arguments. |
| 80 | + |
| 81 | +- ``entry_point`` Path (absolute or relative) to the Python file which |
| 82 | + should be executed as the entry point to training. |
| 83 | +- ``role`` An AWS IAM role (either name or full ARN). The Amazon |
| 84 | + SageMaker training jobs and APIs that create Amazon SageMaker |
| 85 | + endpoints use this role to access training data and model artifacts. |
| 86 | + After the endpoint is created, the inference code might use the IAM |
| 87 | + role if it accesses AWS resources. |
| 88 | +- ``train_instance_count`` Number of Amazon EC2 instances to use for |
| 89 | + training. |
| 90 | +- ``train_instance_type`` Type of EC2 instance to use for training, for |
| 91 | + example, 'ml.m4.xlarge'. |
| 92 | + |
| 93 | +You must also include either: |
| 94 | + |
| 95 | +- ``toolkit`` RL toolkit (Ray RLlib or Coach) you want to use for executing your model training code. |
| 96 | + |
| 97 | +- ``toolkit_version`` RL toolkit version you want to use for executing your model training code. |
| 98 | + |
| 99 | +- ``framework`` Framework (MXNet or TensorFlow) you want to use as |
| 100 | + the toolkit backend for reinforcement learning training. |
| 101 | + |
| 102 | +or provide: |
| 103 | + |
| 104 | +- ``image_name`` An alternative docker image to use for training and |
| 105 | + serving. If specified, the estimator will use this image for training and |
| 106 | + hosting, instead of selecting the appropriate SageMaker official image based on |
| 107 | + ``toolkit``, ``toolkit_version``, and ``framework``. Refer to `SageMaker RL Docker Containers |
| 108 | + <#sagemaker-rl-docker-containers>`_ for details on what the official images support |
| 109 | + and where to find the source code to build your custom image. |
| 110 | + |
| 111 | + |
| 112 | +Optional arguments |
| 113 | +~~~~~~~~~~~~~~~~~~ |
| 114 | + |
| 115 | +The following are optional arguments. When you create an ``RLEstimator`` object, you can specify these as keyword arguments. |
| 116 | + |
| 117 | +- ``source_dir`` Path (absolute or relative) to a directory with any |
| 118 | + other training source code dependencies including the entry point |
| 119 | + file. Structure within this directory will be preserved when training |
| 120 | + on SageMaker. |
| 121 | +- ``dependencies (list[str])`` A list of paths to directories (absolute or relative) with |
| 122 | + any additional libraries that will be exported to the container (default: ``[]``). |
| 123 | + The library folders will be copied to SageMaker in the same folder where the entrypoint is copied. |
| 124 | + If the ``source_dir`` points to S3, code will be uploaded and the S3 location will be used |
| 125 | + instead. |
| 126 | + |
| 127 | + For example, the following call: |
| 128 | + |
| 129 | + .. code:: python |
| 130 | +
|
| 131 | + >>> RLEstimator(entry_point='train.py', |
| 132 | + toolkit=RLToolkit.COACH, |
| 133 | + toolkit_version='0.11.0', |
| 134 | + framework=RLFramework.TENSORFLOW, |
| 135 | + dependencies=['my/libs/common', 'virtual-env']) |
| 136 | +
|
| 137 | + results in the following inside the container: |
| 138 | + |
| 139 | + .. code:: bash |
| 140 | +
|
| 141 | + >>> $ ls |
| 142 | +
|
| 143 | + >>> opt/ml/code |
| 144 | + >>> ├── train.py |
| 145 | + >>> ├── common |
| 146 | + >>> └── virtual-env |
| 147 | +
|
| 148 | +- ``hyperparameters`` Hyperparameters that will be used for training. |
| 149 | + Will be made accessible as a ``dict[str, str]`` to the training code on |
| 150 | + SageMaker. For convenience, accepts other types besides strings, but |
| 151 | + ``str`` will be called on keys and values to convert them before |
| 152 | + training. |
| 153 | +- ``train_volume_size`` Size in GB of the EBS volume to use for storing |
| 154 | + input data during training. Must be large enough to store training |
| 155 | + data if ``input_mode='File'`` is used (which is the default). |
| 156 | +- ``train_max_run`` Timeout in seconds for training, after which Amazon |
| 157 | + SageMaker terminates the job regardless of its current status. |
| 158 | +- ``input_mode`` The input mode that the algorithm supports. Valid |
| 159 | + modes: 'File' - Amazon SageMaker copies the training dataset from the |
| 160 | + S3 location to a directory in the Docker container. 'Pipe' - Amazon |
| 161 | + SageMaker streams data directly from S3 to the container via a Unix |
| 162 | + named pipe. |
| 163 | +- ``output_path`` S3 location where you want the training result (model |
| 164 | + artifacts and optional output files) saved. If not specified, results |
| 165 | + are stored to a default bucket. If the bucket with the specific name |
| 166 | + does not exist, the estimator creates the bucket during the ``fit`` |
| 167 | + method execution. |
| 168 | +- ``output_kms_key`` Optional KMS key ID used to encrypt training |
| 169 | + output. |
| 170 | +- ``job_name`` Name to assign to the training job that the ``fit`` |
| 171 | + method launches. If not specified, the estimator generates a default |
| 172 | + job name based on the training image name and current timestamp. |
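The ``hyperparameters`` conversion described in the list above can be pictured with a short sketch. It only illustrates the documented behaviour (``str`` applied to keys and values) and is not the SDK's internal code. Both helper names and the hyperparameter names are hypothetical.

```python
def stringify_hyperparameters(hyperparameters):
    """Mimic the described conversion: call str on every key and value."""
    return {str(key): str(value) for key, value in hyperparameters.items()}


def parse_hyperparameters(raw):
    """Convert the string values back to the types the script expects."""
    return {
        "learning_rate": float(raw["learning_rate"]),
        "num_episodes": int(raw["num_episodes"]),
    }


# What you might pass to the estimator (hypothetical names and values).
submitted = {"learning_rate": 0.001, "num_episodes": 500}

# What the training code would see: everything is a string.
received = stringify_hyperparameters(submitted)

# The training script converts values back before using them.
restored = parse_hyperparameters(received)
```

Because every value arrives as a string, the training script should convert numeric settings explicitly before using them.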
| 173 | + |
| 174 | +Calling fit |
| 175 | +~~~~~~~~~~~ |
| 176 | + |
| 177 | +You start your training script by calling ``fit`` on an ``RLEstimator``. ``fit`` takes a few optional |
| 178 | +arguments. |
| 179 | + |
| 180 | +Optional arguments |
| 181 | +'''''''''''''''''' |
| 182 | + |
| 183 | +- ``inputs``: This can take one of the following forms: (1) A string |
| 184 | + S3 URI, for example ``s3://my-bucket/my-training-data``. In this |
| 185 | + case, the S3 objects rooted at the ``my-training-data`` prefix will |
| 186 | + be available in the default ``train`` channel. (2) A dict from |
| 187 | + string channel names to S3 URIs. In this case, the objects rooted at |
| 188 | + each S3 prefix will be available as files in each channel directory. |
| 189 | +- ``wait``: Defaults to True, whether to block and wait for the |
| 190 | + training script to complete before returning. |
| 191 | +- ``logs``: Defaults to True, whether to show logs produced by training |
| 192 | + job in the Python session. Only meaningful when wait is True. |
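The two accepted forms of ``inputs`` can be sketched as a mapping from channel names to S3 URIs. The helper ``normalize_inputs`` is hypothetical and is not part of the SDK. The default channel name simply follows the description above, and the bucket names are placeholders.

```python
def normalize_inputs(inputs, default_channel="train"):
    """Return a dict of channel name -> S3 URI for either accepted form."""
    if inputs is None:
        # fit() can be called without inputs, as in the examples in this guide.
        return {}
    if isinstance(inputs, str):
        # A single S3 URI is placed in the default channel.
        return {default_channel: inputs}
    if isinstance(inputs, dict):
        # A dict already maps channel names to S3 URIs; copy it unchanged.
        return dict(inputs)
    raise TypeError("inputs must be None, a string S3 URI, or a dict of channels")


single = normalize_inputs("s3://my-bucket/my-training-data")
multiple = normalize_inputs({
    "train": "s3://my-bucket/train",
    "eval": "s3://my-bucket/eval",
})
```

Here ``single`` contains one entry under the default channel, while ``multiple`` keeps both named channels as given.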
| 193 | + |
| 194 | + |
| 195 | +Distributed RL Training |
| 196 | +----------------------- |
| 197 | + |
| 198 | +Amazon SageMaker RL supports multi-core and multi-instance distributed training. |
| 199 | +Depending on your use case, training and/or environment rollout can be distributed. |
| 200 | + |
| 201 | +Please see the `Amazon SageMaker examples <https://github.com/awslabs/amazon-sagemaker-examples/tree/master/reinforcement_learning>`_ |
| 202 | +on how it can be done using different RL toolkits. |
| 203 | + |
| 204 | + |
| 205 | +Saving models |
| 206 | +------------- |
| 207 | + |
| 208 | +To save your trained RL model for deployment on SageMaker, your training script should save the model |
| 209 | +to the filesystem path ``/opt/ml/model``. This value is also accessible through the environment variable |
| 210 | +``SM_MODEL_DIR``. |
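A minimal sketch of writing an artifact to the model directory follows. The function name, the file name ``model.json``, and the JSON payload are placeholders for whatever your toolkit actually produces. Real Coach or Ray checkpoints are written by the toolkit itself.

```python
import json
import os
import tempfile


def save_model_artifact(payload, filename="model.json", model_dir=None):
    """Write a JSON artifact into the model directory and return its path."""
    # Prefer an explicit directory, then SM_MODEL_DIR, then the documented default.
    target_dir = model_dir or os.environ.get("SM_MODEL_DIR", "/opt/ml/model")
    os.makedirs(target_dir, exist_ok=True)
    path = os.path.join(target_dir, filename)
    with open(path, "w") as handle:
        json.dump(payload, handle)
    return path


# Local demonstration in a temporary directory instead of /opt/ml/model.
with tempfile.TemporaryDirectory() as demo_dir:
    saved_path = save_model_artifact({"policy_weights": [0.1, 0.2]}, model_dir=demo_dir)
    saved_name = os.path.basename(saved_path)
```

Inside a training job you would omit ``model_dir`` so the path comes from ``SM_MODEL_DIR``.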
| 211 | + |
| 212 | +Deploying RL Models |
| 213 | +------------------- |
| 214 | + |
| 215 | +After an RL Estimator has been fit, you can host the newly created model in SageMaker. |
| 216 | + |
| 217 | +After calling ``fit``, you can call ``deploy`` on an ``RLEstimator`` to create a SageMaker Endpoint. |
| 218 | +The Endpoint runs one of the SageMaker-provided model servers, chosen based on the ``framework`` parameter |
| 219 | +specified in the ``RLEstimator`` constructor, and hosts the model produced by your training script, |
| 220 | +which was run when you called ``fit``. This is the model you saved to the model directory (``SM_MODEL_DIR``). |
| 221 | +If ``image_name`` was specified, the provided image is used for the deployment instead. |
| 222 | + |
| 223 | +``deploy`` returns a ``sagemaker.mxnet.MXNetPredictor`` for MXNet or |
| 224 | +``sagemaker.tensorflow.serving.Predictor`` for TensorFlow. |
| 225 | + |
| 226 | +``predict`` returns the result of inference against your model. |
| 227 | + |
| 228 | +.. code:: python |
| 229 | +
|
| 230 | + # Train my estimator |
| 231 | + rl_estimator = RLEstimator(entry_point='coach-train.py', |
| 232 | + toolkit=RLToolkit.COACH, |
| 233 | + toolkit_version='0.11.0', |
| 234 | + framework=RLFramework.MXNET, |
| 235 | + role='SageMakerRole', |
| 236 | + train_instance_type='ml.c4.2xlarge', |
| 237 | + train_instance_count=1) |
| 238 | +
|
| 239 | + rl_estimator.fit() |
| 240 | +
|
| 241 | + # Deploy my estimator to a SageMaker Endpoint and get an MXNetPredictor |
| 242 | + predictor = rl_estimator.deploy(instance_type='ml.m4.xlarge', |
| 243 | + initial_instance_count=1) |
| 244 | +
|
| 245 | + response = predictor.predict(data) |
| 246 | +
|
| 247 | +For more information, see `The SageMaker MXNet Model Server <https://github.com/aws/sagemaker-python-sdk/tree/master/src/sagemaker/mxnet#the-sagemaker-mxnet-model-server>`_ |
| 248 | +and `Deploying to TensorFlow Serving Endpoints <https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/tensorflow/deploying_tensorflow_serving.rst>`_ documentation. |
| 249 | + |
| 250 | + |
| 251 | +Working with Existing Training Jobs |
| 252 | +----------------------------------- |
| 253 | + |
| 254 | +Attaching to existing training jobs |
| 255 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 256 | + |
| 257 | +You can attach an RL Estimator to an existing training job using the |
| 258 | +``attach`` method. |
| 259 | + |
| 260 | +.. code:: python |
| 261 | +
|
| 262 | + my_training_job_name = 'MyAwesomeRLTrainingJob' |
| 263 | + rl_estimator = RLEstimator.attach(my_training_job_name) |
| 264 | +
|
| 265 | +After attaching, if the training job has finished with job status "Completed", it can be |
| 266 | +``deploy``\ ed to create a SageMaker Endpoint and return a ``Predictor``. If the training job is still in progress, |
| 267 | +``attach`` blocks and displays log messages from the training job until the job completes. |
| 268 | + |
| 269 | +The ``attach`` method accepts the following arguments: |
| 270 | + |
| 271 | +- ``training_job_name``: The name of the training job to attach |
| 272 | + to. |
| 273 | +- ``sagemaker_session``: The Session used |
| 274 | + to interact with SageMaker. |
| 275 | + |
| 276 | +RL Training Examples |
| 277 | +-------------------- |
| 278 | + |
| 279 | +Amazon provides several example Jupyter notebooks that demonstrate end-to-end training on Amazon SageMaker using RL. |
| 280 | +Please refer to: |
| 281 | + |
| 282 | +https://github.com/awslabs/amazon-sagemaker-examples/tree/master/reinforcement_learning |
| 283 | + |
| 284 | +These are also available in SageMaker Notebook Instance hosted Jupyter notebooks under the sample notebooks folder. |
| 285 | + |
| 286 | + |
| 287 | +SageMaker RL Docker Containers |
| 288 | +------------------------------ |
| 289 | + |
| 290 | +When training and deploying training scripts, SageMaker runs your Python script in a Docker container with several |
| 291 | +libraries installed. When creating the Estimator and calling deploy to create the SageMaker Endpoint, you can control |
| 292 | +the environment your script runs in. |
| 293 | + |
| 294 | +SageMaker runs RL Estimator scripts in either Python 3.5 for MXNet or Python 3.6 for TensorFlow. |
| 295 | + |
| 296 | +The Docker images have the following dependencies installed: |
| 297 | + |
| 298 | ++-------------------------+-------------------+-------------------+-------------------+ |
| 299 | +| Dependencies | Coach 0.10.1 | Coach 0.11.0 | Ray 0.5.3 | |
| 300 | ++-------------------------+-------------------+-------------------+-------------------+ |
| 301 | +| Python | 3.6 | 3.5(MXNet) or | 3.6 | |
| 302 | +| | | 3.6(TensorFlow) | | |
| 303 | ++-------------------------+-------------------+-------------------+-------------------+ |
| 304 | +| CUDA (GPU image only) | 9.0 | 9.0 | 9.0 | |
| 305 | ++-------------------------+-------------------+-------------------+-------------------+ |
| 306 | +| DL Framework | TensorFlow-1.11.0 | MXNet-1.3.0 or | TensorFlow-1.11.0 | |
| 307 | +| | | TensorFlow-1.11.0 | | |
| 308 | ++-------------------------+-------------------+-------------------+-------------------+ |
| 309 | +| gym | 0.10.5 | 0.10.5 | 0.10.5 | |
| 310 | ++-------------------------+-------------------+-------------------+-------------------+ |
| 311 | + |
| 312 | +The Docker images extend Ubuntu 16.04. |
| 313 | + |
| 314 | +You can select the version by passing a ``framework_version`` keyword argument to the RL Estimator constructor. |
| 315 | +Currently supported versions are listed in the above table. You can also set ``framework_version`` to only specify major and |
| 316 | +minor version, which will cause your training script to be run on the latest supported patch version of that minor |
| 317 | +version. |
| 318 | + |
| 319 | +Alternatively, you can build your own image by following the instructions in the SageMaker RL containers |
| 320 | +repository, and passing ``image_name`` to the RL Estimator constructor. |
| 321 | + |
| 322 | +You can visit `the SageMaker RL containers repository <https://github.com/aws/sagemaker-rl-container>`_. |