From c2dc20b4b7dea72eef5b1a6fe938430212b5a748 Mon Sep 17 00:00:00 2001 From: Lauren Yu <6631887+laurenyu@users.noreply.github.com> Date: Wed, 12 Sep 2018 15:58:01 -0700 Subject: [PATCH 01/19] Start adding documentation about upcoming MXNet training script changes --- src/sagemaker/mxnet/README.rst | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/src/sagemaker/mxnet/README.rst b/src/sagemaker/mxnet/README.rst index 2f0dfa487a..ad91c32602 100644 --- a/src/sagemaker/mxnet/README.rst +++ b/src/sagemaker/mxnet/README.rst @@ -1,4 +1,3 @@ - ===================================== MXNet SageMaker Estimators and Models ===================================== @@ -31,6 +30,11 @@ In the following sections, we'll discuss how to prepare a training script for ex Preparing the MXNet training script ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +.. warning:: + This required structure for training scripts will be deprecated with the next major release of MXNet images. + The ``train`` function will no longer be required; instead the training script must be able to be run as a standalone script. + For more information, see `Updating your MXNet training script <#updating-your-mxnet-training-script>__` + Your MXNet training script must be a Python 2.7 or 3.5 compatible source file. The MXNet training script must contain a function ``train``, which SageMaker invokes to run training. You can include other functions as well, but it must contain a ``train`` function. When you run your script on SageMaker via the ``MXNet`` Estimator, SageMaker injects information about the training environment into your training function via Python keyword arguments. You can choose to take advantage of these by including them as keyword arguments in your train function. 
The full list of arguments is: @@ -574,6 +578,14 @@ https://github.com/awslabs/amazon-sagemaker-examples/tree/master/sagemaker-pytho These are also available in SageMaker Notebook Instance hosted Jupyter notebooks under the "sample notebooks" folder. +Updating your MXNet training script +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The required structure for training scripts will be deprecated with the next major release of MXNet images. +The ``train`` function will no longer be required; instead the training script must be able to be run as a standalone script. +In this way, the training script will become similar to a training script you might run outside of SageMaker. + + SageMaker MXNet Containers ~~~~~~~~~~~~~~~~~~~~~~~~~~ From c52ad94fb97ab78d0feac58440f7dab7aa0d0695 Mon Sep 17 00:00:00 2001 From: Lauren Yu <6631887+laurenyu@users.noreply.github.com> Date: Wed, 12 Sep 2018 16:22:04 -0700 Subject: [PATCH 02/19] attempt to fix formatting --- src/sagemaker/mxnet/README.rst | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/sagemaker/mxnet/README.rst b/src/sagemaker/mxnet/README.rst index ad91c32602..0af3faadd4 100644 --- a/src/sagemaker/mxnet/README.rst +++ b/src/sagemaker/mxnet/README.rst @@ -31,9 +31,10 @@ Preparing the MXNet training script ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. warning:: + This required structure for training scripts will be deprecated with the next major release of MXNet images. The ``train`` function will no longer be required; instead the training script must be able to be run as a standalone script. - For more information, see `Updating your MXNet training script <#updating-your-mxnet-training-script>__` + For more information, see later section. Your MXNet training script must be a Python 2.7 or 3.5 compatible source file. The MXNet training script must contain a function ``train``, which SageMaker invokes to run training. You can include other functions as well, but it must contain a ``train`` function. 
From 83245c567f09fbcca50ca988bb7a0a03556a0cc2 Mon Sep 17 00:00:00 2001 From: Lauren Yu <6631887+laurenyu@users.noreply.github.com> Date: Wed, 12 Sep 2018 16:30:23 -0700 Subject: [PATCH 03/19] attempt to fix formatting --- src/sagemaker/mxnet/README.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/sagemaker/mxnet/README.rst b/src/sagemaker/mxnet/README.rst index 0af3faadd4..35c3345032 100644 --- a/src/sagemaker/mxnet/README.rst +++ b/src/sagemaker/mxnet/README.rst @@ -30,11 +30,11 @@ In the following sections, we'll discuss how to prepare a training script for ex Preparing the MXNet training script ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -.. warning:: +.. attention:: This required structure for training scripts will be deprecated with the next major release of MXNet images. The ``train`` function will no longer be required; instead the training script must be able to be run as a standalone script. - For more information, see later section. + For more information, see `Updating your MXNet training script <#updating-your-mxnet-training-script>__` Your MXNet training script must be a Python 2.7 or 3.5 compatible source file. The MXNet training script must contain a function ``train``, which SageMaker invokes to run training. You can include other functions as well, but it must contain a ``train`` function. 
From 2764427716e89692a897ffcf2c72f749868f0dc5 Mon Sep 17 00:00:00 2001 From: Lauren Yu <6631887+laurenyu@users.noreply.github.com> Date: Wed, 12 Sep 2018 16:34:25 -0700 Subject: [PATCH 04/19] attempt to fix formatting --- src/sagemaker/mxnet/README.rst | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/src/sagemaker/mxnet/README.rst b/src/sagemaker/mxnet/README.rst index 35c3345032..5afdbf4e97 100644 --- a/src/sagemaker/mxnet/README.rst +++ b/src/sagemaker/mxnet/README.rst @@ -30,11 +30,13 @@ In the following sections, we'll discuss how to prepare a training script for ex Preparing the MXNet training script ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -.. attention:: - - This required structure for training scripts will be deprecated with the next major release of MXNet images. - The ``train`` function will no longer be required; instead the training script must be able to be run as a standalone script. - For more information, see `Updating your MXNet training script <#updating-your-mxnet-training-script>__` ++-------------------------------------------------------------------------------------------------------------------------------+ +| Warning | ++-------------------------------------------------------------------------------------------------------------------------------+ +| This required structure for training scripts will be deprecated with the next major release of MXNet images. | +| The ``train`` function will no longer be required; instead the training script must be able to be run as a standalone script. | +| For more information, see `Updating your MXNet training script <#updating-your-mxnet-training-script>__` | ++-------------------------------------------------------------------------------------------------------------------------------+ Your MXNet training script must be a Python 2.7 or 3.5 compatible source file. The MXNet training script must contain a function ``train``, which SageMaker invokes to run training. 
You can include other functions as well, but it must contain a ``train`` function. From ffe9bfef1682ea8662c8bc134136e6ffc74f924f Mon Sep 17 00:00:00 2001 From: Lauren Yu <6631887+laurenyu@users.noreply.github.com> Date: Wed, 12 Sep 2018 16:35:24 -0700 Subject: [PATCH 05/19] attempt to fix formatting --- src/sagemaker/mxnet/README.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/sagemaker/mxnet/README.rst b/src/sagemaker/mxnet/README.rst index 5afdbf4e97..b8ee5279c5 100644 --- a/src/sagemaker/mxnet/README.rst +++ b/src/sagemaker/mxnet/README.rst @@ -31,7 +31,7 @@ Preparing the MXNet training script ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +-------------------------------------------------------------------------------------------------------------------------------+ -| Warning | +| **WARNING** | +-------------------------------------------------------------------------------------------------------------------------------+ | This required structure for training scripts will be deprecated with the next major release of MXNet images. | | The ``train`` function will no longer be required; instead the training script must be able to be run as a standalone script. 
| From 891bf59fc837d58ce0770ba12ab31b4b1cbf2c9d Mon Sep 17 00:00:00 2001 From: Lauren Yu <6631887+laurenyu@users.noreply.github.com> Date: Wed, 12 Sep 2018 16:37:49 -0700 Subject: [PATCH 06/19] attempt to fix formatting --- src/sagemaker/mxnet/README.rst | 12 +++++------- 1 file changed, 5 insertions(+), 7 deletions(-) diff --git a/src/sagemaker/mxnet/README.rst b/src/sagemaker/mxnet/README.rst index b8ee5279c5..727306dd76 100644 --- a/src/sagemaker/mxnet/README.rst +++ b/src/sagemaker/mxnet/README.rst @@ -30,13 +30,11 @@ In the following sections, we'll discuss how to prepare a training script for ex Preparing the MXNet training script ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -+-------------------------------------------------------------------------------------------------------------------------------+ -| **WARNING** | -+-------------------------------------------------------------------------------------------------------------------------------+ -| This required structure for training scripts will be deprecated with the next major release of MXNet images. | -| The ``train`` function will no longer be required; instead the training script must be able to be run as a standalone script. 
| -| For more information, see `Updating your MXNet training script <#updating-your-mxnet-training-script>__` | -+-------------------------------------------------------------------------------------------------------------------------------+ +=================================================================================================================================================================================================================================================================================================================================================== +WARNING +=================================================================================================================================================================================================================================================================================================================================================== +This required structure for training scripts will be deprecated with the next major release of MXNet images. The ``train`` function will no longer be required; instead the training script must be able to be run as a standalone script. For more information, see `Updating your MXNet training script <#updating-your-mxnet-training-script>__` +=================================================================================================================================================================================================================================================================================================================================================== Your MXNet training script must be a Python 2.7 or 3.5 compatible source file. The MXNet training script must contain a function ``train``, which SageMaker invokes to run training. You can include other functions as well, but it must contain a ``train`` function. 
From 66b9572e23eb9da29843b92ba29803bf5977a566 Mon Sep 17 00:00:00 2001 From: Lauren Yu <6631887+laurenyu@users.noreply.github.com> Date: Wed, 12 Sep 2018 16:38:41 -0700 Subject: [PATCH 07/19] Revert "attempt to fix formatting" This reverts commit 891bf59fc837d58ce0770ba12ab31b4b1cbf2c9d. --- src/sagemaker/mxnet/README.rst | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/src/sagemaker/mxnet/README.rst b/src/sagemaker/mxnet/README.rst index 727306dd76..b8ee5279c5 100644 --- a/src/sagemaker/mxnet/README.rst +++ b/src/sagemaker/mxnet/README.rst @@ -30,11 +30,13 @@ In the following sections, we'll discuss how to prepare a training script for ex Preparing the MXNet training script ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -=================================================================================================================================================================================================================================================================================================================================================== -WARNING -=================================================================================================================================================================================================================================================================================================================================================== -This required structure for training scripts will be deprecated with the next major release of MXNet images. The ``train`` function will no longer be required; instead the training script must be able to be run as a standalone script. 
For more information, see `Updating your MXNet training script <#updating-your-mxnet-training-script>__` -=================================================================================================================================================================================================================================================================================================================================================== ++-------------------------------------------------------------------------------------------------------------------------------+ +| **WARNING** | ++-------------------------------------------------------------------------------------------------------------------------------+ +| This required structure for training scripts will be deprecated with the next major release of MXNet images. | +| The ``train`` function will no longer be required; instead the training script must be able to be run as a standalone script. | +| For more information, see `Updating your MXNet training script <#updating-your-mxnet-training-script>__` | ++-------------------------------------------------------------------------------------------------------------------------------+ Your MXNet training script must be a Python 2.7 or 3.5 compatible source file. The MXNet training script must contain a function ``train``, which SageMaker invokes to run training. You can include other functions as well, but it must contain a ``train`` function. 
From 11dccfd05b11d56c14e7090e72623a7f12f93d63 Mon Sep 17 00:00:00 2001 From: Lauren Yu <6631887+laurenyu@users.noreply.github.com> Date: Wed, 12 Sep 2018 16:39:36 -0700 Subject: [PATCH 08/19] fix link --- src/sagemaker/mxnet/README.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/sagemaker/mxnet/README.rst b/src/sagemaker/mxnet/README.rst index b8ee5279c5..480b93fc37 100644 --- a/src/sagemaker/mxnet/README.rst +++ b/src/sagemaker/mxnet/README.rst @@ -35,7 +35,7 @@ Preparing the MXNet training script +-------------------------------------------------------------------------------------------------------------------------------+ | This required structure for training scripts will be deprecated with the next major release of MXNet images. | | The ``train`` function will no longer be required; instead the training script must be able to be run as a standalone script. | -| For more information, see `Updating your MXNet training script <#updating-your-mxnet-training-script>__` | +| For more information, see `Updating your MXNet training script <#updating-your-mxnet-training-script>`__ | +-------------------------------------------------------------------------------------------------------------------------------+ Your MXNet training script must be a Python 2.7 or 3.5 compatible source file. The MXNet training script must contain a function ``train``, which SageMaker invokes to run training. You can include other functions as well, but it must contain a ``train`` function. 
From 5f4a303c79ef0dd3053d3ff19ce061bbc9bb1532 Mon Sep 17 00:00:00 2001 From: Lauren Yu <6631887+laurenyu@users.noreply.github.com> Date: Thu, 13 Sep 2018 09:39:53 -0700 Subject: [PATCH 09/19] begin writing instructions --- src/sagemaker/mxnet/README.rst | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/src/sagemaker/mxnet/README.rst b/src/sagemaker/mxnet/README.rst index 480b93fc37..839966ad92 100644 --- a/src/sagemaker/mxnet/README.rst +++ b/src/sagemaker/mxnet/README.rst @@ -35,7 +35,7 @@ Preparing the MXNet training script +-------------------------------------------------------------------------------------------------------------------------------+ | This required structure for training scripts will be deprecated with the next major release of MXNet images. | | The ``train`` function will no longer be required; instead the training script must be able to be run as a standalone script. | -| For more information, see `Updating your MXNet training script <#updating-your-mxnet-training-script>`__ | +| For more information, see `Updating your MXNet training script <#updating-your-mxnet-training-script>`__. | +-------------------------------------------------------------------------------------------------------------------------------+ Your MXNet training script must be a Python 2.7 or 3.5 compatible source file. The MXNet training script must contain a function ``train``, which SageMaker invokes to run training. You can include other functions as well, but it must contain a ``train`` function. @@ -588,6 +588,18 @@ The required structure for training scripts will be deprecated with the next maj The ``train`` function will no longer be required; instead the training script must be able to be run as a standalone script. In this way, the training script will become similar to a training script you might run outside of SageMaker. +There are a few steps needed to make a training script with the old format compatible with the new format. 
+You don't need to do this yet, but it's documented here for future reference. + +First, add a `main guard `__ (``if __name__ == '__main__':``). +The code executed from your main guard needs to: + +1. Set hyperparameters and other variables +2. Initiate training +3. Save the model + +Hyperparameters will now be passed as command-line arguments to your training script. + SageMaker MXNet Containers ~~~~~~~~~~~~~~~~~~~~~~~~~~ From 518760bea474f0d92a808806554cbff00b46f04b Mon Sep 17 00:00:00 2001 From: Lauren Yu <6631887+laurenyu@users.noreply.github.com> Date: Thu, 13 Sep 2018 09:41:50 -0700 Subject: [PATCH 10/19] attempt to fix formatting --- src/sagemaker/mxnet/README.rst | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/sagemaker/mxnet/README.rst b/src/sagemaker/mxnet/README.rst index 839966ad92..716496141e 100644 --- a/src/sagemaker/mxnet/README.rst +++ b/src/sagemaker/mxnet/README.rst @@ -31,7 +31,8 @@ Preparing the MXNet training script ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +-------------------------------------------------------------------------------------------------------------------------------+ -| **WARNING** | +| .. class:: center | +| **WARNING** | +-------------------------------------------------------------------------------------------------------------------------------+ | This required structure for training scripts will be deprecated with the next major release of MXNet images. | | The ``train`` function will no longer be required; instead the training script must be able to be run as a standalone script. 
| From 614cdcc0158783981e3e599b30a4916b27324de1 Mon Sep 17 00:00:00 2001 From: Lauren Yu <6631887+laurenyu@users.noreply.github.com> Date: Thu, 13 Sep 2018 09:50:19 -0700 Subject: [PATCH 11/19] attempt to fix formatting --- src/sagemaker/mxnet/README.rst | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/src/sagemaker/mxnet/README.rst b/src/sagemaker/mxnet/README.rst index 716496141e..cff6dfd90a 100644 --- a/src/sagemaker/mxnet/README.rst +++ b/src/sagemaker/mxnet/README.rst @@ -31,9 +31,8 @@ Preparing the MXNet training script ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +-------------------------------------------------------------------------------------------------------------------------------+ -| .. class:: center | -| **WARNING** | -+-------------------------------------------------------------------------------------------------------------------------------+ +| WARNING | ++===============================================================================================================================+ | This required structure for training scripts will be deprecated with the next major release of MXNet images. | | The ``train`` function will no longer be required; instead the training script must be able to be run as a standalone script. | | For more information, see `Updating your MXNet training script <#updating-your-mxnet-training-script>`__. 
| From a16ad86f9377a7c668cf12d24a02716790532c4d Mon Sep 17 00:00:00 2001 From: Lauren Yu <6631887+laurenyu@users.noreply.github.com> Date: Thu, 13 Sep 2018 09:51:08 -0700 Subject: [PATCH 12/19] attempt to fix formatting --- src/sagemaker/mxnet/README.rst | 1 + 1 file changed, 1 insertion(+) diff --git a/src/sagemaker/mxnet/README.rst b/src/sagemaker/mxnet/README.rst index cff6dfd90a..2816e641c6 100644 --- a/src/sagemaker/mxnet/README.rst +++ b/src/sagemaker/mxnet/README.rst @@ -33,6 +33,7 @@ Preparing the MXNet training script +-------------------------------------------------------------------------------------------------------------------------------+ | WARNING | +===============================================================================================================================+ ++-------------------------------------------------------------------------------------------------------------------------------+ | This required structure for training scripts will be deprecated with the next major release of MXNet images. | | The ``train`` function will no longer be required; instead the training script must be able to be run as a standalone script. | | For more information, see `Updating your MXNet training script <#updating-your-mxnet-training-script>`__. 
| From c71d44f2d2a57ce6089c8ee0a4a52a41d8d57513 Mon Sep 17 00:00:00 2001 From: Lauren Yu <6631887+laurenyu@users.noreply.github.com> Date: Thu, 13 Sep 2018 09:51:35 -0700 Subject: [PATCH 13/19] attempt to fix formatting --- src/sagemaker/mxnet/README.rst | 1 - 1 file changed, 1 deletion(-) diff --git a/src/sagemaker/mxnet/README.rst b/src/sagemaker/mxnet/README.rst index 2816e641c6..cff6dfd90a 100644 --- a/src/sagemaker/mxnet/README.rst +++ b/src/sagemaker/mxnet/README.rst @@ -33,7 +33,6 @@ Preparing the MXNet training script +-------------------------------------------------------------------------------------------------------------------------------+ | WARNING | +===============================================================================================================================+ -+-------------------------------------------------------------------------------------------------------------------------------+ | This required structure for training scripts will be deprecated with the next major release of MXNet images. | | The ``train`` function will no longer be required; instead the training script must be able to be run as a standalone script. | | For more information, see `Updating your MXNet training script <#updating-your-mxnet-training-script>`__. | From e2eb318a7a5d1eeeb655145ca83ac477b72dd48d Mon Sep 17 00:00:00 2001 From: Lauren Yu <6631887+laurenyu@users.noreply.github.com> Date: Thu, 13 Sep 2018 10:37:52 -0700 Subject: [PATCH 14/19] Continue writing --- src/sagemaker/mxnet/README.rst | 33 +++++++++++++++++++++++++++++++++ 1 file changed, 33 insertions(+) diff --git a/src/sagemaker/mxnet/README.rst b/src/sagemaker/mxnet/README.rst index cff6dfd90a..24c6b15cb6 100644 --- a/src/sagemaker/mxnet/README.rst +++ b/src/sagemaker/mxnet/README.rst @@ -599,6 +599,39 @@ The code executed from your main guard needs to: 3. Save the model Hyperparameters will now be passed as command-line arguments to your training script. 
+We recommend using an `argument parser `__ to aid with this. +Using the ``argparse`` library as an example, this part of the code would look something like this: + +.. code:: python + parser = argparse.ArgumentParser() + + # hyperparameters sent by the client are passed as command-line arguments to the script. + parser.add_argument('--epochs', type=int, default=10) + parser.add_argument('--batch-size', type=int, default=100) + parser.add_argument('--learning-rate', type=float, default=0.1) + + # data, model, and output directories + parser.add_argument('--output-data-dir', type=str, default='opt/ml/output/data') + parser.add_argument('--model-dir', type=str, default='opt/ml/model') + parser.add_argument('--train', type=str, default='opt/ml/input/data/train') + parser.add_argument('--test', type=str, default='opt/ml/input/data/test') + + args, _ = parser.parse_known_args() + +The code in the main guard should also take care of training and saving the model. +(This can be as simple as just calling the methods used with the previous training script format.) +Note now that saving the model will not be done by default; this must be done by the training script. +If you were previously relying on the default save method, here is one you can copy into your code: + +.. 
code:: python + def save(model_dir, model): + model.symbol.save(os.path.join(model_dir, 'model-symbol.json')) + model.save_params(os.path.join(model_dir, 'model-0000.params')) + + signature = [{'name': data_desc.name, 'shape': [dim for dim in data_desc.shape]} + for data_desc in model.data_shapes] + with open(os.path.join(model_dir, 'model-shapes.json'), 'w') as f: + json.dump(signature, f) SageMaker MXNet Containers From d790884a7d42af2dc4dcdbc8ed42eb1b79ec3845 Mon Sep 17 00:00:00 2001 From: Lauren Yu <6631887+laurenyu@users.noreply.github.com> Date: Thu, 13 Sep 2018 10:38:47 -0700 Subject: [PATCH 15/19] attempt to fix formatting --- src/sagemaker/mxnet/README.rst | 2 ++ 1 file changed, 2 insertions(+) diff --git a/src/sagemaker/mxnet/README.rst b/src/sagemaker/mxnet/README.rst index 24c6b15cb6..3dca650dcd 100644 --- a/src/sagemaker/mxnet/README.rst +++ b/src/sagemaker/mxnet/README.rst @@ -603,6 +603,7 @@ We recommend using an `argument parser Date: Thu, 13 Sep 2018 10:45:01 -0700 Subject: [PATCH 16/19] Continue writing --- src/sagemaker/mxnet/README.rst | 33 +++++++++++++++++++++------------ 1 file changed, 21 insertions(+), 12 deletions(-) diff --git a/src/sagemaker/mxnet/README.rst b/src/sagemaker/mxnet/README.rst index 3dca650dcd..7e63ecf7c9 100644 --- a/src/sagemaker/mxnet/README.rst +++ b/src/sagemaker/mxnet/README.rst @@ -35,7 +35,7 @@ Preparing the MXNet training script +===============================================================================================================================+ | This required structure for training scripts will be deprecated with the next major release of MXNet images. | | The ``train`` function will no longer be required; instead the training script must be able to be run as a standalone script. | -| For more information, see `Updating your MXNet training script <#updating-your-mxnet-training-script>`__. | +| For more information, see `"Updating your MXNet training script" <#updating-your-mxnet-training-script>`__. 
| +-------------------------------------------------------------------------------------------------------------------------------+ Your MXNet training script must be a Python 2.7 or 3.5 compatible source file. The MXNet training script must contain a function ``train``, which SageMaker invokes to run training. You can include other functions as well, but it must contain a ``train`` function. @@ -604,20 +604,23 @@ Using the ``argparse`` library as an example, this part of the code would look s .. code:: python - parser = argparse.ArgumentParser() + import argparse - # hyperparameters sent by the client are passed as command-line arguments to the script. - parser.add_argument('--epochs', type=int, default=10) - parser.add_argument('--batch-size', type=int, default=100) - parser.add_argument('--learning-rate', type=float, default=0.1) + if __name__ == '__main__': + parser = argparse.ArgumentParser() - # data, model, and output directories - parser.add_argument('--output-data-dir', type=str, default='opt/ml/output/data') - parser.add_argument('--model-dir', type=str, default='opt/ml/model') - parser.add_argument('--train', type=str, default='opt/ml/input/data/train') - parser.add_argument('--test', type=str, default='opt/ml/input/data/test') + # hyperparameters sent by the client are passed as command-line arguments to the script. 
+ parser.add_argument('--epochs', type=int, default=10) + parser.add_argument('--batch-size', type=int, default=100) + parser.add_argument('--learning-rate', type=float, default=0.1) - args, _ = parser.parse_known_args() + # data, model, and output directories + parser.add_argument('--output-data-dir', type=str, default='opt/ml/output/data') + parser.add_argument('--model-dir', type=str, default='opt/ml/model') + parser.add_argument('--train', type=str, default='opt/ml/input/data/train') + parser.add_argument('--test', type=str, default='opt/ml/input/data/test') + + args, _ = parser.parse_known_args() The code in the main guard should also take care of training and saving the model. (This can be as simple as just calling the methods used with the previous training script format.) @@ -626,6 +629,9 @@ If you were previously relying on the default save method, here is one you can c .. code:: python + import json + import os + def save(model_dir, model): model.symbol.save(os.path.join(model_dir, 'model-symbol.json')) model.save_params(os.path.join(model_dir, 'model-0000.params')) @@ -635,6 +641,9 @@ If you were previously relying on the default save method, here is one you can c with open(os.path.join(model_dir, 'model-shapes.json'), 'w') as f: json.dump(signature, f) +These changes will make training with MXNet similar to training with Chainer or PyTorch on SageMaker. +For more information about those experiences, see `"Preparing the Chainer training script" `__ and `"Preparing the PyTorch Training Script" `__. 
+
 SageMaker MXNet Containers
 ~~~~~~~~~~~~~~~~~~~~~~~~~~

From 0e6a49a636521b24d09f45bd8e4573b1498d9862 Mon Sep 17 00:00:00 2001
From: Lauren Yu <6631887+laurenyu@users.noreply.github.com>
Date: Thu, 13 Sep 2018 10:57:29 -0700
Subject: [PATCH 17/19] tweak wording

---
 src/sagemaker/mxnet/README.rst | 27 ++++++++++++++++++---------
 1 file changed, 18 insertions(+), 9 deletions(-)

diff --git a/src/sagemaker/mxnet/README.rst b/src/sagemaker/mxnet/README.rst
index 7e63ecf7c9..25d88d66c3 100644
--- a/src/sagemaker/mxnet/README.rst
+++ b/src/sagemaker/mxnet/README.rst
@@ -589,18 +589,19 @@ The ``train`` function will no longer be required; instead the training script m
 In this way, the training script will become similar to a training script you might run outside of SageMaker.

 There are a few steps needed to make a training script with the old format compatible with the new format.
-You don't need to do this yet, but it's documented here for future reference.
+You don't need to do this yet, but it's documented here for future reference, as this change is coming soon.

 First, add a `main guard `__ (``if __name__ == '__main__':``).
 The code executed from your main guard needs to:

-1. Set hyperparameters and other variables
+1. Set hyperparameters and directory locations
 2. Initiate training
 3. Save the model

-Hyperparameters will now be passed as command-line arguments to your training script.
-We recommend using an `argument parser `__ to aid with this.
-Using the ``argparse`` library as an example, this part of the code would look something like this:
+Hyperparameters will be passed as command-line arguments to your training script.
+In addition, the locations for finding input data and saving the model and output data will need to be defined.
+We recommend using `an argument parser `__ for this part.
+Using the ``argparse`` library as an example, the code would look something like this:

 .. code:: python

@@ -614,8 +615,7 @@ Using the ``argparse`` library as an example, this part of the code would look s
         parser.add_argument('--batch-size', type=int, default=100)
         parser.add_argument('--learning-rate', type=float, default=0.1)

-        # data, model, and output directories
-        parser.add_argument('--output-data-dir', type=str, default='opt/ml/output/data')
+        # input data and model directories
         parser.add_argument('--model-dir', type=str, default='opt/ml/model')
         parser.add_argument('--train', type=str, default='opt/ml/input/data/train')
         parser.add_argument('--test', type=str, default='opt/ml/input/data/test')
@@ -623,8 +623,17 @@ Using the ``argparse`` library as an example, this part of the code would look s
         args, _ = parser.parse_known_args()

 The code in the main guard should also take care of training and saving the model.
-(This can be as simple as just calling the methods used with the previous training script format.)
-Note now that saving the model will not be done by default; this must be done by the training script.
+This can be as simple as just calling the methods used with the previous training script format:
+
+.. code:: python
+
+    if __name__ == '__main__':
+        # arg parsing (shown above) goes here
+
+        model = train(args.batch_size, args.epochs, args.learning_rate, args.train, args.test)
+        save(args.model_dir, model)
+
+Note that saving the model will no longer be done by default; this must be done by the training script.
 If you were previously relying on the default save method, here is one you can copy into your code:

 .. code:: python

From 9695cd5a704f6e2a62b886b486d7c9831faf6a16 Mon Sep 17 00:00:00 2001
From: Lauren Yu <6631887+laurenyu@users.noreply.github.com>
Date: Mon, 17 Sep 2018 10:45:19 -0700
Subject: [PATCH 18/19] add details about env variables

---
 src/sagemaker/mxnet/README.rst | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/src/sagemaker/mxnet/README.rst b/src/sagemaker/mxnet/README.rst
index 25d88d66c3..6c82401a3d 100644
--- a/src/sagemaker/mxnet/README.rst
+++ b/src/sagemaker/mxnet/README.rst
@@ -599,13 +599,16 @@ The code executed from your main guard needs to:
 3. Save the model

 Hyperparameters will be passed as command-line arguments to your training script.
-In addition, the locations for finding input data and saving the model and output data will need to be defined.
+In addition, the locations for finding input data and saving the model and output data will be provided as environment variables rather than as arguments to a function.
+You can find the full list of available environment variables in the `SageMaker Containers README `__.
+
 We recommend using `an argument parser `__ for this part.
 Using the ``argparse`` library as an example, the code would look something like this:

 .. code:: python

     import argparse
+    import os

     if __name__ == '__main__':
         parser = argparse.ArgumentParser()
@@ -616,9 +619,9 @@ Using the ``argparse`` library as an example, the code would look something like
         parser.add_argument('--learning-rate', type=float, default=0.1)

         # input data and model directories
-        parser.add_argument('--model-dir', type=str, default='opt/ml/model')
-        parser.add_argument('--train', type=str, default='opt/ml/input/data/train')
-        parser.add_argument('--test', type=str, default='opt/ml/input/data/test')
+        parser.add_argument('--model-dir', type=str, default=os.environ['SM_MODEL_DIR'])
+        parser.add_argument('--train', type=str, default=os.environ['SM_CHANNEL_TRAIN'])
+        parser.add_argument('--test', type=str, default=os.environ['SM_CHANNEL_TEST'])

         args, _ = parser.parse_known_args()

From 880a2380b1c2c4ce1da710ea5f69a52a2c839585 Mon Sep 17 00:00:00 2001
From: Lauren Yu <6631887+laurenyu@users.noreply.github.com>
Date: Mon, 17 Sep 2018 17:23:00 -0700
Subject: [PATCH 19/19] tweak wording

---
 src/sagemaker/mxnet/README.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/sagemaker/mxnet/README.rst b/src/sagemaker/mxnet/README.rst
index 6c82401a3d..2e4c45cd4d 100644
--- a/src/sagemaker/mxnet/README.rst
+++ b/src/sagemaker/mxnet/README.rst
@@ -599,7 +599,7 @@ The code executed from your main guard needs to:
 3. Save the model

 Hyperparameters will be passed as command-line arguments to your training script.
-In addition, the locations for finding input data and saving the model and output data will be provided as environment variables rather than as arguments to a function.
+In addition, the container will define the locations of input data and where to save the model artifacts and output data as environment variables rather than passing that information as arguments to the ``train`` function.
 You can find the full list of available environment variables in the `SageMaker Containers README `__.
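
The environment variables named in this series (``SM_MODEL_DIR``, ``SM_CHANNEL_TRAIN``, ``SM_CHANNEL_TEST``) are read with ``os.environ[...]``, which raises ``KeyError`` when the script runs outside a SageMaker container. A minimal sketch of one way to keep the script runnable locally follows. The helper name ``parse_args`` and the ``/opt/ml/...`` fallback paths are assumptions made for this illustration and are not part of the patch.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse hyperparameters and directory locations (hypothetical helper)."""
    parser = argparse.ArgumentParser()

    # Hyperparameters arrive as command-line arguments.
    parser.add_argument('--epochs', type=int, default=10)
    parser.add_argument('--batch-size', type=int, default=100)
    parser.add_argument('--learning-rate', type=float, default=0.1)

    # Directory locations come from the environment. The fallback paths are
    # assumed values so the script can also start outside a container.
    parser.add_argument('--model-dir', type=str,
                        default=os.environ.get('SM_MODEL_DIR', '/opt/ml/model'))
    parser.add_argument('--train', type=str,
                        default=os.environ.get('SM_CHANNEL_TRAIN', '/opt/ml/input/data/train'))
    parser.add_argument('--test', type=str,
                        default=os.environ.get('SM_CHANNEL_TEST', '/opt/ml/input/data/test'))

    # parse_known_args ignores arguments the parser does not recognize.
    args, _ = parser.parse_known_args(argv)
    return args


if __name__ == '__main__':
    args = parse_args()
    print(args.model_dir, args.train, args.test)
```

Passing ``--model-dir``, ``--train``, or ``--test`` explicitly on the command line still overrides both the environment variable and the fallback.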

 We recommend using `an argument parser `__ for this part.
@@ -626,7 +626,7 @@ Using the ``argparse`` library as an example, the code would look something like
         args, _ = parser.parse_known_args()

 The code in the main guard should also take care of training and saving the model.
-This can be as simple as just calling the methods used with the previous training script format:
+This can be as simple as just calling the ``train`` and ``save`` methods used in the previous training script format:

 .. code:: python