
Commit 8b45bd5

Merge pull request aws#232 from awslabs/arpin_local_mode_changes
Updated: local mode notebooks based on comments from first PR
2 parents 5107c0b + ee6bf22 commit 8b45bd5

2 files changed: 13 additions & 21 deletions


sagemaker-python-sdk/mxnet_gluon_mnist/mnist_with_gluon_local_mode.ipynb

Lines changed: 5 additions & 5 deletions
@@ -8,9 +8,9 @@
   "\n",
   "### Pre-requisites\n",
   "\n",
-  "This notebook shows how to use the SageMaker Python SDK to run your code in a local container before deploying to SageMaker's managed training or hosting environments. This can speed up iterative testing and debugging while using the same familiar Python SDK interface. Just change your estimator's `train_instance_type` to `local` (or `local_gpu` if you're using an ml.p2 or ml.p3 notebook instance).\n",
+  "This notebook shows how to use the SageMaker Python SDK to run your code in a local container before deploying to SageMaker's managed training or hosting environments. This can speed up iterative testing and debugging while using the same familiar Python SDK interface. Just change your estimator's `train_instance_type` to `local`. You could also use `local_gpu` if you're using an ml.p2 or ml.p3 notebook instance, but then you'll need to set `train_instance_count=1`, since distributed local GPU training is not yet supported.\n",
   "\n",
-  "In order to use this feature you'll need to install docker-compose (and nvidia-docker if training with a GPU).\n",
+  "In order to use this feature you'll need to install docker-compose (and nvidia-docker if training with a GPU). Running the setup.sh script below will handle this for you.\n",
   "\n",
   "**Note, you can only run a single local notebook at one time.**"
 ]
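For context, the GPU check that drives the `local` vs. `local_gpu` choice looks roughly like the following (a minimal sketch of the `nvidia-smi` detection pattern these notebooks use; compare the cell removed from the TensorFlow notebook below):

```python
import subprocess

instance_type = 'local'

try:
    # Switch to the GPU container image if nvidia-smi reports a device
    if subprocess.call('nvidia-smi') == 0:
        instance_type = 'local_gpu'
except OSError:
    # nvidia-smi isn't installed, so no GPU is present: stay on CPU
    pass

print('Instance type = ' + instance_type)
```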
@@ -30,7 +30,7 @@
   "source": [
   "### Overview\n",
   "\n",
-  "MNIST is a widely used dataset for handwritten digit classification. It consists of 70,000 labeled 28x28 pixel grayscale images of hand-written digits. The dataset is split into 60,000 training images and 10,000 test images. There are 10 classes (one for each of the 10 digits). This tutorial will show how to train and test an MNIST model on SageMaker using MXNet and the Gluon API."
+  "MNIST is a widely used dataset for handwritten digit classification. It consists of 70,000 labeled 28x28 pixel grayscale images of hand-written digits. The dataset is split into 60,000 training images and 10,000 test images. There are 10 classes (one for each of the 10 digits). This tutorial will show how to train and test an MNIST model in SageMaker local mode using MXNet and the Gluon API."
 ]
 },
 {
@@ -121,7 +121,7 @@
   "source": [
   "## Run the training script on SageMaker\n",
   "\n",
-  "The ```MXNet``` class allows us to run our training function on SageMaker. We need to configure it with our training script, an IAM role, the number of training instances, and the training instance type. This is the the only difference from [mnist_with_gluon.ipynb](./mnist_with_gluon.ipynb). Instead of ``train_instance_type='ml.c4.xlarge'``, we set it to ``train_instance_type='local'``. For local training with GPU, we could set this to \"local_gpu\". In this case, `instance_type` was set above based on your whether you're running a GPU instance."
+  "The ```MXNet``` class allows us to run our training function in SageMaker local mode. We need to configure it with our training script, an IAM role, the number of training instances, and the training instance type. This is the only difference from [mnist_with_gluon.ipynb](./mnist_with_gluon.ipynb): instead of ``train_instance_type='ml.c4.xlarge'``, we set it to ``train_instance_type='local'``. For local training with a GPU, we could set this to ``local_gpu``. In this case, `instance_type` was set above based on whether you're running a GPU instance."
 ]
 },
 {
@@ -135,7 +135,7 @@
   "                  train_instance_count=1, \n",
   "                  train_instance_type=instance_type,\n",
   "                  hyperparameters={'batch_size': 100, \n",
-  "                                   'epochs': 2, \n",
+  "                                   'epochs': 20, \n",
   "                                   'learning_rate': 0.1, \n",
   "                                   'momentum': 0.9, \n",
   "                                   'log_interval': 100})"

sagemaker-python-sdk/tensorflow_distributed_mnist/tensorflow_local_mode_mnist.ipynb

Lines changed: 8 additions & 16 deletions
@@ -8,9 +8,9 @@
   "\n",
   "## Pre-requisites\n",
   "\n",
-  "This notebook shows how to use the SageMaker Python SDK to run your code in a local container before deploying to SageMaker's managed training or hosting environments. This can speed up iterative testing and debugging while using the same familiar Python SDK interface. Just change your estimator's `train_instance_type` to `local` (or `local_gpu` if you're using an ml.p2 or ml.p3 notebook instance).\n",
+  "This notebook shows how to use the SageMaker Python SDK to run your code in a local container before deploying to SageMaker's managed training or hosting environments. This can speed up iterative testing and debugging while using the same familiar Python SDK interface. Just change your estimator's `train_instance_type` to `local`. You could also use `local_gpu` if you're using an ml.p2 or ml.p3 notebook instance, but then you'll need to set `train_instance_count=1`, since distributed local GPU training is not yet supported.\n",
   "\n",
-  "In order to use this feature you'll need to install docker-compose (and nvidia-docker if training with a GPU).\n",
+  "In order to use this feature you'll need to install docker-compose (and nvidia-docker if training with a GPU). Running the setup.sh script below will handle this for you.\n",
   "\n",
   "**Note, you can only run a single local notebook at one time.**"
 ]
@@ -30,9 +30,9 @@
   "source": [
   "## Overview\n",
   "\n",
-  "The **SageMaker Python SDK** helps you deploy your models for training and hosting in optimized, productions ready containers in SageMaker. The SageMaker Python SDK is easy to use, modular, extensible and compatible with TensorFlow and MXNet. This tutorial focuses on how to create a convolutional neural network model to train the [MNIST dataset](http://yann.lecun.com/exdb/mnist/) using **TensorFlow in local mode**.\n",
+  "The **SageMaker Python SDK** helps you deploy your models for training and hosting in optimized, production-ready containers in SageMaker local mode. The SageMaker Python SDK is easy to use, modular, extensible and compatible with TensorFlow and MXNet. This tutorial focuses on how to create a convolutional neural network model to train the [MNIST dataset](http://yann.lecun.com/exdb/mnist/) using **TensorFlow in local mode**.\n",
   "\n",
-  "### Set up the environment Set up the environment"
+  "### Set up the environment"
 ]
 },
 {
@@ -48,14 +48,6 @@
   "\n",
   "sagemaker_session = sagemaker.Session()\n",
   "\n",
-  "instance_type = 'local'\n",
-  "\n",
-  "if subprocess.call('nvidia-smi') == 0:\n",
-  "    ## Set type to GPU if one is present\n",
-  "    instance_type = 'local_gpu'\n",
-  "    \n",
-  "print(\"Instance type = \" + instance_type)\n",
-  "\n",
   "role = get_execution_role()"
 ]
 },
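After this removal, the setup cell reduces to something like the following (a sketch, assuming the standard imports the notebook already uses):

```python
import sagemaker
from sagemaker import get_execution_role

sagemaker_session = sagemaker.Session()

# An IAM role is still required, even though training runs locally
role = get_execution_role()
```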
@@ -156,7 +148,7 @@
   "source": [
   "## Create a training job using the sagemaker.TensorFlow estimator\n",
   "\n",
-  "The `TensorFlow` class allows us to run our training function on SageMaker. We need to configure it with our training script, an IAM role, the number of training instances, and the training instance type. Here is the the only difference from [tensorflow_distributed_mnist.ipynb](./tensorflow_distributed_mnist.ipynb) is that instead of ``train_instance_type='ml.c4.xlarge'``, we set it to ``train_instance_type='local'``. For local training with GPU, we could set this to \"local_gpu\". In this case, `instance_type` was set above based on your whether you're running a GPU instance.\n",
+  "The `TensorFlow` class allows us to run our training function on SageMaker. We need to configure it with our training script, an IAM role, the number of training instances, and the training instance type. The only difference from [tensorflow_distributed_mnist.ipynb](./tensorflow_distributed_mnist.ipynb) is that instead of ``train_instance_type='ml.c4.xlarge'``, we set it to ``train_instance_type='local'``. For local training with a GPU, we could set this to `'local_gpu'` (but then we would need to set `train_instance_count=1`).\n",
   "\n",
   "After we've constructed our `TensorFlow` object, we fit it using the data we uploaded to S3. Even though we're in local mode, using S3 as our data source makes sense because it maintains consistency with how SageMaker's distributed, managed training ingests data."
 ]
@@ -175,8 +167,8 @@
   "                             role=role,\n",
   "                             training_steps=10, \n",
   "                             evaluation_steps=10,\n",
-  "                             train_instance_count=1,\n",
-  "                             train_instance_type=instance_type)\n",
+  "                             train_instance_count=2,\n",
+  "                             train_instance_type='local')\n",
   "\n",
   "mnist_estimator.fit(inputs)"
 ]
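The full cell now reads roughly as follows (a sketch; the `mnist.py` entry point name and the `inputs` S3 location are assumptions carried over from earlier cells):

```python
from sagemaker.tensorflow import TensorFlow

# `role` comes from get_execution_role() in the setup cell
mnist_estimator = TensorFlow(entry_point='mnist.py',
                             role=role,
                             training_steps=10,
                             evaluation_steps=10,
                             train_instance_count=2,
                             train_instance_type='local')

# Even in local mode the training data is read from S3, matching how
# SageMaker's managed, distributed training ingests data
mnist_estimator.fit(inputs)
```

Note that `train_instance_count=2` works here because distributed local training on CPU is supported; only the `local_gpu` variant is limited to a single instance.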
@@ -208,7 +200,7 @@
   "outputs": [],
   "source": [
   "mnist_predictor = mnist_estimator.deploy(initial_instance_count=1,\n",
-  "                                         instance_type=instance_type)"
+  "                                         instance_type='local')"
 ]
 },
 {
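Once deployed, the local endpoint can be exercised like a hosted one. A usage sketch (the flattened 28x28 input shape and the cleanup call are assumptions based on the SDK's usual predictor interface):

```python
import numpy as np

# Classify one MNIST-sized image: 28x28 pixels flattened to 784 floats
# (random values here, purely illustrative)
image = np.random.rand(784).astype(np.float32)
prediction = mnist_predictor.predict(image)
print(prediction)

# Tear down the local serving container when finished
mnist_estimator.delete_endpoint()
```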
