|
10 | 10 | "\n",
|
11 | 11 | "By packaging an algorithm in a container, you can bring almost any code to the Amazon SageMaker environment, regardless of programming language, environment, framework, or dependencies. \n",
|
12 | 12 | "\n",
|
13 |
| - "1. [Building your own algorithm container](#Building-your-own-algorithm-container)\n", |
| 13 | + "1. [Building your own TensorFlow container](#Building-your-own-tensorflow-container)\n", |
14 | 14 | " 1. [When should I build my own algorithm container?](#When-should-I-build-my-own-algorithm-container?)\n",
|
15 | 15 | " 1. [Permissions](#Permissions)\n",
|
16 | 16 | " 1. [The example](#The-example)\n",
|
|
31 | 31 | " 1. [Create the session](#Create-the-session)\n",
|
32 | 32 | " 1. [Upload the data for training](#Upload-the-data-for-training)\n",
|
33 | 33 | " 1. [Training On SageMaker](#Training-on-SageMaker)\n",
|
34 |
| - " 1. [Making Predictions using Boto3](#Making-predictions-using-Boto3)\n", |
35 | 34 | " 1. [Optional cleanup](#Optional-cleanup) \n",
|
36 | 35 | "1. [Reference](#Reference)\n",
|
37 | 36 | "\n",
|
|
43 | 42 | "\n",
|
44 | 43 | "Even if there is direct SDK support for your environment or framework, you may find it more effective to build your own container. If the code that implements your algorithm is quite complex or you need special additions to the framework, building your own container may be the right choice.\n",
|
45 | 44 | "\n",
|
46 |
| - "This walkthrough shows that it is quite straightforward to build your own container. So don't worry if there isn't direct SDK support for your environment.\n", |
| 45 | + "Some of the reasons to build an already supported framework container are:\n", |
| 46 | + "1. A specific version isn't supported.\n", |
| 47 | + "2. Configure and install your dependencies and environment.\n", |
| 48 | + "3. Use a different training/hosting solution than provided.\n", |
| 49 | + "\n", |
| 50 | + "This walkthrough shows that it is quite straightforward to build your own container. So you can still use SageMaker even if your use case is not covered by the deep learning containers that we've built for you.\n", |
47 | 51 | "\n",
|
48 | 52 | "## Permissions\n",
|
49 | 53 | "\n",
|
50 | 54 | "Running this notebook requires permissions in addition to the normal `SageMakerFullAccess` permissions. This is because it creates new repositories in Amazon ECR. The easiest way to add these permissions is simply to add the managed policy `AmazonEC2ContainerRegistryFullAccess` to the role that you used to start your notebook instance. There's no need to restart your notebook instance when you do this, the new permissions will be available immediately.\n",
|
51 | 55 | "\n",
|
52 | 56 | "## The example\n",
|
53 | 57 | "\n",
|
54 |
| - "In this example we show how to package a custom TensorFlow container with a Python example which works with the CIFAR-10 dataset and uses TensorFlow Serving for inference. This example is kept simple as the main point is to show surrounding structure that you must add to your own code to train and host it in Amazon SageMaker.\n", |
55 |
| - "\n", |
56 |
| - "The approach demonstrated here works for any language or environment. You need to choose the right tools for your environment to serve HTTP requests for inference, but good HTTP environments are available in every language these days.\n", |
| 58 | + "In this example we show how to package a custom TensorFlow container with a Python example which works with the CIFAR-10 dataset and uses TensorFlow Serving for inference. However, different inference solutions other than TensorFlow Serving can be used by modifying the docker container.\n", |
57 | 59 | "\n",
|
58 | 60 | "In this example, we use a single image to support training and hosting. This simplifies the procedure because we only need to manage one image for both tasks. Sometimes you may want separate images for training and hosting because they have different requirements. In this case, separate the parts discussed below into separate Dockerfiles and build two images. Choosing whether to use a single image or two images is a matter of what is most convenient for you to develop and manage.\n",
|
59 | 61 | "\n",
|
|
128 | 130 | "\n",
|
129 | 131 | "##### The input\n",
|
130 | 132 | "\n",
|
131 |
| - "* `/opt/ml/input/config` contains information to control how your program runs. `hyperparameters.json` is a JSON-formatted dictionary of hyperparameter names to values. These values are always strings, so you may need to convert them. `resourceConfig.json` is a JSON-formatted file that describes the network layout used for distributed training. Since scikit-learn doesn't support distributed training, we ignore it here.\n", |
| 133 | + "* `/opt/ml/input/config` contains information to control how your program runs. `hyperparameters.json` is a JSON-formatted dictionary of hyperparameter names to values. These values are always strings, so you may need to convert them. `resourceConfig.json` is a JSON-formatted file that describes the network layout used for distributed training.\n", |
132 | 134 | "* `/opt/ml/input/data/<channel_name>/` (for File mode) contains the input data for that channel. The channels are created based on the call to CreateTrainingJob but it's generally important that channels match algorithm expectations. The files for each channel are copied from S3 to this directory, preserving the tree structure indicated by the S3 key structure. \n",
|
133 | 135 | "* `/opt/ml/input/data/<channel_name>_<epoch_number>` (for Pipe mode) is the pipe for a given epoch. Epochs start at zero and go up by one each time you read them. There is no limit to the number of epochs that you can run, but you must close each pipe before reading the next epoch.\n",
|
134 | 136 | "\n",
|
|
442 | 444 | "source": [
|
443 | 445 | "## Making predictions using Python SDK\n",
|
444 | 446 | "\n",
|
445 |
| - "To make predictions, we use an image that is converted using OpenCV into a json format to send as an inference request. We need to install OpenCV to deserialize the image that is used to make predictions." |
| 447 | + "To make predictions, we use an image that is converted using OpenCV into a json format to send as an inference request. We need to install OpenCV to deserialize the image that is used to make predictions.\n", |
| 448 | + "\n", |
| 449 | + "The JSON reponse will be the probabilities of the image belonging to one of the 10 classes along with the most likely class the picture belongs to. The classes can be referenced from the [CIFAR-10 website](https://www.cs.toronto.edu/~kriz/cifar.html). Since we didn't train the model for that long, we aren't expecting very accurate results." |
446 | 450 | ]
|
447 | 451 | },
|
448 | 452 | {
|
|
569 | 573 | "source": [
|
570 | 574 | "## Upload the data for training\n",
|
571 | 575 | "\n",
|
572 |
| - "When training models with huge amounts of data, you typically need big data tools, like Amazon Athena, AWS Glue, or Amazon EMR, to create your data in S3.\n", |
573 |
| - "\n", |
574 |
| - "We can use use the tools provided by the SageMaker Python SDK to upload the data to a default bucket." |
| 576 | + "We will use the tools provided by the SageMaker Python SDK to upload the data to a default bucket." |
575 | 577 | ]
|
576 | 578 | },
|
577 | 579 | {
|
|
665 | 667 | "predictor.predict(data)"
|
666 | 668 | ]
|
667 | 669 | },
|
668 |
| - { |
669 |
| - "cell_type": "markdown", |
670 |
| - "metadata": {}, |
671 |
| - "source": [ |
672 |
| - "## Making predictions using Boto3\n", |
673 |
| - "\n", |
674 |
| - "Below is an example of making an inference request through Boto3 on the same endpoint." |
675 |
| - ] |
676 |
| - }, |
677 |
| - { |
678 |
| - "cell_type": "code", |
679 |
| - "execution_count": null, |
680 |
| - "metadata": {}, |
681 |
| - "outputs": [], |
682 |
| - "source": [ |
683 |
| - "import json\n", |
684 |
| - "\n", |
685 |
| - "client = boto3.client('runtime.sagemaker')\n", |
686 |
| - "\n", |
687 |
| - "endpoint_name = predictor.endpoint\n", |
688 |
| - "\n", |
689 |
| - "response = client.invoke_endpoint(EndpointName=endpoint_name, Body=json.dumps(data))\n", |
690 |
| - "response_body = response['Body']\n", |
691 |
| - "\n", |
692 |
| - "print(response_body.read().decode('utf-8'))" |
693 |
| - ] |
694 |
| - }, |
695 | 670 | {
|
696 | 671 | "cell_type": "markdown",
|
697 | 672 | "metadata": {},
|
|