
Commit 151de9a

[doc] Added Amazon Components for Kubeflow Pipelines (aws#1533)
* Create amazon_sagemaker_components_for_kubeflow_pipelines.rst
* Create using_amazon_sagemaker_components.rst
* Update index.rst
* fixed formatting issues
* fixed formatting issues
1 parent c452ca9 commit 151de9a

File tree: 3 files changed, +1042 -0 lines changed
doc/kubernetes/amazon_sagemaker_components_for_kubeflow_pipelines.rst

Lines changed: 205 additions & 0 deletions

@@ -0,0 +1,205 @@
Amazon SageMaker Components for Kubeflow Pipelines
==================================================

This document outlines how to use Amazon SageMaker Components
for Kubeflow Pipelines (KFP). With these pipeline components, you can
create and monitor training, tuning, endpoint deployment, and batch
transform jobs in Amazon SageMaker. By running Kubeflow Pipeline jobs on
Amazon SageMaker, you move data processing and training jobs from the
Kubernetes cluster to Amazon SageMaker’s machine learning-optimized
managed service. This document assumes prior knowledge of Kubernetes and
Kubeflow.

What is Kubeflow Pipelines?
---------------------------

Kubeflow Pipelines (KFP) is a platform for building and deploying
portable, scalable machine learning (ML) workflows based on Docker
containers. The Kubeflow Pipelines platform consists of the following:

- A user interface (UI) for managing and tracking experiments, jobs,
  and runs.

- An engine (Argo) for scheduling multi-step ML workflows.

- A Python SDK for defining and manipulating pipelines and components.

- Notebooks for interacting with the system using the SDK.

A pipeline is a description of an ML workflow expressed as a directed
acyclic \ `graph <https://www.kubeflow.org/docs/pipelines/concepts/graph/>`__.
Every step in the workflow is expressed as a Kubeflow Pipeline
`component <https://www.kubeflow.org/docs/pipelines/overview/concepts/component/>`__,
which is a Python module.

If your data has been preprocessed, the standard pipeline takes a subset
of the data and runs hyperparameter optimization of the model. The
pipeline then trains a model with the full dataset using the optimal
hyperparameters. This model is used for both batch inference and
endpoint creation.
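As an illustration, the following is a minimal sketch of how a two-step
DAG like this could be defined and compiled with the KFP Python SDK
(v1 API). The images and commands are placeholders, not the actual
steps of the standard pipeline described above.

.. code:: python

    import kfp
    from kfp import dsl


    @dsl.pipeline(
        name="example-dag",
        description="Two-step workflow: train runs after preprocess.",
    )
    def example_pipeline():
        preprocess = dsl.ContainerOp(
            name="preprocess",
            image="busybox",  # placeholder image
            command=["sh", "-c", "echo preprocessing"],
        )
        train = dsl.ContainerOp(
            name="train",
            image="busybox",  # placeholder image
            command=["sh", "-c", "echo training"],
        )
        # Declare the edge in the directed acyclic graph.
        train.after(preprocess)


    if __name__ == "__main__":
        # Produce a package that can be uploaded through the KFP UI.
        kfp.compiler.Compiler().compile(example_pipeline, "example_pipeline.tar.gz")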
For more information on Kubeflow Pipelines, see the \ `Kubeflow
Pipelines documentation <https://www.kubeflow.org/docs/pipelines/>`__.

Kubeflow Pipeline components
----------------------------

A Kubeflow Pipeline component is a set of code used to execute one step
in a Kubeflow pipeline. Components are represented by a Python module
that is converted into a Docker image. These components make it fast and
easy to write pipelines for experimentation and production environments
without having to interact with the underlying Kubernetes
infrastructure.
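For instance, with the v1 SDK a plain Python function can be promoted
to a component without hand-writing a Dockerfile; this is a minimal
sketch in which the base image is an assumption.

.. code:: python

    from kfp import components


    def add(a: float, b: float) -> float:
        """A trivial pipeline step: add two numbers."""
        return a + b


    # Wrap the function so it runs in a container as one pipeline step.
    add_op = components.func_to_container_op(add, base_image="python:3.8")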
What do Amazon SageMaker Components for Kubeflow Pipelines provide?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Amazon SageMaker Components for Kubeflow Pipelines offer an alternative
way to launch your compute-intensive jobs in Amazon SageMaker. These
components integrate Amazon SageMaker with the portability and
orchestration of Kubeflow Pipelines. Using the Amazon SageMaker
components, each of the jobs in the pipeline workflow runs on Amazon
SageMaker instead of the local Kubernetes cluster. The job parameters,
status, logs, and outputs from Amazon SageMaker are still accessible
from the Kubeflow Pipelines UI. The following Amazon SageMaker
components have been created to integrate six key Amazon SageMaker
features into your ML workflows. You can create a Kubeflow Pipeline
built entirely using these components, or integrate individual
components into your workflow as needed.

There is no additional charge for using Amazon SageMaker Components for
Kubeflow Pipelines. You incur charges for any Amazon SageMaker resources
you use through these components.

Training components
^^^^^^^^^^^^^^^^^^^

**Training**

The Training component allows you to submit Amazon SageMaker Training
jobs directly from a Kubeflow Pipelines workflow. For more information,
see \ `SageMaker Training Kubeflow Pipelines
component <https://github.com/kubeflow/pipelines/tree/master/components/aws/sagemaker/train>`__.
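As a hedged sketch, the component can be loaded from its published
``component.yaml`` and invoked inside a pipeline as shown below. The
parameter names reflect the component definition at the time of
writing; the role ARN, bucket, and training image are placeholders you
must replace.

.. code:: python

    import json

    import kfp
    from kfp import components, dsl

    # Load the SageMaker Training component from the Kubeflow Pipelines repo.
    sagemaker_train_op = components.load_component_from_url(
        "https://raw.githubusercontent.com/kubeflow/pipelines/master/"
        "components/aws/sagemaker/train/component.yaml"
    )


    @dsl.pipeline(name="sagemaker-train", description="Train a model on SageMaker.")
    def train_pipeline(role_arn: str = "<kfp-example-sagemaker-execution-role ARN>"):
        sagemaker_train_op(
            region="us-east-1",
            image="<training image URI in Amazon ECR>",
            instance_type="ml.m5.xlarge",
            instance_count=1,
            # Channels are passed as a JSON string describing the input data.
            channels=json.dumps(
                [
                    {
                        "ChannelName": "train",
                        "DataSource": {
                            "S3DataSource": {
                                "S3Uri": "s3://<your-bucket>/train",
                                "S3DataType": "S3Prefix",
                                "S3DataDistributionType": "FullyReplicated",
                            }
                        },
                    }
                ]
            ),
            model_artifact_path="s3://<your-bucket>/output",
            role=role_arn,
        )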
**Hyperparameter Optimization**

The Hyperparameter Optimization component enables you to submit
hyperparameter tuning jobs to Amazon SageMaker directly from a Kubeflow
Pipelines workflow. For more information, see \ `SageMaker
hyperparameter optimization Kubeflow Pipeline
component <https://github.com/kubeflow/pipelines/tree/master/components/aws/sagemaker/hyperparameter_tuning>`__.
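Usage follows the same pattern as the Training component above; only
the tuning-specific inputs differ. A heavily abridged sketch, with
parameter names taken from the component definition at the time of
writing:

.. code:: python

    from kfp import components

    sagemaker_hpo_op = components.load_component_from_url(
        "https://raw.githubusercontent.com/kubeflow/pipelines/master/"
        "components/aws/sagemaker/hyperparameter_tuning/component.yaml"
    )

    # Inside a pipeline function, the tuning-specific inputs look like:
    #
    # sagemaker_hpo_op(
    #     region="us-east-1",
    #     metric_name="validation:accuracy",
    #     metric_type="Maximize",
    #     continuous_parameters='[{"Name": "learning_rate", '
    #                           '"MinValue": "0.001", "MaxValue": "0.1"}]',
    #     max_num_jobs=10,
    #     max_parallel_jobs=2,
    #     ...,  # plus the training inputs shown earlier (image, channels, role)
    # )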
Inference components
^^^^^^^^^^^^^^^^^^^^

**Hosting Deploy**

The Deploy component enables you to deploy a model in Amazon SageMaker
Hosting from a Kubeflow Pipelines workflow. For more information,
see \ `SageMaker Hosting Services - Create Endpoint Kubeflow Pipeline
component <https://github.com/kubeflow/pipelines/tree/master/components/aws/sagemaker/deploy>`__.
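A minimal hedged sketch follows; it assumes the model already exists in
Amazon SageMaker (for example, created by a preceding create-model
step), and the parameter names reflect the component definition at the
time of writing.

.. code:: python

    from kfp import components, dsl

    sagemaker_deploy_op = components.load_component_from_url(
        "https://raw.githubusercontent.com/kubeflow/pipelines/master/"
        "components/aws/sagemaker/deploy/component.yaml"
    )


    @dsl.pipeline(name="sagemaker-deploy", description="Create a hosted endpoint.")
    def deploy_pipeline(model_name: str = "<existing SageMaker model name>"):
        sagemaker_deploy_op(
            region="us-east-1",
            model_name_1=model_name,
            instance_type_1="ml.m5.large",
        )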
**Batch Transform**

The Batch Transform component enables you to run inference jobs for an
entire dataset in Amazon SageMaker from a Kubeflow Pipelines workflow.
For more information, see \ `SageMaker Batch Transform Kubeflow Pipeline
component <https://github.com/kubeflow/pipelines/tree/master/components/aws/sagemaker/batch_transform>`__.
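The invocation pattern mirrors the components above; a brief hedged
sketch with placeholder S3 locations (parameter names per the component
definition at the time of writing):

.. code:: python

    from kfp import components, dsl

    sagemaker_batch_transform_op = components.load_component_from_url(
        "https://raw.githubusercontent.com/kubeflow/pipelines/master/"
        "components/aws/sagemaker/batch_transform/component.yaml"
    )


    @dsl.pipeline(name="sagemaker-batch-transform")
    def batch_transform_pipeline(model_name: str = "<existing SageMaker model name>"):
        sagemaker_batch_transform_op(
            region="us-east-1",
            model_name=model_name,
            instance_type="ml.m5.large",
            instance_count=1,
            input_location="s3://<your-bucket>/batch-input",
            output_location="s3://<your-bucket>/batch-output",
        )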
Ground Truth components
^^^^^^^^^^^^^^^^^^^^^^^

**Ground Truth**

The Ground Truth component enables you to submit Amazon SageMaker
Ground Truth labeling jobs directly from a Kubeflow Pipelines workflow.
For more information, see \ `SageMaker Ground Truth Kubeflow Pipelines
component <https://github.com/kubeflow/pipelines/tree/master/components/aws/sagemaker/ground_truth>`__.

**Workteam**

The Workteam component enables you to create Amazon SageMaker private
work teams directly from a Kubeflow Pipelines workflow. For more
information, see \ `SageMaker create private workteam Kubeflow Pipelines
component <https://github.com/kubeflow/pipelines/tree/master/components/aws/sagemaker/workteam>`__.

IAM permissions
---------------

Deploying Kubeflow Pipelines with Amazon SageMaker components requires
the following three levels of IAM permissions:
- An IAM user/role to access your AWS account (**your\_credentials**).
  Note: You don’t need this at all if you already have access to the KFP
  web UI and have your input data in Amazon S3, or if you already have
  an Amazon Elastic Kubernetes Service (Amazon EKS) cluster with KFP.

  You use this user/role from your gateway node, which can be your
  local machine or a remote instance, to:

  - Create an Amazon EKS cluster and install KFP

  - Create IAM roles/users

  - Create S3 buckets for your sample input data

  The IAM user/role needs the following permissions:

  - CloudWatchLogsFullAccess

  - `AWSCloudFormationFullAccess <https://console.aws.amazon.com/iam/home?region=us-east-1#/policies/arn%3Aaws%3Aiam%3A%3Aaws%3Apolicy%2FAWSCloudFormationFullAccess>`__

  - IAMFullAccess

  - AmazonS3FullAccess

  - AmazonEC2FullAccess

  - AmazonEKSAdminPolicy - Create this policy using the schema
    from \ `Amazon EKS Identity-Based Policy
    Examples <https://docs.aws.amazon.com/eks/latest/userguide/security_iam_id-based-policy-examples.html>`__

- An IAM role used by KFP pods to access Amazon SageMaker
  (**kfp-example-pod-role**). The KFP pods use this permission to create
  Amazon SageMaker jobs from KFP components. Note: If you want to limit
  permissions for the KFP pods, create your own custom policy and attach
  it instead.

  The role needs the following permission:

  - AmazonSageMakerFullAccess

- An IAM role used by SageMaker jobs to access resources such as Amazon
  S3 and Amazon ECR (**kfp-example-sagemaker-execution-role**); a sketch
  of creating this role with ``boto3`` follows this list.

  Your Amazon SageMaker jobs use this role to:

  - Access Amazon SageMaker resources

  - Read input data from Amazon S3

  - Store your output model in Amazon S3

  The role needs the following permissions:

  - AmazonSageMakerFullAccess

  - AmazonS3FullAccess
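As an illustration, here is a hedged sketch of creating
**kfp-example-sagemaker-execution-role** with ``boto3``. The trust
policy lets the SageMaker service assume the role, and the two managed
policies match the list above; the role name is just the example name
used in this document.

.. code:: python

    import json

    import boto3

    iam = boto3.client("iam")

    # Trust policy: allow SageMaker to assume this execution role.
    trust_policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "sagemaker.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }

    role = iam.create_role(
        RoleName="kfp-example-sagemaker-execution-role",
        AssumeRolePolicyDocument=json.dumps(trust_policy),
    )

    # Attach the managed policies listed above.
    for policy in ("AmazonSageMakerFullAccess", "AmazonS3FullAccess"):
        iam.attach_role_policy(
            RoleName="kfp-example-sagemaker-execution-role",
            PolicyArn=f"arn:aws:iam::aws:policy/{policy}",
        )

    # Pass this ARN as the components' ``role`` input.
    print(role["Role"]["Arn"])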
These are all the IAM users/roles you need to run KFP components for
Amazon SageMaker.

When you have run the components and have created the Amazon SageMaker
endpoint, you also need a role with the ``sagemaker:InvokeEndpoint``
permission to query inference endpoints.
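For example, once the endpoint exists, a caller holding that permission
can query it with ``boto3``; the endpoint name and payload below are
placeholders, and the payload format depends on your model.

.. code:: python

    import boto3

    runtime = boto3.client("sagemaker-runtime")

    response = runtime.invoke_endpoint(
        EndpointName="<your-endpoint-name>",
        ContentType="text/csv",
        Body=b"0.5,1.2,3.4",  # placeholder payload
    )
    print(response["Body"].read())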
Converting Pipelines to use Amazon SageMaker
--------------------------------------------

You can convert an existing pipeline to use Amazon SageMaker by porting
your generic Python `processing
containers <https://docs.aws.amazon.com/sagemaker/latest/dg/amazon-sagemaker-containers.html>`__
and \ `training
containers <https://docs.aws.amazon.com/sagemaker/latest/dg/your-algorithms-training-algo.html>`__.
If you are using Amazon SageMaker for inference, you also need to attach
IAM permissions to your cluster and convert an artifact to a model.

doc/kubernetes/index.rst

Lines changed: 2 additions & 0 deletions
@@ -9,3 +9,5 @@ Orchestrate your SageMaker training and inference jobs with Kubernetes.
 
 amazon_sagemaker_operators_for_kubernetes
 amazon_sagemaker_jobs
+amazon_sagemaker_components_for_kubeflow_pipelines
+using_amazon_sagemaker_components
