Add README for airflow #507
Codecov Report
@@            Coverage Diff             @@
##           master     #507      +/-   ##
==========================================
+ Coverage   94.13%   94.26%   +0.12%
==========================================
  Files          59       59
  Lines        4621     4621
==========================================
+ Hits         4350     4356       +6
+ Misses        271      265       -6
Continue to review full report at Codecov.
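The figures in the report above are consistent with coverage computed as hits divided by lines. The following is a small sketch of that arithmetic, not a description of how Codecov computes its numbers: the helper name `coverage_percent` is hypothetical, and truncating (rather than rounding) to two decimals is an assumption inferred from the values shown.

```python
import math


def coverage_percent(hits, lines):
    """Return hits/lines as a percentage, truncated to two decimals.

    Truncation is an assumption inferred from the report's figures.
    """
    return math.floor(hits / lines * 10000) / 100


LINES = 4621  # same line count on master and on the PR branch

before = coverage_percent(4350, LINES)        # 94.13
after = coverage_percent(4356, LINES)         # 94.26
delta = coverage_percent(4356 - 4350, LINES)  # 0.12

# Misses are simply the lines that were not hit.
misses_before = LINES - 4350  # 271
misses_after = LINES - 4356   # 265

print(before, after, delta, misses_before, misses_after)
```

Six additional lines are hit while the line count is unchanged, which is why the misses drop by the same six.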
src/sagemaker/workflow/README.rst
Outdated
@@ -0,0 +1,162 @@
=============================
SageMaker Workflow in Airflow
Do you need to change the master readme and add a workflow section?
Good idea! One section added to master readme and linked to this one.
src/sagemaker/workflow/README.rst
Outdated
you can build a workflow for SageMaker training, hyperparameter tuning, batch transform and endpoint deployment.
You can use any SageMaker deep learning framework or Amazon algorithms to perform above operations in Airflow.

There are two ways to build SageMaker workflow. Using Airflow SageMaker operators or using Airflow PythonOperator.
...to build a SageMaker workflow.
Updated.
src/sagemaker/workflow/README.rst
Outdated
There are two ways to build SageMaker workflow. Using Airflow SageMaker operators or using Airflow PythonOperator.

1. SageMaker Operators: Since Airflow 1.10.1, we contributed special operators just for SageMaker operations.
In Airflow 1.10.1, the SageMaker team contributed special operators for SageMaker operations.
Updated.
src/sagemaker/workflow/README.rst
Outdated
There are two ways to build SageMaker workflow. Using Airflow SageMaker operators or using Airflow PythonOperator.

1. SageMaker Operators: Since Airflow 1.10.1, we contributed special operators just for SageMaker operations.
Each operator takes a configuration dictionary that defines the corresponding operation. And we provide APIs to
We provide APIs to generate the configuration dictionary in the SageMaker Python SDK. Currently, the following SageMaker operators are supported:
Updated.
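For readers following this thread, here is a minimal sketch of what a "configuration dictionary" for a training operation could look like. It is hand-written for illustration and is not the output of the SageMaker Python SDK's config-generating APIs, which may produce a different or larger dictionary. The key names follow the SageMaker `CreateTrainingJob` request; the helper `build_training_config`, its defaults, and every value passed to it are hypothetical placeholders.

```python
def build_training_config(job_name, image_uri, role_arn, train_s3_uri,
                          output_s3_uri, instance_type="ml.c4.xlarge",
                          instance_count=1):
    """Hypothetical helper: assemble a CreateTrainingJob-style dictionary.

    This is NOT the SDK's own generator; it only illustrates the shape of
    the dictionary that describes a training operation.
    """
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image_uri,
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,
        "InputDataConfig": [
            {
                "ChannelName": "training",
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",
                        "S3Uri": train_s3_uri,
                        "S3DataDistributionType": "FullyReplicated",
                    }
                },
            }
        ],
        "OutputDataConfig": {"S3OutputPath": output_s3_uri},
        "ResourceConfig": {
            "InstanceCount": instance_count,
            "InstanceType": instance_type,
            "VolumeSizeInGB": 30,  # placeholder volume size
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 86400},  # placeholder limit
    }


# All arguments below are made-up placeholders, not real resources.
config = build_training_config(
    job_name="example-training-job",
    image_uri="example-account.dkr.ecr.example-region.amazonaws.com/example:latest",
    role_arn="arn:aws:iam::000000000000:role/ExampleRole",
    train_s3_uri="s3://example-bucket/train",
    output_s3_uri="s3://example-bucket/output",
)
print(sorted(config))
```

In practice the README under review recommends generating this dictionary with the SDK rather than writing it by hand; the sketch only shows what the operator ultimately receives.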
src/sagemaker/workflow/README.rst
Outdated
* ``SageMakerEndpointConfigOperator``
* ``SageMakerEndpointOperator``

2. PythonOperator: Airflow built-in operator that could execute Python callables. You could use SageMaker Python SDK to
Airflow built-in operator that executes Python callables. You can use the PythonOperator to execute operations in the SageMaker Python SDK to create a SageMaker workflow.
Updated.
src/sagemaker/workflow/README.rst
Outdated
Using Airflow on AWS
~~~~~~~~~~~~~~~~~~~~

Turbine is an open source AWS CloudFormation template to create Airflow resources stack on AWS.
Turbine is an open-source AWS CloudFormation template that enables you to create an Airflow resource stack on AWS.
Updated.
src/sagemaker/workflow/README.rst
Outdated
data=your_transform_data_s3_uri,
content_type='text/csv')

Now we can pass these configurations to related SageMaker operators and create the workflow:
Now you can pass these configurations to the corresponding SageMaker operators and create the workflow:
Updated.
src/sagemaker/workflow/README.rst
Outdated
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

`Airflow PythonOperator <https://airflow.apache.org/howto/operator.html?#pythonoperator>`_
is a built-in operator that can execute any Python callables. If you want to build the SageMaker workflow in a more
...execute any Python callable.
Updated.
src/sagemaker/workflow/README.rst
Outdated
`Airflow PythonOperator <https://airflow.apache.org/howto/operator.html?#pythonoperator>`_
is a built-in operator that can execute any Python callables. If you want to build the SageMaker workflow in a more
flexible way, you could write your python callables for SageMaker operations using SageMaker Python SDK. For example:
...flexible way, write your Python callables for SageMaker operations by using the SageMaker Python SDK.
Updated.
src/sagemaker/workflow/README.rst
Outdated
transformer = estimator.transformer(instance_count=1, instance_type='ml.c4.xlarge')
transformer.transform(data, content_type='text/csv')

Then you could build your workflow using PythonOperator with Python callables defined above:
Then build your workflow by using the PythonOperator with the Python callables defined above:
Updated.
src/sagemaker/workflow/README.rst
Outdated
transform_op.set_upstream(train_op)

A workflow with SageMaker training and batch transform is finished! In this way, you could customize your Python
A workflow that runs a SageMaker training job and a batch transform job is finished. You can customize your Python callables with the SageMaker Python SDK according to your needs, and build more flexible and powerful workflows.
Updated.
Issue #, if available:
Description of changes:
Add README for using SageMaker with Airflow.
Merge Checklist
Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.