Add README for airflow #507

Merged
merged 3 commits into from
Nov 20, 2018

Contributor

@yangaws commented Nov 20, 2018

Issue #, if available:

Description of changes:
Add README for using SageMaker with Airflow.

Merge Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.

  • I have read the CONTRIBUTING doc
  • I have added tests that prove my fix is effective or that my feature works (if appropriate)
  • I have updated the changelog with a description of my changes (if appropriate)
  • I have updated any necessary documentation (if appropriate)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

codecov-io commented Nov 20, 2018

Codecov Report

Merging #507 into master will increase coverage by 0.12%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master     #507      +/-   ##
==========================================
+ Coverage   94.13%   94.26%   +0.12%     
==========================================
  Files          59       59              
  Lines        4621     4621              
==========================================
+ Hits         4350     4356       +6     
+ Misses        271      265       -6

Impacted Files                  Coverage Δ
src/sagemaker/local/image.py    89.6% <0%> (+1.83%) ⬆️

Continue to review full report at Codecov.

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9c27fd9...338e00b. Read the comment docs.

@yangaws removed the request for review from laurenyu November 20, 2018 09:01
@@ -0,0 +1,162 @@
=============================
SageMaker Workflow in Airflow

Do you need to change the master readme and add a workflow section?

Contributor Author

Good idea! One section added to master readme and linked to this one.

yuanzhua previously approved these changes Nov 20, 2018
you can build a workflow for SageMaker training, hyperparameter tuning, batch transform and endpoint deployment.
You can use any SageMaker deep learning framework or Amazon algorithms to perform the above operations in Airflow.

There are two ways to build SageMaker workflow. Using Airflow SageMaker operators or using Airflow PythonOperator.
Contributor

...to build a SageMaker workflow.

Contributor Author

Updated.


1. SageMaker Operators: Since Airflow 1.10.1, we contributed special operators just for SageMaker operations.
Contributor

In Airflow 1.10.1, the SageMaker team contributed special operators for SageMaker operations.

Contributor Author

Updated.

Each operator takes a configuration dictionary that defines the corresponding operation. And we provide APIs to
Contributor

We provide APIs to generate the configuration dictionary in the SageMaker Python SDK. Currently, the following SageMaker operators are supported:

Contributor Author

Updated.

* ``SageMakerEndpointConfigOperator``
* ``SageMakerEndpointOperator``
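To make the "configuration dictionary" that each operator takes more concrete, here is a hand-written sketch of the general shape of a training configuration, using field names from the SageMaker `CreateTrainingJob` API. It is an illustration only, not output of the SDK: `make_training_config` and every argument value (job name, image URI, role ARN, bucket paths) are hypothetical placeholders. In the SDK versions of this era, helpers in `sagemaker.workflow.airflow` (for example `training_config`) are what generate such a dictionary from an estimator.

```python
def make_training_config(job_name, image_uri, role_arn, train_s3_uri,
                         output_s3_uri, instance_type='ml.m4.xlarge',
                         instance_count=1, max_runtime_s=86400):
    """Hypothetical helper: build a dict shaped like a CreateTrainingJob request."""
    return {
        'TrainingJobName': job_name,
        'AlgorithmSpecification': {
            'TrainingImage': image_uri,
            'TrainingInputMode': 'File',
        },
        'RoleArn': role_arn,
        # One input channel read from an S3 prefix.
        'InputDataConfig': [{
            'ChannelName': 'training',
            'DataSource': {
                'S3DataSource': {
                    'S3DataType': 'S3Prefix',
                    'S3Uri': train_s3_uri,
                    'S3DataDistributionType': 'FullyReplicated',
                },
            },
        }],
        'OutputDataConfig': {'S3OutputPath': output_s3_uri},
        'ResourceConfig': {
            'InstanceCount': instance_count,
            'InstanceType': instance_type,
            'VolumeSizeInGB': 30,
        },
        'StoppingCondition': {'MaxRuntimeInSeconds': max_runtime_s},
    }


# All values below are placeholders, not real resources.
config = make_training_config(
    job_name='example-training-job',
    image_uri='<training-image-uri>',
    role_arn='arn:aws:iam::123456789012:role/ExampleSageMakerRole',
    train_s3_uri='s3://example-bucket/train',
    output_s3_uri='s3://example-bucket/output',
)
print(config['ResourceConfig'])
```

A dictionary of this shape is what a SageMaker operator such as `SageMakerTrainingOperator` would receive as its `config` argument; generating it with the SDK rather than by hand avoids keeping these field names in sync manually.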

2. PythonOperator: Airflow built-in operator that could execute Python callables. You could use SageMaker Python SDK to
Contributor

Airflow built-in operator that executes Python callables. You can use the PythonOperator to execute operations in the SageMaker Python SDK to create a SageMaker workflow.

Contributor Author

Updated.

Using Airflow on AWS
~~~~~~~~~~~~~~~~~~~~

Turbine is an open source AWS CloudFormation template to create Airflow resources stack on AWS.
Contributor

Turbine is an open-source AWS CloudFormation template that enables you to create an Airflow resource stack on AWS.

Contributor Author

Updated.

data=your_transform_data_s3_uri,
content_type='text/csv')

Now we can pass these configurations to related SageMaker operators and create the workflow:
Contributor

Now you can pass these configurations to the corresponding SageMaker operators and create the workflow:

Contributor Author

Updated.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

`Airflow PythonOperator <https://airflow.apache.org/howto/operator.html?#pythonoperator>`_
is a built-in operator that can execute any Python callables. If you want to build the SageMaker workflow in a more
Contributor

...execute any Python callable.

Contributor Author

Updated.


flexible way, you could write your python callables for SageMaker operations using SageMaker Python SDK. For example:
Contributor

...flexible way, write your Python callables for SageMaker operations by using the SageMaker Python SDK.

Contributor Author

Updated.

transformer = estimator.transformer(instance_count=1, instance_type='ml.c4.xlarge')
transformer.transform(data, content_type='text/csv')
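The transform fragment above calls `estimator.transformer(...)` and `transformer.transform(...)` from the SageMaker Python SDK. Because a PythonOperator ultimately invokes its `python_callable` with the keyword arguments supplied in `op_kwargs`, callables like these can be exercised without Airflow or AWS. The sketch below is a test-style illustration under that assumption: `train`, `transform`, `FakeEstimator` and `FakeTransformer` are hypothetical names, and the SDK objects are replaced by stand-ins that only record calls. Only the method names `fit`, `transformer` and `transform` mirror the SDK.

```python
class FakeTransformer:
    """Hypothetical stand-in for a SageMaker Transformer; records calls only."""

    def __init__(self, log, instance_count, instance_type):
        self.log = log
        self.log.append(('transformer', instance_count, instance_type))

    def transform(self, data, content_type=None):
        self.log.append(('transform', data, content_type))


class FakeEstimator:
    """Hypothetical stand-in for a SageMaker estimator; never contacts AWS."""

    def __init__(self):
        self.log = []

    def fit(self, inputs):
        self.log.append(('fit', inputs))

    def transformer(self, instance_count, instance_type):
        return FakeTransformer(self.log, instance_count, instance_type)


def train(estimator, data):
    # Python callable for a training task.
    estimator.fit(data)


def transform(estimator, data):
    # Python callable for a batch transform task, mirroring the fragment above.
    transformer = estimator.transformer(instance_count=1, instance_type='ml.c4.xlarge')
    transformer.transform(data, content_type='text/csv')


# Simulate what a PythonOperator does: call python_callable(**op_kwargs).
estimator = FakeEstimator()
train(**{'estimator': estimator, 'data': 's3://example-bucket/train'})
transform(**{'estimator': estimator, 'data': 's3://example-bucket/transform'})
print(estimator.log)
```

One caveat: in a real deployment each Airflow task may run in a separate process, so sharing one in-memory estimator between the training and transform tasks is not a safe assumption. The transform task would need some way to identify the finished training job, which this sketch does not cover.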

Then you could build your workflow using PythonOperator with Python callables defined above:
Contributor

Then build your workflow by using the PythonOperator with the Python callables defined above:

Contributor Author

Updated.


transform_op.set_upstream(train_op)
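The line `transform_op.set_upstream(train_op)` declares that the batch transform task may start only after the training task has finished. The following is a minimal pure-Python sketch of that dependency idea, not Airflow's implementation: the `Task` class and `execution_order` function are hypothetical, and the ordering uses a standard depth-first topological sort with cycle detection.

```python
class Task:
    """Hypothetical minimal task node that only mimics set_upstream."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.upstream = []

    def set_upstream(self, other):
        # This task runs only after `other` has finished.
        self.upstream.append(other)


def execution_order(tasks):
    """Return task_ids so that every task appears after all of its upstream tasks."""
    order, visited, in_progress = [], set(), set()

    def visit(task):
        if task.task_id in visited:
            return
        if task.task_id in in_progress:
            raise ValueError('cycle detected at ' + task.task_id)
        in_progress.add(task.task_id)
        for parent in task.upstream:
            visit(parent)
        in_progress.discard(task.task_id)
        visited.add(task.task_id)
        order.append(task.task_id)

    for task in tasks:
        visit(task)
    return order


train_op = Task('training')
transform_op = Task('transform')
transform_op.set_upstream(train_op)
print(execution_order([transform_op, train_op]))  # ['training', 'transform']
```

Airflow itself also accepts the equivalent bitshift form `train_op >> transform_op`; either way, the scheduler rather than the DAG file decides when each task actually runs.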

A workflow with SageMaker training and batch transform is finished! In this way, you could customize your Python
Contributor

A workflow that runs a SageMaker training job and a batch transform job is finished. You can customize your Python callables with the SageMaker Python SDK according to your needs, and build more flexible and powerful workflows.

Contributor Author

Updated.

@yangaws merged commit 0071ff8 into aws:master Nov 20, 2018

4 participants