
Commit 4d948ac

authored
doc: add some clarification to Processing docs (aws#1600)
1 parent 25dc97e commit 4d948ac


2 files changed, +64 -54 lines changed


doc/amazon_sagemaker_processing.rst

Lines changed: 62 additions & 53 deletions
@@ -1,6 +1,6 @@
-##############################
+###########################
 Amazon SageMaker Processing
-##############################
+###########################


 Amazon SageMaker Processing allows you to run steps for data pre- or post-processing, feature engineering, data validation, or model evaluation workloads on Amazon SageMaker.
@@ -24,76 +24,85 @@ The fastest way to run get started with Amazon SageMaker Processing is by runnin
 You can run notebooks on Amazon SageMaker that demonstrate end-to-end examples of using processing jobs to perform data pre-processing, feature engineering and model evaluation steps. See `Learn More`_ at the bottom of this page for more in-depth information.


-Data Pre-Processing and Model Evaluation with Scikit-Learn
-==================================================================
+Data Pre-Processing and Model Evaluation with scikit-learn
+==========================================================

-You can run a Scikit-Learn script to do data processing on SageMaker using the `SKLearnProcessor`_ class.
-
-.. _SKLearnProcessor: https://sagemaker.readthedocs.io/en/stable/sagemaker.sklearn.html#sagemaker.sklearn.processing.SKLearnProcessor
+You can run a scikit-learn script to do data processing on SageMaker using the :class:`sagemaker.sklearn.processing.SKLearnProcessor` class.

 You first create a ``SKLearnProcessor``

 .. code:: python

     from sagemaker.sklearn.processing import SKLearnProcessor

-    sklearn_processor = SKLearnProcessor(framework_version='0.20.0',
-                                         role='[Your SageMaker-compatible IAM role]',
-                                         instance_type='ml.m5.xlarge',
-                                         instance_count=1)
+    sklearn_processor = SKLearnProcessor(
+        framework_version="0.20.0",
+        role="[Your SageMaker-compatible IAM role]",
+        instance_type="ml.m5.xlarge",
+        instance_count=1,
+    )

-Then you can run a Scikit-Learn script ``preprocessing.py`` in a processing job. In this example, our script takes one input from S3 and one command-line argument, processes the data, then splits the data into two datasets for output. When the job is finished, we can retrive the output from S3.
+Then you can run a scikit-learn script ``preprocessing.py`` in a processing job. In this example, our script takes one input from S3 and one command-line argument, processes the data, then splits the data into two datasets for output. When the job is finished, we can retrive the output from S3.

 .. code:: python

     from sagemaker.processing import ProcessingInput, ProcessingOutput

-    sklearn_processor.run(code='preprocessing.py',
-                          inputs=[ProcessingInput(
-                              source='s3://your-bucket/path/to/your/data,
-                              destination='/opt/ml/processing/input')],
-                          outputs=[ProcessingOutput(output_name='train_data',
-                                                    source='/opt/ml/processing/train'),
-                                   ProcessingOutput(output_name='test_data',
-                                                    source='/opt/ml/processing/test')],
-                          arguments=['--train-test-split-ratio', '0.2']
-                          )
+    sklearn_processor.run(
+        code="preprocessing.py",
+        inputs=[
+            ProcessingInput(source="s3://your-bucket/path/to/your/data", destination="/opt/ml/processing/input"),
+        ],
+        outputs=[
+            ProcessingOutput(output_name="train_data", source="/opt/ml/processing/train"),
+            ProcessingOutput(output_name="test_data", source="/opt/ml/processing/test"),
+        ],
+        arguments=["--train-test-split-ratio", "0.2"],
+    )

     preprocessing_job_description = sklearn_processor.jobs[-1].describe()

-For an in-depth look, please see the `Scikit-Learn Data Processing and Model Evaluation`_ example notebook.
+For an in-depth look, please see the `Scikit-learn Data Processing and Model Evaluation`_ example notebook.

-.. _Scikit-Learn Data Processing and Model Evaluation: https://github.com/awslabs/amazon-sagemaker-examples/blob/master/sagemaker_processing/scikit_learn_data_processing_and_model_evaluation/scikit_learn_data_processing_and_model_evaluation.ipynb
+.. _Scikit-learn Data Processing and Model Evaluation: https://github.com/awslabs/amazon-sagemaker-examples/blob/master/sagemaker_processing/scikit_learn_data_processing_and_model_evaluation/scikit_learn_data_processing_and_model_evaluation.ipynb


 Data Pre-Processing with Spark
 ==============================

-You can use the `ScriptProcessor`_ class to run a script in a processing container, including your own container.
-
-.. _ScriptProcessor: https://sagemaker.readthedocs.io/en/stable/processing.html#sagemaker.processing.ScriptProcessor
+You can use the :class:`sagemaker.processing.ScriptProcessor` class to run a script in a processing container, including your own container.

 This example shows how you can run a processing job inside of a container that can run a Spark script called ``preprocess.py`` by invoking a command ``/opt/program/submit`` inside the container.

 .. code:: python

     from sagemaker.processing import ScriptProcessor, ProcessingInput

-    spark_processor = ScriptProcessor(base_job_name='spark-preprocessor',
-                                      image_uri='<ECR repository URI to your Spark processing image>',
-                                      command=['/opt/program/submit'],
-                                      role=role,
-                                      instance_count=2,
-                                      instance_type='ml.r5.xlarge',
-                                      max_runtime_in_seconds=1200,
-                                      env={'mode': 'python'})
-
-    spark_processor.run(code='preprocess.py',
-                        arguments=['s3_input_bucket', bucket,
-                                   's3_input_key_prefix', input_prefix,
-                                   's3_output_bucket', bucket,
-                                   's3_output_key_prefix', input_preprocessed_prefix],
-                        logs=False)
+    spark_processor = ScriptProcessor(
+        base_job_name="spark-preprocessor",
+        image_uri="<ECR repository URI to your Spark processing image>",
+        command=["/opt/program/submit"],
+        role=role,
+        instance_count=2,
+        instance_type="ml.r5.xlarge",
+        max_runtime_in_seconds=1200,
+        env={"mode": "python"},
+    )
+
+    spark_processor.run(
+        code="preprocess.py",
+        arguments=[
+            "s3_input_bucket",
+            bucket,
+            "s3_input_key_prefix",
+            input_prefix,
+            "s3_output_bucket",
+            bucket,
+            "s3_output_key_prefix",
+            input_preprocessed_prefix,
+        ],
+        logs=False,
+    )

 For an in-depth look, please see the `Feature Transformation with Spark`_ example notebook.

@@ -106,19 +115,19 @@ Learn More
 Processing class documentation
 ------------------------------

-- ``Processor``: https://sagemaker.readthedocs.io/en/stable/processing.html#sagemaker.processing.Processor
-- ``ScriptProcessor``: https://sagemaker.readthedocs.io/en/stable/processing.html#sagemaker.processing.ScriptProcessor
-- ``SKLearnProcessor``: https://sagemaker.readthedocs.io/en/stable/sagemaker.sklearn.html#sagemaker.sklearn.processing.SKLearnProcessor
-- ``ProcessingInput``: https://sagemaker.readthedocs.io/en/stable/processing.html#sagemaker.processing.ProcessingInput
-- ``ProcessingOutput``: https://sagemaker.readthedocs.io/en/stable/processing.html#sagemaker.processing.ProcessingOutput
-- ``ProcessingJob``: https://sagemaker.readthedocs.io/en/stable/processing.html#sagemaker.processing.ProcessingJob
+- :class:`sagemaker.processing.Processor`
+- :class:`sagemaker.processing.ScriptProcessor`
+- :class:`sagemaker.sklearn.processing.SKLearnProcessor`
+- :class:`sagemaker.processing.ProcessingInput`
+- :class:`sagemaker.processing.ProcessingOutput`
+- :class:`sagemaker.processing.ProcessingJob`


 Further documentation
 ---------------------

-- Processing class documentation: https://sagemaker.readthedocs.io/en/stable/processing.html
-- AWS Documentation: https://docs.aws.amazon.com/sagemaker/latest/dg/processing-job.html
-- AWS Notebook examples: https://github.com/awslabs/amazon-sagemaker-examples/tree/master/sagemaker_processing
-- Processing API documentation: https://docs.aws.amazon.com/sagemaker/latest/dg/API_CreateProcessingJob.html
-- Processing container specification: https://docs.aws.amazon.com/sagemaker/latest/dg/build-your-own-processing-container.html
+- `Processing class documentation <https://sagemaker.readthedocs.io/en/stable/processing.html>`_
+- `AWS Documentation <https://docs.aws.amazon.com/sagemaker/latest/dg/processing-job.html>`_
+- `AWS Notebook examples <https://github.com/awslabs/amazon-sagemaker-examples/tree/master/sagemaker_processing>`_
+- `Processing API documentation <https://docs.aws.amazon.com/sagemaker/latest/dg/API_CreateProcessingJob.html>`_
+- `Processing container specification <https://docs.aws.amazon.com/sagemaker/latest/dg/build-your-own-processing-container.html>`_
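
The documentation in this commit runs a ``preprocessing.py`` script through ``SKLearnProcessor.run()`` but does not show the script itself. What follows is a minimal, hypothetical sketch of what such a script could look like; it is not the script from the example notebook. The container paths and the ``--train-test-split-ratio`` argument come from the ``run()`` call above. Everything else (CSV input files with a header row, a seeded random split, and the helper names ``parse_args``, ``split_rows``, ``read_csvs``, ``write_csv`` and ``main``) is an assumption made for illustration.

```python
# preprocessing.py: hypothetical sketch, not the script from the example notebook.
# Assumptions: inputs are CSV files with a header row; the split is a seeded
# random shuffle. Paths and the --train-test-split-ratio flag mirror run() above.
import argparse
import csv
import glob
import os
import random

INPUT_DIR = "/opt/ml/processing/input"  # ProcessingInput destination
TRAIN_DIR = "/opt/ml/processing/train"  # ProcessingOutput "train_data" source
TEST_DIR = "/opt/ml/processing/test"    # ProcessingOutput "test_data" source


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # Receives arguments=["--train-test-split-ratio", "0.2"] from run().
    parser.add_argument("--train-test-split-ratio", type=float, default=0.2)
    return parser.parse_args(argv)


def split_rows(rows, test_ratio, seed=42):
    """Shuffle rows with a fixed seed and hold out test_ratio of them as test data."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


def read_csvs(input_dir):
    """Read every *.csv directly under input_dir, keeping the first header row seen."""
    header, rows = None, []
    for path in sorted(glob.glob(os.path.join(input_dir, "*.csv"))):
        with open(path, newline="") as f:
            reader = csv.reader(f)
            file_header = next(reader, None)
            if header is None:
                header = file_header
            rows.extend(reader)
    return header, rows


def write_csv(path, header, rows):
    """Write rows (with an optional header) to path, creating the directory if needed."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        if header:
            writer.writerow(header)
        writer.writerows(rows)


def main(argv=None, input_dir=INPUT_DIR, train_dir=TRAIN_DIR, test_dir=TEST_DIR):
    args = parse_args(argv)
    header, rows = read_csvs(input_dir)
    train, test = split_rows(rows, args.train_test_split_ratio)
    # By default, files left under the output directories are uploaded to S3
    # when the job finishes.
    write_csv(os.path.join(train_dir, "train.csv"), header, train)
    write_csv(os.path.join(test_dir, "test.csv"), header, test)
    return len(train), len(test)


# Only run inside a Processing container, where the input directory exists.
if __name__ == "__main__" and os.path.isdir(INPUT_DIR):
    main()
```

Passing the directories into ``main`` as parameters is a choice made so the split can be exercised locally against temporary folders; inside the Processing container they default to the ``/opt/ml/processing`` paths used in the ``ProcessingInput`` and ``ProcessingOutput`` definitions.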

src/sagemaker/processing.py

Lines changed: 2 additions & 1 deletion
@@ -289,7 +289,8 @@ def __init__(
         network_config=None,
     ):
         """Initializes a ``ScriptProcessor`` instance. The ``ScriptProcessor``
-        handles Amazon SageMaker Processing tasks for jobs using a machine learning framework.
+        handles Amazon SageMaker Processing tasks for jobs using a machine learning framework,
+        which allows for providing a script to be run as part of the Processing Job.

         Args:
             role (str): An AWS IAM role name or ARN. Amazon SageMaker Processing
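
The updated docstring describes ``ScriptProcessor`` as running a script you provide. In the Spark example from the documentation diff, the ``arguments`` list passed to ``run()`` reaches the container as command-line arguments in alternating name/value order. The following is one hypothetical way ``preprocess.py`` could read them. It assumes ``/opt/program/submit`` (not part of this commit) forwards those arguments to the script unchanged; ``parse_pairs``, ``s3_uri`` and ``REQUIRED_KEYS`` are illustrative names, not SDK APIs.

```python
# Hypothetical sketch of argument handling in preprocess.py for the Spark example.
# Assumption: /opt/program/submit forwards the container arguments to this script
# unchanged, so sys.argv[1:] looks like
# ["s3_input_bucket", bucket, "s3_input_key_prefix", input_prefix, ...].
import sys

REQUIRED_KEYS = {
    "s3_input_bucket",
    "s3_input_key_prefix",
    "s3_output_bucket",
    "s3_output_key_prefix",
}


def parse_pairs(argv):
    """Turn an alternating [name, value, name, value, ...] list into a dict."""
    if len(argv) % 2 != 0:
        raise ValueError("expected name/value pairs, got an odd number of arguments")
    pairs = dict(zip(argv[0::2], argv[1::2]))
    missing = REQUIRED_KEYS.difference(pairs)
    if missing:
        raise ValueError("missing arguments: " + ", ".join(sorted(missing)))
    return pairs


def s3_uri(bucket, prefix):
    """Join a bucket and key prefix into an s3:// URI without doubled slashes."""
    return "s3://{}/{}".format(bucket, prefix.strip("/"))


# Only act when arguments were actually passed in.
if __name__ == "__main__" and len(sys.argv) > 1:
    args = parse_pairs(sys.argv[1:])
    # A real script would hand these URIs to Spark; this sketch only prints them.
    print("input:", s3_uri(args["s3_input_bucket"], args["s3_input_key_prefix"]))
    print("output:", s3_uri(args["s3_output_bucket"], args["s3_output_key_prefix"]))
```

A flat name/value list keeps the ``run()`` call simple, but it provides no type checking or defaults; a script that needs those could switch to ``argparse`` flags, as the scikit-learn example does with ``--train-test-split-ratio``.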
