aws · laurenyu · Jun 17, 2020 · Jun 16, 2020 · Jun 17, 2020 · Jun 17, 2020
@@ -1,6 +1,6 @@
-##############################
+###########################
 Amazon SageMaker Processing
-##############################
+###########################
 
 
 Amazon SageMaker Processing allows you to run steps for data pre- or post-processing, feature engineering, data validation, or model evaluation workloads on Amazon SageMaker.
@@ -24,76 +24,85 @@ The fastest way to run get started with Amazon SageMaker Processing is by runnin
 You can run notebooks on Amazon SageMaker that demonstrate end-to-end examples of using processing jobs to perform data pre-processing, feature engineering and model evaluation steps. See `Learn More`_ at the bottom of this page for more in-depth information.
 
 
-Data Pre-Processing and Model Evaluation with Scikit-Learn
-==================================================================
+Data Pre-Processing and Model Evaluation with scikit-learn
+==========================================================
 
-You can run a Scikit-Learn script to do data processing on SageMaker using the `SKLearnProcessor`_ class.
-
-.. _SKLearnProcessor: https://sagemaker.readthedocs.io/en/stable/sagemaker.sklearn.html#sagemaker.sklearn.processing.SKLearnProcessor
+You can run a scikit-learn script to do data processing on SageMaker using the :class:`sagemaker.sklearn.processing.SKLearnProcessor` class.
 
 You first create a ``SKLearnProcessor``
 
 .. code:: python
 
     from sagemaker.sklearn.processing import SKLearnProcessor
 
-    sklearn_processor = SKLearnProcessor(framework_version='0.20.0',
-                                     role='[Your SageMaker-compatible IAM role]',
-                                     instance_type='ml.m5.xlarge',
-                                     instance_count=1)
+    sklearn_processor = SKLearnProcessor(
+        framework_version="0.20.0",
+        role="[Your SageMaker-compatible IAM role]",
+        instance_type="ml.m5.xlarge",
+        instance_count=1,
+    )
 
-Then you can run a Scikit-Learn script ``preprocessing.py`` in a processing job. In this example, our script takes one input from S3 and one command-line argument, processes the data, then splits the data into two datasets for output. When the job is finished, we can retrive the output from S3.
+Then you can run a scikit-learn script ``preprocessing.py`` in a processing job. In this example, our script takes one input from S3 and one command-line argument, processes the data, then splits the data into two datasets for output. When the job is finished, we can retrieve the output from S3.
 
 .. code:: python
 
     from sagemaker.processing import ProcessingInput, ProcessingOutput
 
-    sklearn_processor.run(code='preprocessing.py',
-                      inputs=[ProcessingInput(
-                        source='s3://your-bucket/path/to/your/data,
-                        destination='/opt/ml/processing/input')],
-                      outputs=[ProcessingOutput(output_name='train_data',
-                                                source='/opt/ml/processing/train'),
-                               ProcessingOutput(output_name='test_data',
-                                                source='/opt/ml/processing/test')],
-                      arguments=['--train-test-split-ratio', '0.2']
-                     )
+    sklearn_processor.run(
+        code="preprocessing.py",
+        inputs=[
+            ProcessingInput(source="s3://your-bucket/path/to/your/data", destination="/opt/ml/processing/input"),
+        ],
+        outputs=[
+            ProcessingOutput(output_name="train_data", source="/opt/ml/processing/train"),
+            ProcessingOutput(output_name="test_data", source="/opt/ml/processing/test"),
+        ],
+        arguments=["--train-test-split-ratio", "0.2"],
+    )
 
     preprocessing_job_description = sklearn_processor.jobs[-1].describe()
 
-For an in-depth look, please see the `Scikit-Learn Data Processing and Model Evaluation`_ example notebook.
+For an in-depth look, please see the `Scikit-learn Data Processing and Model Evaluation`_ example notebook.
 
-.. _Scikit-Learn Data Processing and Model Evaluation: https://github.com/awslabs/amazon-sagemaker-examples/blob/master/sagemaker_processing/scikit_learn_data_processing_and_model_evaluation/scikit_learn_data_processing_and_model_evaluation.ipynb
+.. _Scikit-learn Data Processing and Model Evaluation: https://github.com/awslabs/amazon-sagemaker-examples/blob/master/sagemaker_processing/scikit_learn_data_processing_and_model_evaluation/scikit_learn_data_processing_and_model_evaluation.ipynb
 
 
 Data Pre-Processing with Spark
 ==============================
 
-You can use the `ScriptProcessor`_ class to run a script in a processing container, including your own container.
-
-.. _ScriptProcessor: https://sagemaker.readthedocs.io/en/stable/processing.html#sagemaker.processing.ScriptProcessor
+You can use the :class:`sagemaker.processing.ScriptProcessor` class to run a script in a processing container, including your own container.
 
 This example shows how you can run a processing job inside of a container that can run a Spark script called ``preprocess.py`` by invoking a command ``/opt/program/submit`` inside the container.
 
 .. code:: python
 
     from sagemaker.processing import ScriptProcessor, ProcessingInput
 
-    spark_processor = ScriptProcessor(base_job_name='spark-preprocessor',
-                                  image_uri='<ECR repository URI to your Spark processing image>',
-                                  command=['/opt/program/submit'],
-                                  role=role,
-                                  instance_count=2,
-                                  instance_type='ml.r5.xlarge',
-                                  max_runtime_in_seconds=1200,
-                                  env={'mode': 'python'})
-
-    spark_processor.run(code='preprocess.py',
-                    arguments=['s3_input_bucket', bucket,
-                               's3_input_key_prefix', input_prefix,
-                               's3_output_bucket', bucket,
-                               's3_output_key_prefix', input_preprocessed_prefix],
-                    logs=False)
+    spark_processor = ScriptProcessor(
+        base_job_name="spark-preprocessor",
+        image_uri="<ECR repository URI to your Spark processing image>",
+        command=["/opt/program/submit"],
+        role=role,
+        instance_count=2,
+        instance_type="ml.r5.xlarge",
+        max_runtime_in_seconds=1200,
+        env={"mode": "python"},
+    )
+
+    spark_processor.run(
+        code="preprocess.py",
+        arguments=[
+            "s3_input_bucket",
+            bucket,
+            "s3_input_key_prefix",
+            input_prefix,
+            "s3_output_bucket",
+            bucket,
+            "s3_output_key_prefix",
+            input_preprocessed_prefix,
+        ],
+        logs=False,
+    )
 
 For an in-depth look, please see the `Feature Transformation with Spark`_ example notebook.
 
@@ -106,19 +115,19 @@ Learn More
 Processing class documentation
 ------------------------------
 
-- ``Processor``: https://sagemaker.readthedocs.io/en/stable/processing.html#sagemaker.processing.Processor
-- ``ScriptProcessor``: https://sagemaker.readthedocs.io/en/stable/processing.html#sagemaker.processing.ScriptProcessor
-- ``SKLearnProcessor``: https://sagemaker.readthedocs.io/en/stable/sagemaker.sklearn.html#sagemaker.sklearn.processing.SKLearnProcessor
-- ``ProcessingInput``: https://sagemaker.readthedocs.io/en/stable/processing.html#sagemaker.processing.ProcessingInput
-- ``ProcessingOutput``: https://sagemaker.readthedocs.io/en/stable/processing.html#sagemaker.processing.ProcessingOutput
-- ``ProcessingJob``: https://sagemaker.readthedocs.io/en/stable/processing.html#sagemaker.processing.ProcessingJob
+- :class:`sagemaker.processing.Processor`
+- :class:`sagemaker.processing.ScriptProcessor`
+- :class:`sagemaker.sklearn.processing.SKLearnProcessor`
+- :class:`sagemaker.processing.ProcessingInput`
+- :class:`sagemaker.processing.ProcessingOutput`
+- :class:`sagemaker.processing.ProcessingJob`
 
 
 Further documentation
 ---------------------
 
-- Processing class documentation: https://sagemaker.readthedocs.io/en/stable/processing.html
-- AWS Documentation: https://docs.aws.amazon.com/sagemaker/latest/dg/processing-job.html
-- AWS Notebook examples: https://github.com/awslabs/amazon-sagemaker-examples/tree/master/sagemaker_processing
-- Processing API documentation: https://docs.aws.amazon.com/sagemaker/latest/dg/API_CreateProcessingJob.html
-- Processing container specification: https://docs.aws.amazon.com/sagemaker/latest/dg/build-your-own-processing-container.html
+- `Processing class documentation <https://sagemaker.readthedocs.io/en/stable/processing.html>`_
+- `AWS Documentation <https://docs.aws.amazon.com/sagemaker/latest/dg/processing-job.html>`_
+- `AWS Notebook examples <https://github.com/awslabs/amazon-sagemaker-examples/tree/master/sagemaker_processing>`_
+- `Processing API documentation <https://docs.aws.amazon.com/sagemaker/latest/dg/API_CreateProcessingJob.html>`_
+- `Processing container specification <https://docs.aws.amazon.com/sagemaker/latest/dg/build-your-own-processing-container.html>`_
@@ -289,7 +289,8 @@ def __init__(
         network_config=None,
     ):
         """Initializes a ``ScriptProcessor`` instance. The ``ScriptProcessor``
-        handles Amazon SageMaker Processing tasks for jobs using a machine learning framework.
+        handles Amazon SageMaker Processing tasks for jobs using a machine learning framework
+        and lets you provide a script to be run as part of the processing job.
 
         Args:
             role (str): An AWS IAM role name or ARN. Amazon SageMaker Processing
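
The updated docs describe ``preprocessing.py`` as taking one input, one command-line argument (``--train-test-split-ratio``), and producing two output datasets, but the script itself is not part of this diff. As a rough illustration only, a script of that shape might look like the sketch below. The function names, the ``data.csv`` file name, the CSV format, and the fixed shuffle seed are all assumptions made for the example; only the argument name and the ``/opt/ml/processing`` paths come from the docs above.

```python
import argparse
import csv
import random
from pathlib import Path


def parse_args(argv=None):
    """Parse the argument that the docs pass via ``arguments=[...]``."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--train-test-split-ratio", type=float, default=0.2)
    return parser.parse_args(argv)


def split_rows(rows, test_ratio, seed=0):
    """Shuffle rows deterministically and return (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed is an assumption
    n_test = int(round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


def main(argv=None, base_dir="/opt/ml/processing"):
    """Read input/data.csv and write train/train.csv and test/test.csv."""
    args = parse_args(argv)
    base = Path(base_dir)
    # "data.csv" is a hypothetical file name; the docs only give the directory.
    with open(base / "input" / "data.csv", newline="") as f:
        header, *rows = list(csv.reader(f))
    train, test = split_rows(rows, args.train_test_split_ratio)
    for name, subset in (("train", train), ("test", test)):
        out_dir = base / name
        out_dir.mkdir(parents=True, exist_ok=True)
        with open(out_dir / f"{name}.csv", "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(header)
            writer.writerows(subset)
```

In a real script, ``main()`` would be called under an ``if __name__ == "__main__":`` guard; it is left uncalled here so the helpers can be imported and tried without the container paths present.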