diff --git a/doc/amazon_sagemaker_processing.rst b/doc/amazon_sagemaker_processing.rst
index 3657c69d96..7fef331e0b 100644
--- a/doc/amazon_sagemaker_processing.rst
+++ b/doc/amazon_sagemaker_processing.rst
@@ -1,6 +1,6 @@
-##############################
+###########################
 Amazon SageMaker Processing
-##############################
+###########################
 
 Amazon SageMaker Processing allows you to run steps for data pre- or post-processing, feature engineering, data validation, or model evaluation workloads on Amazon SageMaker.
 
@@ -24,12 +24,10 @@ The fastest way to run get started with Amazon SageMaker Processing is by runnin
 
 You can run notebooks on Amazon SageMaker that demonstrate end-to-end examples of using processing jobs to perform data pre-processing, feature engineering and model evaluation steps. See `Learn More`_ at the bottom of this page for more in-depth information.
 
-Data Pre-Processing and Model Evaluation with Scikit-Learn
-==================================================================
+Data Pre-Processing and Model Evaluation with scikit-learn
+==========================================================
 
-You can run a Scikit-Learn script to do data processing on SageMaker using the `SKLearnProcessor`_ class.
-
-.. _SKLearnProcessor: https://sagemaker.readthedocs.io/en/stable/sagemaker.sklearn.html#sagemaker.sklearn.processing.SKLearnProcessor
+You can run a scikit-learn script to do data processing on SageMaker using the :class:`sagemaker.sklearn.processing.SKLearnProcessor` class.
 You first create a ``SKLearnProcessor``
 
@@ -37,41 +35,42 @@ You first create a ``SKLearnProcessor``
 
     from sagemaker.sklearn.processing import SKLearnProcessor
 
-    sklearn_processor = SKLearnProcessor(framework_version='0.20.0',
-                                         role='[Your SageMaker-compatible IAM role]',
-                                         instance_type='ml.m5.xlarge',
-                                         instance_count=1)
+    sklearn_processor = SKLearnProcessor(
+        framework_version="0.20.0",
+        role="[Your SageMaker-compatible IAM role]",
+        instance_type="ml.m5.xlarge",
+        instance_count=1,
+    )
 
-Then you can run a Scikit-Learn script ``preprocessing.py`` in a processing job. In this example, our script takes one input from S3 and one command-line argument, processes the data, then splits the data into two datasets for output. When the job is finished, we can retrive the output from S3.
+Then you can run a scikit-learn script ``preprocessing.py`` in a processing job. In this example, our script takes one input from S3 and one command-line argument, processes the data, then splits the data into two datasets for output. When the job is finished, we can retrive the output from S3.
 
 .. code:: python
 
     from sagemaker.processing import ProcessingInput, ProcessingOutput
 
-    sklearn_processor.run(code='preprocessing.py',
-                          inputs=[ProcessingInput(
-                            source='s3://your-bucket/path/to/your/data,
-                            destination='/opt/ml/processing/input')],
-                          outputs=[ProcessingOutput(output_name='train_data',
-                                                    source='/opt/ml/processing/train'),
-                                   ProcessingOutput(output_name='test_data',
-                                                    source='/opt/ml/processing/test')],
-                          arguments=['--train-test-split-ratio', '0.2']
-                         )
+    sklearn_processor.run(
+        code="preprocessing.py",
+        inputs=[
+            ProcessingInput(source="s3://your-bucket/path/to/your/data", destination="/opt/ml/processing/input"),
+        ],
+        outputs=[
+            ProcessingOutput(output_name="train_data", source="/opt/ml/processing/train"),
+            ProcessingOutput(output_name="test_data", source="/opt/ml/processing/test"),
+        ],
+        arguments=["--train-test-split-ratio", "0.2"],
+    )
 
     preprocessing_job_description = sklearn_processor.jobs[-1].describe()
 
-For an in-depth look, please see the `Scikit-Learn Data Processing and Model Evaluation`_ example notebook.
+For an in-depth look, please see the `Scikit-learn Data Processing and Model Evaluation`_ example notebook.
 
-.. _Scikit-Learn Data Processing and Model Evaluation: https://github.com/awslabs/amazon-sagemaker-examples/blob/master/sagemaker_processing/scikit_learn_data_processing_and_model_evaluation/scikit_learn_data_processing_and_model_evaluation.ipynb
+.. _Scikit-learn Data Processing and Model Evaluation: https://github.com/awslabs/amazon-sagemaker-examples/blob/master/sagemaker_processing/scikit_learn_data_processing_and_model_evaluation/scikit_learn_data_processing_and_model_evaluation.ipynb
 
 
 Data Pre-Processing with Spark
 ==============================
 
-You can use the `ScriptProcessor`_ class to run a script in a processing container, including your own container.
-
-.. _ScriptProcessor: https://sagemaker.readthedocs.io/en/stable/processing.html#sagemaker.processing.ScriptProcessor
+You can use the :class:`sagemaker.processing.ScriptProcessor` class to run a script in a processing container, including your own container.
 
 This example shows how you can run a processing job inside of a container that can run a Spark script called ``preprocess.py`` by invoking a command ``/opt/program/submit`` inside the container.
 
@@ -79,21 +78,31 @@ This example shows how you can run a processing job inside of a container that c
 
     from sagemaker.processing import ScriptProcessor, ProcessingInput
 
-    spark_processor = ScriptProcessor(base_job_name='spark-preprocessor',
-                                      image_uri='',
-                                      command=['/opt/program/submit'],
-                                      role=role,
-                                      instance_count=2,
-                                      instance_type='ml.r5.xlarge',
-                                      max_runtime_in_seconds=1200,
-                                      env={'mode': 'python'})
-
-    spark_processor.run(code='preprocess.py',
-                        arguments=['s3_input_bucket', bucket,
-                                   's3_input_key_prefix', input_prefix,
-                                   's3_output_bucket', bucket,
-                                   's3_output_key_prefix', input_preprocessed_prefix],
-                        logs=False)
+    spark_processor = ScriptProcessor(
+        base_job_name="spark-preprocessor",
+        image_uri="",
+        command=["/opt/program/submit"],
+        role=role,
+        instance_count=2,
+        instance_type="ml.r5.xlarge",
+        max_runtime_in_seconds=1200,
+        env={"mode": "python"},
+    )
+
+    spark_processor.run(
+        code="preprocess.py",
+        arguments=[
+            "s3_input_bucket",
+            bucket,
+            "s3_input_key_prefix",
+            input_prefix,
+            "s3_output_bucket",
+            bucket,
+            "s3_output_key_prefix",
+            input_preprocessed_prefix,
+        ],
+        logs=False,
+    )
 
 For an in-depth look, please see the `Feature Transformation with Spark`_ example notebook.
@@ -106,19 +115,19 @@ Learn More
 Processing class documentation
 ------------------------------
 
-- ``Processor``: https://sagemaker.readthedocs.io/en/stable/processing.html#sagemaker.processing.Processor
-- ``ScriptProcessor``: https://sagemaker.readthedocs.io/en/stable/processing.html#sagemaker.processing.ScriptProcessor
-- ``SKLearnProcessor``: https://sagemaker.readthedocs.io/en/stable/sagemaker.sklearn.html#sagemaker.sklearn.processing.SKLearnProcessor
-- ``ProcessingInput``: https://sagemaker.readthedocs.io/en/stable/processing.html#sagemaker.processing.ProcessingInput
-- ``ProcessingOutput``: https://sagemaker.readthedocs.io/en/stable/processing.html#sagemaker.processing.ProcessingOutput
-- ``ProcessingJob``: https://sagemaker.readthedocs.io/en/stable/processing.html#sagemaker.processing.ProcessingJob
+- :class:`sagemaker.processing.Processor`
+- :class:`sagemaker.processing.ScriptProcessor`
+- :class:`sagemaker.sklearn.processing.SKLearnProcessor`
+- :class:`sagemaker.processing.ProcessingInput`
+- :class:`sagemaker.processing.ProcessingOutput`
+- :class:`sagemaker.processing.ProcessingJob`
 
 
 Further documentation
 ---------------------
 
-- Processing class documentation: https://sagemaker.readthedocs.io/en/stable/processing.html
-- ​​AWS Documentation: https://docs.aws.amazon.com/sagemaker/latest/dg/processing-job.html
-- AWS Notebook examples: https://github.com/awslabs/amazon-sagemaker-examples/tree/master/sagemaker_processing
-- Processing API documentation: https://docs.aws.amazon.com/sagemaker/latest/dg/API_CreateProcessingJob.html
-- Processing container specification: https://docs.aws.amazon.com/sagemaker/latest/dg/build-your-own-processing-container.html
+- `Processing class documentation `_
+- `AWS Documentation `_
+- `AWS Notebook examples `_
+- `Processing API documentation `_
+- `Processing container specification `_
diff --git a/src/sagemaker/processing.py b/src/sagemaker/processing.py
index 040bb51ebd..70590f0aaf 100644
--- a/src/sagemaker/processing.py
+++ b/src/sagemaker/processing.py
@@ -289,7 +289,8 @@ def __init__(
         network_config=None,
     ):
         """Initializes a ``ScriptProcessor`` instance. The ``ScriptProcessor``
-        handles Amazon SageMaker Processing tasks for jobs using a machine learning framework.
+        handles Amazon SageMaker Processing tasks for jobs using a machine learning framework,
+        which allows for providing a script to be run as part of the Processing Job.
 
         Args:
             role (str): An AWS IAM role name or ARN. Amazon SageMaker Processing