
Commit bebe700

add the doc for the MonitorBatchTransformStep
1 parent 555e0b7 commit bebe700

File tree

2 files changed

+125
-0
lines changed


doc/amazon_sagemaker_model_building_pipeline.rst

+123
@@ -954,6 +954,129 @@ When model repacking is needed, :class:`sagemaker.workflow.model_step.ModelStep`
:class:`sagemaker.workflow.model_step.ModelStep` uses the provided inputs to automatically detect if a repack is needed. If a repack is needed, :class:`sagemaker.workflow.steps.TrainingStep` is added to the step collection for that repack. Then, either :class:`sagemaker.workflow.steps.CreateModelStep` or :class:`sagemaker.workflow.step_collections.RegisterModelStep` will be chained after it.

MonitorBatchTransform Step
==========================

:class:`sagemaker.workflow.monitor_batch_transform_step.MonitorBatchTransformStep` is a new step type that allows customers to use SageMaker Model Monitor with batch transform jobs that are part of their pipeline. Using this step, customers can set up the following monitors for their batch transform job: data quality, model quality, model bias, and feature attribution.

When configuring this step, customers have the flexibility to run the monitoring job before or after the transform job executes. The :code:`fail_on_violation` flag controls what happens when a monitoring violation is detected: if set to :code:`True`, the step fails; if set to :code:`False`, the step continues to execute.

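To make the ordering and failure semantics concrete, here is a plain-Python sketch of the step's control flow. The helper names are invented for illustration and this is not the SDK's internal implementation: the monitor runs before or after the transform, and a violation fails the step only when :code:`fail_on_violation` is true.

```python
def run_monitor_batch_transform(monitor_before_transform, fail_on_violation,
                                run_monitor, run_transform):
    """Illustrative sketch of MonitorBatchTransformStep's control flow.

    run_monitor returns True if a violation was found; run_transform
    executes the batch transform job.
    """
    def monitor():
        violation = run_monitor()
        if violation and fail_on_violation:
            raise RuntimeError("monitoring violation detected")

    if monitor_before_transform:
        # check the input data first; a violation can stop the transform
        monitor()
        run_transform()
    else:
        # transform first, then monitor its output
        run_transform()
        monitor()
```

For example, with :code:`monitor_before_transform=True` and :code:`fail_on_violation=False`, a violation is recorded but the transform still runs; with :code:`fail_on_violation=True`, the transform is skipped.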
Here is an example showing you how to configure a :class:`sagemaker.workflow.monitor_batch_transform_step.MonitorBatchTransformStep` with a Data Quality monitor.

.. code-block:: python

    from sagemaker.workflow.pipeline_context import PipelineSession
    from sagemaker.transformer import Transformer
    from sagemaker.model_monitor import DefaultModelMonitor
    from sagemaker.model_monitor.dataset_format import DatasetFormat
    from sagemaker.workflow.check_job_config import CheckJobConfig
    from sagemaker.workflow.quality_check_step import DataQualityCheckConfig
    from sagemaker.workflow.parameters import ParameterString

    pipeline_session = PipelineSession()

    transform_input_param = ParameterString(
        name="transform_input",
        default_value=f"s3://{bucket}/{prefix}/my-transform-input",
    )

    # the resource configuration for the monitoring job
    job_config = CheckJobConfig(
        role=role,
        instance_count=1,
        instance_type="ml.m5.xlarge",
        ...
    )

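A quick aside on how :code:`transform_input_param` behaves: a pipeline parameter resolves to the value supplied at :code:`pipeline.start(parameters={...})` when one is given, and to its declared default otherwise. A minimal plain-Python sketch of that resolution rule (a hypothetical helper, not the SDK's implementation):

```python
def resolve_parameter(name, default_value, start_overrides):
    """Illustrative only: use the start-time override if supplied,
    otherwise fall back to the parameter's declared default."""
    return start_overrides.get(name, default_value)

default_uri = "s3://my-bucket/my-prefix/my-transform-input"

# no override at start time -> the declared default is used
assert resolve_parameter("transform_input", default_uri, {}) == default_uri

# an override supplied at start time wins
assert resolve_parameter(
    "transform_input", default_uri,
    {"transform_input": "s3://my-bucket/new-batch/input"},
) == "s3://my-bucket/new-batch/input"
```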
The following code sample demonstrates how to set up an on-demand batch transform *data quality* monitor:

.. code-block:: python

    # configure your transformer
    transformer = Transformer(..., sagemaker_session=pipeline_session)
    transform_arg = transformer.transform(
        transform_input_param,
        content_type="text/csv",
        split_type="Line",
        ...
    )

    data_quality_config = DataQualityCheckConfig(
        baseline_dataset=transform_input_param,
        dataset_format=DatasetFormat.csv(header=False),
        output_s3_uri="s3://my-report-path",
    )

    from sagemaker.workflow.monitor_batch_transform_step import MonitorBatchTransformStep

    transform_and_monitor_step = MonitorBatchTransformStep(
        name="MyMonitorBatchTransformStep",
        transform_step_args=transform_arg,
        monitor_configuration=data_quality_config,
        check_job_configuration=job_config,
        # no need to wait for the transform output
        monitor_before_transform=True,
        # if a violation is detected during monitoring, you can skip it
        # and continue running the batch transform
        fail_on_violation=False,
        supplied_baseline_statistics="s3://my-baseline-statistics.json",
        supplied_baseline_constraints="s3://my-baseline-constraints.json",
    )
    ...

The same example can be extended for model quality, bias, and feature attribution monitoring.

.. warning::
    To run on-demand model quality monitoring, you need to have the ground truth data ready. When running the transform job, include the ground truth inside your transform input, and join the transform inference input and output. Then you can indicate which attribute or column name/index points to the ground truth when running the monitoring job.

.. code-block:: python

    transformer = Transformer(..., sagemaker_session=pipeline_session)

    transform_arg = transformer.transform(
        transform_input_param,
        content_type="text/csv",
        split_type="Line",
        # Note that we need to join both the inference input and output
        # into the transform outputs. The inference input needs to have the ground truth.
        # details can be found here
        # https://docs.aws.amazon.com/sagemaker/latest/dg/batch-transform-data-processing.html
        join_source="Input",
        # We need to exclude the ground truth inside the inference input
        # before passing it to the prediction model.
        # Assume the first column of our csv file is the ground truth
        input_filter="$[1:]",
        ...
    )

    from sagemaker.workflow.quality_check_step import ModelQualityCheckConfig

    model_quality_config = ModelQualityCheckConfig(
        baseline_dataset=transformer.output_path,
        problem_type="BinaryClassification",
        dataset_format=DatasetFormat.csv(header=False),
        output_s3_uri="s3://my-output",
        # assume the model output is at column idx 10
        inference_attribute="_c10",
        # remember the first column is the ground truth
        ground_truth_attribute="_c0",
    )

    from sagemaker.workflow.monitor_batch_transform_step import MonitorBatchTransformStep

    transform_and_monitor_step = MonitorBatchTransformStep(
        name="MyMonitorBatchTransformStep",
        transform_step_args=transform_arg,
        # pass the model quality config defined above
        monitor_configuration=model_quality_config,
        check_job_configuration=job_config,
        # monitor_before_transform cannot be True for model quality,
        # since the monitor needs the transform output
        monitor_before_transform=False,
        fail_on_violation=True,
        supplied_baseline_statistics="s3://my-baseline-statistics.json",
        supplied_baseline_constraints="s3://my-baseline-constraints.json",
    )
    ...

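To see what :code:`join_source="Input"` and :code:`input_filter="$[1:]"` do to each CSV record, here is a plain-Python simulation (no SageMaker required; the helper names are invented for illustration): the first column, holding the ground truth, is stripped before the record reaches the model, and the inference output is appended to the original, unfiltered input row.

```python
def apply_input_filter(record, start_index=1):
    """Simulate input_filter="$[start_index:]": keep only the columns
    from start_index onward before sending the record to the model."""
    return record[start_index:]

def join_input_and_output(input_record, output_record):
    """Simulate join_source="Input": append the inference output to the
    original (unfiltered) input record in the transform output."""
    return input_record + output_record

# one CSV row: ground truth label in column 0, features afterwards
row = ["1", "0.5", "0.3", "0.2"]

model_input = apply_input_filter(row)          # what the model sees
prediction = ["0.87"]                          # hypothetical model output
joined_output = join_input_and_output(row, prediction)

print(model_input)    # ['0.5', '0.3', '0.2']
print(joined_output)  # ['1', '0.5', '0.3', '0.2', '0.87']
```

In the joined output, the ground truth is still at column 0 and the model output lands in the last column, which is why the monitoring config above can reference both by column name/index in the joined file.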
=================
Example Notebooks
=================

doc/workflows/pipelines/sagemaker.workflow.pipelines.rst

+2
@@ -132,6 +132,8 @@ Step Collections

.. autoclass:: sagemaker.workflow.model_step.ModelStep

.. autoclass:: sagemaker.workflow.monitor_batch_transform_step.MonitorBatchTransformStep

Steps
-----
