Commit e6684e2

Merge pull request #12 from absynthe/refactor/04pipelines
refactor: unused vars, language and import changes
2 parents 9f0b787 + 5f17132 commit e6684e2

File tree

3 files changed: +13 / -126 lines

04_training_pipeline/pipeline.ipynb

Lines changed: 11 additions & 105 deletions
@@ -34,13 +34,13 @@
  "source": [
  "# Introduction\n",
  "\n",
- "This notebook demonstrate how to build a reusable computer vision (CV) pattern using **SageMaker Pipeline**. This particular pattern goes through preprocessing, training, and evaluating steps for 2 different training jobs:1) Spot training and 2) On Demand training. If the accuracy meets certain requirements, the models are then registered with SageMaker Model Registry.\n",
+ "This notebook demonstrates how to build a reusable computer vision (CV) pattern using **SageMaker Pipelines**. This particular pattern goes through preprocessing, training, and evaluating steps for 2 different training jobs: 1) Spot training and 2) On Demand training. If the accuracy meets certain requirements, the models are then registered with SageMaker Model Registry.\n",
  "\n",
- "We have also tagged the training workloads: `TrainingType: Spot or OnDemand`. If you are interested and have permission to access billing of your AWS account, you the cost savings from spot training from the side-by-side comparison. To enable custom cost allocation tags, please follow this [AWS documentation](https://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/activating-tags.html). It takes 12-48 hrs for the new tag to show in your cost explore.\n",
+ "We have also tagged the training workloads: `TrainingType: Spot or OnDemand`. If you are interested and have permission to access billing of your AWS account, you can see the cost savings from spot training in a side-by-side comparison. To enable custom cost allocation tags, please follow this [AWS documentation](https://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/activating-tags.html). It takes 12-48 hrs for the new tag to show in Cost Explorer.\n",
  "\n",
  "![Spot Training](statics/cost-explore.png)\n",
  "\n",
- "SageMaker pipelines works on the concept of steps. The order steps are executed in is inferred from the dependencies each step have. If a step has a dependency on the output from a previous step, it's not executed until after that step has completed successfully. This also allows SageMaker to create a **Direct Acyclic Graph, DAG,** that can be visuallized in Amazon SageMaker Studio (see diagram above). The DAG can be used to track pipeline executions, inputs/outputs and metrics, giving user the full lineage of the model creation.\n",
+ "SageMaker Pipelines works on the concept of steps. The order in which steps are executed is inferred from the dependencies each step has. If a step depends on the output from a previous step, it's not executed until after that step has completed successfully. This also allows SageMaker to create a **Directed Acyclic Graph (DAG)** that can be visualized in Amazon SageMaker Studio (see diagram above). The DAG can be used to track pipeline executions, inputs/outputs and metrics, giving users the full lineage of the model creation.\n",
  "\n",
  "![Training Pipeline](statics/cv-training-pipeline.png)\n",
  "\n",
@@ -57,7 +57,7 @@
  "\n",
  "- Access to the SageMaker default S3 bucket\n",
  "- Access to Elastic Container Registry (ECR)\n",
- "- For the optional portion of this lab, you will need access to CloudFormation, Service Catelog, and Cost Explore\n",
+ "- For the optional portion of this lab, you will need access to CloudFormation, Service Catalog, and Cost Explorer\n",
  "- Familiarity with Training on Amazon SageMaker\n",
  "- Familiarity with Python\n",
  "- Familiarity with AWS S3\n",
@@ -73,9 +73,7 @@
  "source": [
  "## Setup\n",
  "\n",
- "Here we define the sagemaker session, default bucket, job prefixes, pipeline and model group names\n",
- "\n",
- "We are using some of the newly released SageMaker Pipeline features. Please make sure you ugrade your sageMaker version by running the cell below."
+ "Here we define the SageMaker session, default bucket, job prefixes, and the pipeline and model group names."
  ]
 },
 {
@@ -199,11 +197,6 @@
  " name=\"ProcessingInstanceCount\", default_value=1\n",
  ")\n",
  "\n",
- "model_approval_status = ParameterString(\n",
- "    name=\"ModelApprovalStatus\",\n",
- "    default_value=\"PendingManualApproval\"  # ModelApprovalStatus can be set to a default of \"Approved\" if you don't want manual approval.\n",
- ")\n",
- "\n",
  "input_data = ParameterString(\n",
  "    name=\"InputDataUrl\",\n",
  "    default_value=s3_raw_data\n",
@@ -232,7 +225,7 @@
  "### Define Cache Configuration\n",
  "When step caching is enabled, before SageMaker Pipelines executes a step it attempts to find a previous execution of a step that was called with the same arguments.\n",
  "\n",
- "Pipelines doesn't check whether the data or code that the arguments point to has changed. If a previous execution is found, Pipelines will propagates the values from the cache hit during execution, rather than recomputing the step.\n",
+ "Pipelines doesn't check whether the data or code that the arguments point to has changed. If a previous execution is found, Pipelines will propagate the values from the cache hit during execution, rather than recomputing the step.\n",
  "\n",
  "Step caching is available for the following step types:\n",
  "\n",
@@ -260,7 +253,7 @@
  "metadata": {},
  "source": [
  "### Preprocess data step\n",
- "We are taking the original code in Jupyter notebook and containerized script to run in a preprocessing job.\n",
+ "We take the original code from the Jupyter notebook and create a containerized script to run in a preprocessing job.\n",
  "\n",
  "The [preprocess.py](./preprocess.py) script takes in the raw image files and splits them into training, validation and test sets by class.\n",
  "It merges the class annotation files so that you have a manifest file for each separate data set. It exposes two parameters: classes (allows you to filter the number of classes you want to train the model on; default is all classes) and input-data (the human readable name of the classes).\n",
@@ -478,12 +471,9 @@
  "from sagemaker.processing import (\n",
  "    ProcessingInput,\n",
  "    ProcessingOutput,\n",
- "    FrameworkProcessor,\n",
  "    ScriptProcessor,\n",
  ")\n",
  "\n",
- "\n",
- "\n",
  "eval_steps = dict()\n",
  "eval_reports = dict()\n",
  "\n",
@@ -583,7 +573,6 @@
  "    inference_instances=[\"ml.t2.medium\", \"ml.m5.large\"],\n",
  "    transform_instances=[\"ml.m5.large\"],\n",
  "    model_package_group_name=model_package_group_name,\n",
- "    approval_status=model_approval_status,\n",
  "    model_metrics=model_metrics,\n",
  ")\n",
  "\n",
@@ -607,10 +596,8 @@
  "outputs": [],
  "source": [
  "from sagemaker.workflow.conditions import ConditionGreaterThanOrEqualTo\n",
- "from sagemaker.workflow.condition_step import (\n",
- "    ConditionStep,\n",
- "    JsonGet,\n",
- ")\n",
+ "from sagemaker.workflow.condition_step import ConditionStep\n",
+ "from sagemaker.workflow.functions import JsonGet\n",
  "\n",
  "condition_steps = dict()\n",
  "\n",
@@ -620,7 +607,7 @@
  "# Models with a test accuracy lower than the condition will not be registered with the model registry.\n",
  "cond_gte = ConditionGreaterThanOrEqualTo(\n",
  "    left=JsonGet(\n",
-  "        step=eval_steps[t],\n",
+  "        step_name=eval_steps[t].name,\n",
  "        property_file=eval_reports[t],\n",
  "        json_path=\"multiclass_classification_metrics.accuracy.value\",\n",
  "    ),\n",
@@ -677,7 +664,6 @@
  "    name=pipeline_name,\n",
  "    parameters=[\n",
  "        processing_instance_count,\n",
- "        model_approval_status,\n",
  "        input_data,\n",
  "        input_annotation,\n",
  "        class_selection\n",
@@ -817,99 +803,19 @@
  "# ProcessingInstanceType=\"ml.m5.xlarge\",\n",
  "# TrainingInstanceCount=1,\n",
  "# TrainingInstanceType=\"ml.c5.4xlarge\",#\"ml.p3.2xlarge\",#\n",
- "        ModelApprovalStatus=\"PendingManualApproval\",\n",
  "        AnnotationFileName=\"classes.txt\",\n",
  "        ClassSelection=\"13, 17, 35, 36\"\n",
  "    )\n",
  ")"
  ]
- },
- {
-  "cell_type": "markdown",
-  "metadata": {},
-  "source": [
-  "## Clean up\n",
-  "Delete the model registry and the pipeline after you complete the lab."
-  ]
- },
- {
-  "cell_type": "code",
-  "execution_count": null,
-  "metadata": {},
-  "outputs": [],
-  "source": [
-  "def delete_model_package_group(sm_client, package_group_name):\n",
-  "    try:\n",
-  "        model_versions = sm_client.list_model_packages(ModelPackageGroupName=package_group_name)\n",
-  "\n",
-  "    except Exception as e:\n",
-  "        print(\"{} \\n\".format(e))\n",
-  "        return\n",
-  "\n",
-  "    for model_version in model_versions[\"ModelPackageSummaryList\"]:\n",
-  "        try:\n",
-  "            sm_client.delete_model_package(ModelPackageName=model_version[\"ModelPackageArn\"])\n",
-  "        except Exception as e:\n",
-  "            print(\"{} \\n\".format(e))\n",
-  "        time.sleep(0.5)  # Ensure requests aren't throttled\n",
-  "\n",
-  "    try:\n",
-  "        sm_client.delete_model_package_group(ModelPackageGroupName=package_group_name)\n",
-  "        print(\"{} model package group deleted\".format(package_group_name))\n",
-  "    except Exception as e:\n",
-  "        print(\"{} \\n\".format(e))\n",
-  "        return\n",
-  "\n",
-  "\n",
-  "def delete_sagemaker_pipeline(sm_client, pipeline_name):\n",
-  "    try:\n",
-  "        sm_client.delete_pipeline(\n",
-  "            PipelineName=pipeline_name,\n",
-  "        )\n",
-  "        print(\"{} pipeline deleted\".format(pipeline_name))\n",
-  "    except Exception as e:\n",
-  "        print(\"{} \\n\".format(e))\n",
-  "        return\n",
-  "\n",
-  "def delete_sagemaker_project(sm_client, project_name):\n",
-  "    try:\n",
-  "        sm_client.delete_project(\n",
-  "            ProjectName=project_name,\n",
-  "        )\n",
-  "        print(\"{} project deleted\".format(project_name))\n",
-  "    except Exception as e:\n",
-  "        print(\"{} \\n\".format(e))\n",
-  "        return"
-  ]
- },
- {
-  "cell_type": "code",
-  "execution_count": null,
-  "metadata": {},
-  "outputs": [],
-  "source": [
-  "import boto3\n",
-  "import time\n",
-  "\n",
-  "client = boto3.client(\"sagemaker\")\n",
-  "\n",
-  "# Uncomment the lines below to clean the pipeline.\n",
-  "#delete_model_package_group(client, model_package_group_name)\n",
-  "#delete_sagemaker_pipeline(client, pipeline_name)\n",
-  "\n",
-  "#delete_model_package_group(client, model_package_group_name2)\n",
-  "#delete_sagemaker_pipeline(client, pipeline_name2)\n",
-  "\n",
-  "# delete_sagemaker_project(client, \"<Your-Project-Name>\")#\"cv-week4-training\") #"
-  ]
  }
  ],
  "metadata": {
  "instance_type": "ml.t3.medium",
  "kernelspec": {
  "display_name": "Python 3 (Data Science)",
  "language": "python",
- "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-east-1:081325390199:image/datascience-1.0"
+ "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:eu-west-1:470317259841:image/datascience-1.0"
  },
  "language_info": {
  "codemirror_mode": {

04_training_pipeline/pipeline.py

Lines changed: 0 additions & 9 deletions
@@ -107,10 +107,6 @@ def get_pipeline(
     # name="TrainingInstanceType", default_value="ml.c5.4xlarge"
     # )

-    model_approval_status = ParameterString(
-        name="ModelApprovalStatus",
-        default_value="PendingManualApproval"  # ModelApprovalStatus can be set to a default of "Approved" if you don't want manual approval.
-    )

    input_data = ParameterString(
        name="InputDataUrl",
@@ -326,7 +322,6 @@ def get_pipeline(
        inference_instances=["ml.t2.medium", "ml.m5.large"],
        transform_instances=["ml.m5.large"],
        model_package_group_name=model_package_group_name,
-        approval_status=model_approval_status,
        model_metrics=model_metrics,
    )

@@ -369,11 +364,7 @@ def get_pipeline(
    pipeline = Pipeline(
        name=pipeline_name,
        parameters=[
-            # processing_instance_type,
            processing_instance_count,
-            # training_instance_count,
-            # training_instance_type,
-            model_approval_status,
            input_data,
            input_annotation,
            class_selection

05_deployment/sagemaker-deploy-model-for-inference.ipynb

Lines changed: 2 additions & 12 deletions
@@ -111,10 +111,7 @@
  "metadata": {},
  "outputs": [],
  "source": [
- "# Get S3 URI of the model artifact\n",
- "s3_client = boto3.client('s3')\n",
- "model_dir = s3_client.list_objects_v2(Bucket=bucket, Delimiter='/', Prefix=f'{prefix}/{prefix}')['CommonPrefixes'][-1]['Prefix']\n",
- "bird_model_path = f's3://{bucket}/{model_dir}output/model.tar.gz'\n",
+ "bird_model_path = f's3://{bucket}/{prefix}/outputs/model/model.tar.gz'\n",
  "print(f'bird_model_path: {bird_model_path}')"
  ]
 },
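This hunk replaces an S3 listing call with a fixed artifact path. The assumed layout, which this commit relies on, can be captured in a small helper (`model_artifact_uri` is an illustrative name, not part of the notebook):

```python
def model_artifact_uri(bucket, prefix):
    """Build the fixed S3 URI for the trained model artifact.

    Assumes, as the updated cell does, that the training job wrote
    its output to <prefix>/outputs/model/ in the given bucket.
    """
    return f"s3://{bucket}/{prefix}/outputs/model/model.tar.gz"
```

The fixed path is simpler and avoids a `list_objects_v2` round trip, at the cost of breaking silently if the training output prefix ever changes.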
@@ -347,13 +344,6 @@
  "    cv_utils.predict_bird_from_file(inputfile,predictor,possible_classes)"
  ]
 },
- {
-  "cell_type": "markdown",
-  "metadata": {},
-  "source": [
-  "## Clean Up"
-  ]
- },
 {
  "cell_type": "code",
  "execution_count": null,
@@ -367,7 +357,7 @@
  "kernelspec": {
  "display_name": "Python 3 (Data Science)",
  "language": "python",
- "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-east-1:081325390199:image/datascience-1.0"
+ "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:eu-west-1:470317259841:image/datascience-1.0"
  },
  "language_info": {
  "codemirror_mode": {
