refactor: unused vars, language and import changes #12

Merged: 7 commits, Nov 7, 2022
116 changes: 11 additions & 105 deletions 04_training_pipeline/pipeline.ipynb
@@ -34,13 +34,13 @@
"source": [
"# Introduction\n",
"\n",
"This notebook demonstrate how to build a reusable computer vision (CV) pattern using **SageMaker Pipeline**. This particular pattern goes through preprocessing, training, and evaluating steps for 2 different training jobs:1) Spot training and 2) On Demand training. If the accuracy meets certain requirements, the models are then registered with SageMaker Model Registry.\n",
"This notebook demonstrate how to build a reusable computer vision (CV) pattern using **SageMaker Pipelines**. This particular pattern goes through preprocessing, training, and evaluating steps for 2 different training jobs: 1) Spot training and 2) On Demand training. If the accuracy meets certain requirements, the models are then registered with SageMaker Model Registry.\n",
"\n",
"We have also tagged the training workloads: `TrainingType: Spot or OnDemand`. If you are interested and have permission to access billing of your AWS account, you the cost savings from spot training from the side-by-side comparison. To enable custom cost allocation tags, please follow this [AWS documentation](https://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/activating-tags.html). It takes 12-48 hrs for the new tag to show in your cost explore.\n",
"We have also tagged the training workloads: `TrainingType: Spot or OnDemand`. If you are interested and have permission to access billing of your AWS account, you can see the cost savings from spot training in a side-by-side comparison. To enable custom cost allocation tags, please follow this [AWS documentation](https://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/activating-tags.html). It takes 12-48 hrs for the new tag to show in Cost Explorer.\n",
"\n",
"![Spot Training](statics/cost-explore.png)\n",
"\n",
"SageMaker pipelines works on the concept of steps. The order steps are executed in is inferred from the dependencies each step have. If a step has a dependency on the output from a previous step, it's not executed until after that step has completed successfully. This also allows SageMaker to create a **Direct Acyclic Graph, DAG,** that can be visuallized in Amazon SageMaker Studio (see diagram above). The DAG can be used to track pipeline executions, inputs/outputs and metrics, giving user the full lineage of the model creation.\n",
"SageMaker pipelines works on the concept of steps. The order steps are executed in is inferred from the dependencies each step have. If a step has a dependency on the output from a previous step, it's not executed until after that step has completed successfully. This also allows SageMaker to create a **Direct Acyclic Graph, DAG,** that can be visuallized in Amazon SageMaker Studio (see diagram above). The DAG can be used to track pipeline executions, inputs/outputs and metrics, giving users the full lineage of the model creation.\n",
"\n",
"![Training Pipeline](statics/cv-training-pipeline.png)\n",
"\n",
@@ -57,7 +57,7 @@
"\n",
"- Access to the SageMaker default S3 bucket\n",
"- Access to Elastic Container Registry (ECR)\n",
"- For the optional portion of this lab, you will need access to CloudFormation, Service Catelog, and Cost Explore\n",
"- For the optional portion of this lab, you will need access to CloudFormation, Service Catalog, and Cost Explorer\n",
"- Familiarity with Training on Amazon SageMaker\n",
"- Familiarity with Python\n",
"- Familiarity with AWS S3\n",
@@ -73,9 +73,7 @@
"source": [
"## Setup\n",
"\n",
"Here we define the sagemaker session, default bucket, job prefixes, pipeline and model group names\n",
"\n",
"We are using some of the newly released SageMaker Pipeline features. Please make sure you ugrade your sageMaker version by running the cell below."
"Here we define the sagemaker session, default bucket, job prefixes, pipeline and model group names."
]
},
{
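For reference, the setup the cell above describes usually amounts to a few lines of session and naming boilerplate. A minimal sketch, with the prefix and names below being assumptions rather than the values used in this repo:

```python
import sagemaker

sagemaker_session = sagemaker.Session()
default_bucket = sagemaker_session.default_bucket()      # SageMaker default S3 bucket
base_job_prefix = "cv-training"                          # assumed job prefix
pipeline_name = "cv-training-pipeline"                   # assumed pipeline name
model_package_group_name = "cv-model-group"              # assumed model group name
```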
@@ -199,11 +197,6 @@
" name=\"ProcessingInstanceCount\", default_value=1\n",
")\n",
"\n",
"model_approval_status = ParameterString(\n",
" name=\"ModelApprovalStatus\",\n",
" default_value=\"PendingManualApproval\" # ModelApprovalStatus can be set to a default of \"Approved\" if you don't want manual approval.\n",
")\n",
"\n",
"input_data = ParameterString(\n",
" name=\"InputDataUrl\",\n",
" default_value=s3_raw_data\n",
@@ -232,7 +225,7 @@
"### Define Cache Configuration\n",
"When step cache is defined, before SageMaker Pipelines executes a step, it attempts to find a previous execution of a step that was called with the same arguments.\n",
"\n",
"Pipelines doesn't check whether the data or code that the arguments point to has changed. If a previous execution is found, Pipelines will propagates the values from the cache hit during execution, rather than recomputing the step.\n",
"Pipelines doesn't check whether the data or code that the arguments point to has changed. If a previous execution is found, Pipelines will propagate the values from the cache hit during execution, rather than recomputing the step.\n",
"\n",
"Step caching is available for the following step types:\n",
"\n",
@@ -260,7 +253,7 @@
"metadata": {},
"source": [
"### Preprocess data step\n",
"We are taking the original code in Jupyter notebook and containerized script to run in a preprocessing job.\n",
"We are taking the original code in Jupyter notebook and create a containerized script to run in a preprocessing job.\n",
"\n",
"The [preprocess.py](./preprocess.py) script takes in the raw images files and splits them into training, validation and test sets by class.\n",
"It merges the class annotation files so that you have a manifest file for each separate data set. And exposes two parameters: classes (allows you to filter the number of classes you want to train the model on; default is all classes) and input-data (the human readable name of the classes).\n",
@@ -478,12 +471,9 @@
"from sagemaker.processing import (\n",
" ProcessingInput,\n",
" ProcessingOutput,\n",
" FrameworkProcessor,\n",
" ScriptProcessor,\n",
")\n",
"\n",
"\n",
"\n",
"eval_steps = dict()\n",
"eval_reports = dict()\n",
"\n",
@@ -583,7 +573,6 @@
" inference_instances=[\"ml.t2.medium\", \"ml.m5.large\"],\n",
" transform_instances=[\"ml.m5.large\"],\n",
" model_package_group_name=model_package_group_name,\n",
" approval_status=model_approval_status,\n",
" model_metrics=model_metrics,\n",
" )\n",
" \n",
@@ -607,10 +596,8 @@
"outputs": [],
"source": [
"from sagemaker.workflow.conditions import ConditionGreaterThanOrEqualTo\n",
"from sagemaker.workflow.condition_step import (\n",
" ConditionStep,\n",
" JsonGet,\n",
")\n",
"from sagemaker.workflow.condition_step import ConditionStep\n",
"from sagemaker.workflow.functions import JsonGet\n",
"\n",
"condition_steps = dict()\n",
"\n",
@@ -620,7 +607,7 @@
" # Models with a test accuracy lower than the condition will not be registered with the model registry.\n",
" cond_gte = ConditionGreaterThanOrEqualTo(\n",
" left=JsonGet(\n",
" step=eval_steps[t],\n",
" step_name=eval_steps[t].name,\n",
" property_file=eval_reports[t],\n",
" json_path=\"multiclass_classification_metrics.accuracy.value\",\n",
" ),\n",
@@ -677,7 +664,6 @@
" name=pipeline_name,\n",
" parameters=[\n",
" processing_instance_count,\n",
" model_approval_status,\n",
" input_data,\n",
" input_annotation,\n",
" class_selection\n",
@@ -817,99 +803,19 @@
"# ProcessingInstanceType=\"ml.m5.xlarge\",\n",
"# TrainingInstanceCount=1,\n",
"# TrainingInstanceType=\"ml.c5.4xlarge\",#\"ml.p3.2xlarge\",#\n",
" ModelApprovalStatus=\"PendingManualApproval\",\n",
" AnnotationFileName=\"classes.txt\",\n",
" ClassSelection=\"13, 17, 35, 36\"\n",
" )\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Clean up\n",
"Delete the model registry and the pipeline after you complete the lab."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def delete_model_package_group(sm_client, package_group_name):\n",
" try:\n",
" model_versions = sm_client.list_model_packages(ModelPackageGroupName=package_group_name)\n",
"\n",
" except Exception as e:\n",
" print(\"{} \\n\".format(e))\n",
" return\n",
"\n",
" for model_version in model_versions[\"ModelPackageSummaryList\"]:\n",
" try:\n",
" sm_client.delete_model_package(ModelPackageName=model_version[\"ModelPackageArn\"])\n",
" except Exception as e:\n",
" print(\"{} \\n\".format(e))\n",
" time.sleep(0.5) # Ensure requests aren't throttled\n",
"\n",
" try:\n",
" sm_client.delete_model_package_group(ModelPackageGroupName=package_group_name)\n",
" print(\"{} model package group deleted\".format(package_group_name))\n",
" except Exception as e:\n",
" print(\"{} \\n\".format(e))\n",
" return\n",
"\n",
"\n",
"def delete_sagemaker_pipeline(sm_client, pipeline_name):\n",
" try:\n",
" sm_client.delete_pipeline(\n",
" PipelineName=pipeline_name,\n",
" )\n",
" print(\"{} pipeline deleted\".format(pipeline_name))\n",
" except Exception as e:\n",
" print(\"{} \\n\".format(e))\n",
" return\n",
" \n",
"def delete_sagemaker_project(sm_client, project_name):\n",
" try:\n",
" sm_client.delete_project(\n",
" ProjectName=project_name,\n",
" )\n",
" print(\"{} project deleted\".format(project_name))\n",
" except Exception as e:\n",
" print(\"{} \\n\".format(e))\n",
" return"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import boto3\n",
"import time\n",
"\n",
"client = boto3.client(\"sagemaker\")\n",
"\n",
"# Uncomment the lines below to clean the pipeline.\n",
"#delete_model_package_group(client, model_package_group_name)\n",
"#delete_sagemaker_pipeline(client, pipeline_name)\n",
"\n",
"#delete_model_package_group(client, model_package_group_name2)\n",
"#delete_sagemaker_pipeline(client, pipeline_name2)\n",
"\n",
"# delete_sagemaker_project(client, \"<Your-Project-Name>\")#\"cv-week4-training\") #"
]
}
],
"metadata": {
"instance_type": "ml.t3.medium",
"kernelspec": {
"display_name": "Python 3 (Data Science)",
"language": "python",
"name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-east-1:081325390199:image/datascience-1.0"
"name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:eu-west-1:470317259841:image/datascience-1.0"
},
"language_info": {
"codemirror_mode": {
9 changes: 0 additions & 9 deletions 04_training_pipeline/pipeline.py
@@ -107,10 +107,6 @@ def get_pipeline(
# name="TrainingInstanceType", default_value="ml.c5.4xlarge"
# )

model_approval_status = ParameterString(
name="ModelApprovalStatus",
default_value="PendingManualApproval" # ModelApprovalStatus can be set to a default of "Approved" if you don't want manual approval.
)

input_data = ParameterString(
name="InputDataUrl",
@@ -326,7 +322,6 @@ def get_pipeline(
inference_instances=["ml.t2.medium", "ml.m5.large"],
transform_instances=["ml.m5.large"],
model_package_group_name=model_package_group_name,
approval_status=model_approval_status,
model_metrics=model_metrics,
)

@@ -369,11 +364,7 @@ def get_pipeline(
pipeline = Pipeline(
name=pipeline_name,
parameters=[
# processing_instance_type,
processing_instance_count,
# training_instance_count,
# training_instance_type,
model_approval_status,
input_data,
input_annotation,
class_selection
14 changes: 2 additions & 12 deletions 05_deployment/sagemaker-deploy-model-for-inference.ipynb
@@ -111,10 +111,7 @@
"metadata": {},
"outputs": [],
"source": [
"# Get S3 URI of the model artifact\n",
"s3_client = boto3.client('s3')\n",
"model_dir = s3_client.list_objects_v2(Bucket=bucket, Delimiter='/', Prefix=f'{prefix}/{prefix}')['CommonPrefixes'][-1]['Prefix']\n",
"bird_model_path = f's3://{bucket}/{model_dir}output/model.tar.gz'\n",
"bird_model_path = f's3://{bucket}/{prefix}/outputs/model/model.tar.gz'\n",
"print(f'bird_model_path: {bird_model_path}')"
]
},
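The hunk above simplifies how `bird_model_path` is derived from the pipeline output location. For context, deploying that artifact to a real-time endpoint typically looks like the sketch below; the framework, version, and instance type are assumptions and should match the actual training container:

```python
from sagemaker.tensorflow import TensorFlowModel

model = TensorFlowModel(
    model_data=bird_model_path,   # S3 URI assembled above
    role=role,                    # assumed execution role
    framework_version="2.8",      # assumed TF version; match the training job
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
)
# predictor can then be passed to the cv_utils prediction helpers used later.
```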
@@ -347,13 +344,6 @@
" cv_utils.predict_bird_from_file(inputfile,predictor,possible_classes)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Clean Up"
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -367,7 +357,7 @@
"kernelspec": {
"display_name": "Python 3 (Data Science)",
"language": "python",
"name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-east-1:081325390199:image/datascience-1.0"
"name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:eu-west-1:470317259841:image/datascience-1.0"
},
"language_info": {
"codemirror_mode": {