|
34 | 34 | "source": [
|
35 | 35 | "# Introduction\n",
|
36 | 36 | "\n",
|
37 |
| - "This notebook demonstrate how to build a reusable computer vision (CV) pattern using **SageMaker Pipeline**. This particular pattern goes through preprocessing, training, and evaluating steps for 2 different training jobs:1) Spot training and 2) On Demand training. If the accuracy meets certain requirements, the models are then registered with SageMaker Model Registry.\n", |
| 37 | + "This notebook demonstrates how to build a reusable computer vision (CV) pattern using **SageMaker Pipelines**. This particular pattern goes through preprocessing, training, and evaluation steps for two different training jobs: 1) Spot training and 2) On-Demand training. If the accuracy meets certain requirements, the models are then registered with SageMaker Model Registry.\n", |
38 | 38 | "\n",
|
39 |
| - "We have also tagged the training workloads: `TrainingType: Spot or OnDemand`. If you are interested and have permission to access billing of your AWS account, you the cost savings from spot training from the side-by-side comparison. To enable custom cost allocation tags, please follow this [AWS documentation](https://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/activating-tags.html). It takes 12-48 hrs for the new tag to show in your cost explore.\n", |
| 39 | + "We have also tagged the training workloads: `TrainingType: Spot or OnDemand`. If you are interested and have permission to access billing of your AWS account, you can see the cost savings from spot training in a side-by-side comparison. To enable custom cost allocation tags, please follow this [AWS documentation](https://docs.aws.amazon.com/awsaccountbilling/latest/aboutv2/activating-tags.html). It takes 12-48 hours for the new tag to appear in Cost Explorer.\n", |
40 | 40 | "\n",
|
41 | 41 | "\n",
|
42 | 42 | "\n",
|
43 |
| - "SageMaker pipelines works on the concept of steps. The order steps are executed in is inferred from the dependencies each step have. If a step has a dependency on the output from a previous step, it's not executed until after that step has completed successfully. This also allows SageMaker to create a **Direct Acyclic Graph, DAG,** that can be visuallized in Amazon SageMaker Studio (see diagram above). The DAG can be used to track pipeline executions, inputs/outputs and metrics, giving user the full lineage of the model creation.\n", |
| 43 | + "SageMaker Pipelines works on the concept of steps. The order in which steps are executed is inferred from the dependencies each step has. If a step depends on the output of a previous step, it's not executed until that step has completed successfully. This also allows SageMaker to create a **Directed Acyclic Graph (DAG)** that can be visualized in Amazon SageMaker Studio (see diagram above). The DAG can be used to track pipeline executions, inputs/outputs and metrics, giving users the full lineage of the model creation.\n", |
44 | 44 | "\n",
|
45 | 45 | "\n",
|
46 | 46 | "\n",
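The dependency-driven ordering described above can be sketched in plain Python. This is an illustration of the concept, not the Pipelines API: the step names and the explicit dependency map are hypothetical, whereas SageMaker Pipelines infers the same kind of ordering from each step's declared inputs.

```python
from graphlib import TopologicalSorter

# Hypothetical step names and their predecessors, mirroring the
# preprocess -> train -> evaluate structure of this pipeline.
dependencies = {
    "Preprocess": [],
    "TrainSpot": ["Preprocess"],
    "TrainOnDemand": ["Preprocess"],
    "EvaluateSpot": ["TrainSpot"],
    "EvaluateOnDemand": ["TrainOnDemand"],
}

# A step runs only after all of its predecessors have completed successfully.
execution_order = list(TopologicalSorter(dependencies).static_order())
print(execution_order)
```

Because the two training branches share no dependencies with each other, any topological order is valid, which is also why Pipelines can run the Spot and On-Demand branches in parallel.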
|
|
57 | 57 | "\n",
|
58 | 58 | "- Access to the SageMaker default S3 bucket\n",
|
59 | 59 | "- Access to Elastic Container Registry (ECR)\n",
|
60 |
| - "- For the optional portion of this lab, you will need access to CloudFormation, Service Catelog, and Cost Explore\n", |
| 60 | + "- For the optional portion of this lab, you will need access to CloudFormation, Service Catalog, and Cost Explorer\n", |
61 | 61 | "- Familiarity with Training on Amazon SageMaker\n",
|
62 | 62 | "- Familiarity with Python\n",
|
63 | 63 | "- Familiarity with AWS S3\n",
|
|
73 | 73 | "source": [
|
74 | 74 | "## Setup\n",
|
75 | 75 | "\n",
|
76 |
| - "Here we define the sagemaker session, default bucket, job prefixes, pipeline and model group names\n", |
77 |
| - "\n", |
78 |
| - "We are using some of the newly released SageMaker Pipeline features. Please make sure you ugrade your sageMaker version by running the cell below." |
| 76 | + "Here we define the SageMaker session, default bucket, job prefixes, and the pipeline and model group names." |
79 | 77 | ]
|
80 | 78 | },
|
81 | 79 | {
|
|
199 | 197 | " name=\"ProcessingInstanceCount\", default_value=1\n",
|
200 | 198 | ")\n",
|
201 | 199 | "\n",
|
202 |
| - "model_approval_status = ParameterString(\n", |
203 |
| - " name=\"ModelApprovalStatus\",\n", |
204 |
| - " default_value=\"PendingManualApproval\" # ModelApprovalStatus can be set to a default of \"Approved\" if you don't want manual approval.\n", |
205 |
| - ")\n", |
206 |
| - "\n", |
207 | 200 | "input_data = ParameterString(\n",
|
208 | 201 | " name=\"InputDataUrl\",\n",
|
209 | 202 | " default_value=s3_raw_data\n",
|
|
232 | 225 | "### Define Cache Configuration\n",
|
233 | 226 | "When step caching is enabled, before SageMaker Pipelines executes a step, it attempts to find a previous execution of the step that was called with the same arguments.\n",
|
234 | 227 | "\n",
|
235 |
| - "Pipelines doesn't check whether the data or code that the arguments point to has changed. If a previous execution is found, Pipelines will propagates the values from the cache hit during execution, rather than recomputing the step.\n", |
| 228 | + "Pipelines doesn't check whether the data or code that the arguments point to has changed. If a previous execution is found, Pipelines will propagate the values from the cache hit during execution, rather than recomputing the step.\n", |
236 | 229 | "\n",
|
237 | 230 | "Step caching is available for the following step types:\n",
|
238 | 231 | "\n",
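The cache-hit behavior described above can be sketched as a lookup keyed on the step's argument values alone: the content behind an S3 URI or script path is never inspected, so a changed file with an unchanged path still produces a cache hit. A minimal sketch in plain Python (the argument names and bucket paths are made up for illustration; this is not the Pipelines implementation):

```python
import hashlib
import json

def cache_key(step_args: dict) -> str:
    # Hash only the literal argument values, as described above; the data
    # or code the arguments point to is never read.
    canonical = json.dumps(step_args, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()

cache = {}

args = {"code": "preprocess.py", "input": "s3://bucket/raw/"}
key = cache_key(args)

if key in cache:
    result = cache[key]                 # cache hit: reuse previous outputs
else:
    result = "s3://bucket/processed/"   # pretend we ran the step
    cache[key] = result

# Identical arguments -> identical key -> cache hit,
# even if the contents of preprocess.py have changed on disk.
assert cache_key(args) == key
```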
|
|
260 | 253 | "metadata": {},
|
261 | 254 | "source": [
|
262 | 255 | "### Preprocess data step\n",
|
263 |
| - "We are taking the original code in Jupyter notebook and containerized script to run in a preprocessing job.\n", |
| 256 | + "We take the original code in the Jupyter notebook and create a containerized script to run in a preprocessing job.\n", |
264 | 257 | "\n",
|
265 | 258 | "The [preprocess.py](./preprocess.py) script takes in the raw image files and splits them into training, validation and test sets by class.\n",
|
266 | 259 | "It merges the class annotation files so that you have a manifest file for each separate data set, and exposes two parameters: `classes` (lets you filter the classes you want to train the model on; the default is all classes) and `input-data` (the human-readable name of the classes).\n",
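The per-class splitting that the preprocessing step performs can be sketched as follows. This is a simplified stand-in, not the actual preprocess.py: the image list, class count, and split ratios are invented for illustration, and the real script works from annotation/manifest files rather than an in-memory list.

```python
import random

# Hypothetical (filename, class_id) pairs standing in for the raw images.
images = [(f"img_{i}.jpg", i % 4) for i in range(100)]

def split_by_class(items, train=0.7, val=0.15, seed=42):
    rng = random.Random(seed)
    splits = {"train": [], "validation": [], "test": []}
    # Shuffle and split each class separately so every split contains
    # examples of every class.
    for c in {cls for _, cls in items}:
        members = [x for x in items if x[1] == c]
        rng.shuffle(members)
        n_train = int(len(members) * train)
        n_val = int(len(members) * val)
        splits["train"] += members[:n_train]
        splits["validation"] += members[n_train:n_train + n_val]
        splits["test"] += members[n_train + n_val:]
    return splits

splits = split_by_class(images)
print({name: len(subset) for name, subset in splits.items()})
```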
|
|
478 | 471 | "from sagemaker.processing import (\n",
|
479 | 472 | " ProcessingInput,\n",
|
480 | 473 | " ProcessingOutput,\n",
|
481 |
| - " FrameworkProcessor,\n", |
482 | 474 | " ScriptProcessor,\n",
|
483 | 475 | ")\n",
|
484 | 476 | "\n",
|
485 |
| - "\n", |
486 |
| - "\n", |
487 | 477 | "eval_steps = dict()\n",
|
488 | 478 | "eval_reports = dict()\n",
|
489 | 479 | "\n",
|
|
583 | 573 | " inference_instances=[\"ml.t2.medium\", \"ml.m5.large\"],\n",
|
584 | 574 | " transform_instances=[\"ml.m5.large\"],\n",
|
585 | 575 | " model_package_group_name=model_package_group_name,\n",
|
586 |
| - " approval_status=model_approval_status,\n", |
587 | 576 | " model_metrics=model_metrics,\n",
|
588 | 577 | " )\n",
|
589 | 578 | " \n",
|
|
607 | 596 | "outputs": [],
|
608 | 597 | "source": [
|
609 | 598 | "from sagemaker.workflow.conditions import ConditionGreaterThanOrEqualTo\n",
|
610 |
| - "from sagemaker.workflow.condition_step import (\n", |
611 |
| - " ConditionStep,\n", |
612 |
| - " JsonGet,\n", |
613 |
| - ")\n", |
| 599 | + "from sagemaker.workflow.condition_step import ConditionStep\n", |
| 600 | + "from sagemaker.workflow.functions import JsonGet\n", |
614 | 601 | "\n",
|
615 | 602 | "condition_steps = dict()\n",
|
616 | 603 | "\n",
|
|
620 | 607 | " # Models with a test accuracy lower than the condition will not be registered with the model registry.\n",
|
621 | 608 | " cond_gte = ConditionGreaterThanOrEqualTo(\n",
|
622 | 609 | " left=JsonGet(\n",
|
623 |
| - " step=eval_steps[t],\n", |
| 610 | + " step_name=eval_steps[t].name,\n", |
624 | 611 | " property_file=eval_reports[t],\n",
|
625 | 612 | " json_path=\"multiclass_classification_metrics.accuracy.value\",\n",
|
626 | 613 | " ),\n",
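The condition check above can be illustrated in plain Python: `JsonGet` pulls a value out of the evaluation property file by dotted path, and `ConditionGreaterThanOrEqualTo` compares it to a threshold. The report contents and the 0.8 threshold below are invented for illustration; only the `json_path` string mirrors the pipeline code.

```python
from functools import reduce

# A hypothetical evaluation report, shaped like the property file
# that JsonGet reads in the condition step.
report = {
    "multiclass_classification_metrics": {
        "accuracy": {"value": 0.87}
    }
}

def json_get(doc: dict, json_path: str):
    # Walk a dotted path through nested dicts, as JsonGet's
    # json_path argument does.
    return reduce(lambda node, key: node[key], json_path.split("."), doc)

accuracy = json_get(report, "multiclass_classification_metrics.accuracy.value")

# ConditionGreaterThanOrEqualTo against an illustrative 0.8 threshold:
# only models that pass are registered with the model registry.
register_model = accuracy >= 0.8
print(register_model)
```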
|
|
677 | 664 | " name=pipeline_name,\n",
|
678 | 665 | " parameters=[\n",
|
679 | 666 | " processing_instance_count,\n",
|
680 |
| - " model_approval_status,\n", |
681 | 667 | " input_data,\n",
|
682 | 668 | " input_annotation,\n",
|
683 | 669 | " class_selection\n",
|
|
817 | 803 | "# ProcessingInstanceType=\"ml.m5.xlarge\",\n",
|
818 | 804 | "# TrainingInstanceCount=1,\n",
|
819 | 805 | "# TrainingInstanceType=\"ml.c5.4xlarge\",#\"ml.p3.2xlarge\",#\n",
|
820 |
| - " ModelApprovalStatus=\"PendingManualApproval\",\n", |
821 | 806 | " AnnotationFileName=\"classes.txt\",\n",
|
822 | 807 | " ClassSelection=\"13, 17, 35, 36\"\n",
|
823 | 808 | " )\n",
|
824 | 809 | ")"
|
825 | 810 | ]
|
826 |
| - }, |
827 |
| - { |
828 |
| - "cell_type": "markdown", |
829 |
| - "metadata": {}, |
830 |
| - "source": [ |
831 |
| - "## Clean up\n", |
832 |
| - "Delete the model registry and the pipeline after you complete the lab." |
833 |
| - ] |
834 |
| - }, |
835 |
| - { |
836 |
| - "cell_type": "code", |
837 |
| - "execution_count": null, |
838 |
| - "metadata": {}, |
839 |
| - "outputs": [], |
840 |
| - "source": [ |
841 |
| - "def delete_model_package_group(sm_client, package_group_name):\n", |
842 |
| - " try:\n", |
843 |
| - " model_versions = sm_client.list_model_packages(ModelPackageGroupName=package_group_name)\n", |
844 |
| - "\n", |
845 |
| - " except Exception as e:\n", |
846 |
| - " print(\"{} \\n\".format(e))\n", |
847 |
| - " return\n", |
848 |
| - "\n", |
849 |
| - " for model_version in model_versions[\"ModelPackageSummaryList\"]:\n", |
850 |
| - " try:\n", |
851 |
| - " sm_client.delete_model_package(ModelPackageName=model_version[\"ModelPackageArn\"])\n", |
852 |
| - " except Exception as e:\n", |
853 |
| - " print(\"{} \\n\".format(e))\n", |
854 |
| - " time.sleep(0.5) # Ensure requests aren't throttled\n", |
855 |
| - "\n", |
856 |
| - " try:\n", |
857 |
| - " sm_client.delete_model_package_group(ModelPackageGroupName=package_group_name)\n", |
858 |
| - " print(\"{} model package group deleted\".format(package_group_name))\n", |
859 |
| - " except Exception as e:\n", |
860 |
| - " print(\"{} \\n\".format(e))\n", |
861 |
| - " return\n", |
862 |
| - "\n", |
863 |
| - "\n", |
864 |
| - "def delete_sagemaker_pipeline(sm_client, pipeline_name):\n", |
865 |
| - " try:\n", |
866 |
| - " sm_client.delete_pipeline(\n", |
867 |
| - " PipelineName=pipeline_name,\n", |
868 |
| - " )\n", |
869 |
| - " print(\"{} pipeline deleted\".format(pipeline_name))\n", |
870 |
| - " except Exception as e:\n", |
871 |
| - " print(\"{} \\n\".format(e))\n", |
872 |
| - " return\n", |
873 |
| - " \n", |
874 |
| - "def delete_sagemaker_project(sm_client, project_name):\n", |
875 |
| - " try:\n", |
876 |
| - " sm_client.delete_project(\n", |
877 |
| - " ProjectName=project_name,\n", |
878 |
| - " )\n", |
879 |
| - " print(\"{} project deleted\".format(project_name))\n", |
880 |
| - " except Exception as e:\n", |
881 |
| - " print(\"{} \\n\".format(e))\n", |
882 |
| - " return" |
883 |
| - ] |
884 |
| - }, |
885 |
| - { |
886 |
| - "cell_type": "code", |
887 |
| - "execution_count": null, |
888 |
| - "metadata": {}, |
889 |
| - "outputs": [], |
890 |
| - "source": [ |
891 |
| - "import boto3\n", |
892 |
| - "import time\n", |
893 |
| - "\n", |
894 |
| - "client = boto3.client(\"sagemaker\")\n", |
895 |
| - "\n", |
896 |
| - "# Uncomment the lines below to clean the pipeline.\n", |
897 |
| - "#delete_model_package_group(client, model_package_group_name)\n", |
898 |
| - "#delete_sagemaker_pipeline(client, pipeline_name)\n", |
899 |
| - "\n", |
900 |
| - "#delete_model_package_group(client, model_package_group_name2)\n", |
901 |
| - "#delete_sagemaker_pipeline(client, pipeline_name2)\n", |
902 |
| - "\n", |
903 |
| - "# delete_sagemaker_project(client, \"<Your-Project-Name>\")#\"cv-week4-training\") #" |
904 |
| - ] |
905 | 811 | }
|
906 | 812 | ],
|
907 | 813 | "metadata": {
|
908 | 814 | "instance_type": "ml.t3.medium",
|
909 | 815 | "kernelspec": {
|
910 | 816 | "display_name": "Python 3 (Data Science)",
|
911 | 817 | "language": "python",
|
912 |
| - "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:us-east-1:081325390199:image/datascience-1.0" |
| 818 | + "name": "python3__SAGEMAKER_INTERNAL__arn:aws:sagemaker:eu-west-1:470317259841:image/datascience-1.0" |
913 | 819 | },
|
914 | 820 | "language_info": {
|
915 | 821 | "codemirror_mode": {
|
|