feature: Add support for Partial Dependence Plots(PDP) in SageMaker Clarify #2716

keerthanvasist · 2021-10-19T08:29:00Z

Description of changes:
Add support for Partial Dependence Plots(PDP) in SageMaker Clarify

Testing done:
Added unit and integration tests.

Merge Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.

General

I have read the CONTRIBUTING doc
I certify that the changes I am introducing will be backword compatible, and I have discussed concerns about this, if any, with the Python SDK team
I used the commit message format described in CONTRIBUTING
I have passed the region in to all S3 and STS clients that I've initialized as part of this change.
I have updated any necessary documentation, including READMEs and API docs (if appropriate)

Tests

I have added tests that prove my fix is effective or that my feature works (if appropriate)
I have added unit and/or integration tests as appropriate to ensure backward compatibility of the changes
I have checked that my tests are not configured for a specific region or account (if appropriate)
I have used unique_name_from_base to create resource names in integ tests (if appropriate)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

sagemaker-bot · 2021-10-19T08:35:06Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-python-sdk-unit-tests
Commit ID: 13c3b7d
Result: FAILED
Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

sagemaker-bot · 2021-10-19T09:25:49Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-python-sdk-local-mode-tests
Commit ID: 13c3b7d
Result: SUCCEEDED
Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

sagemaker-bot · 2021-10-19T09:28:15Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-python-sdk-slow-tests
Commit ID: 13c3b7d
Result: SUCCEEDED
Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

sagemaker-bot · 2021-10-20T00:09:52Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-python-sdk-unit-tests
Commit ID: 8c40262
Result: SUCCEEDED
Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

sagemaker-bot · 2021-10-20T00:43:15Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-python-sdk-slow-tests
Commit ID: 8c40262
Result: SUCCEEDED
Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

sagemaker-bot · 2021-10-20T00:52:33Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-python-sdk-local-mode-tests
Commit ID: 8c40262
Result: SUCCEEDED
Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

sagemaker-bot · 2021-10-20T00:58:54Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-python-sdk-local-mode-tests
Commit ID: 8c40262
Result: SUCCEEDED
Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

karan6181

LGTM. Thanks!

karan6181 · 2021-10-20T16:13:18Z

tox.ini

@@ -140,7 +140,7 @@ setenv =
    LANG=C.UTF-8
 deps = black
 commands =
-    black -l 100 --check ./
+    black -l 100 ./


is it by mistake ? The removal of --check ?

jeniyat

LGTM

sagemaker-bot · 2021-10-20T17:04:19Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-python-sdk-unit-tests
Commit ID: b794c09
Result: SUCCEEDED
Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

sagemaker-bot · 2021-10-20T17:25:44Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-python-sdk-unit-tests
Commit ID: 261a3a6
Result: SUCCEEDED
Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

sagemaker-bot · 2021-10-20T17:44:50Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-python-sdk-local-mode-tests
Commit ID: b794c09
Result: SUCCEEDED
Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

sagemaker-bot · 2021-10-20T17:49:36Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-python-sdk-slow-tests
Commit ID: b794c09
Result: SUCCEEDED
Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

sagemaker-bot · 2021-10-20T17:52:26Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-python-sdk-slow-tests
Commit ID: 261a3a6
Result: SUCCEEDED
Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

sagemaker-bot · 2021-10-20T18:05:58Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-python-sdk-local-mode-tests
Commit ID: 261a3a6
Result: SUCCEEDED
Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

larroy · 2021-10-20T18:11:37Z

src/sagemaker/clarify.py

-        analysis_config["methods"] = explainability_config.get_explainability_config()
+
+        explainability_methods = {}
+        if isinstance(explainability_config, List):  # pylint: disable=W1116


this should be "list", List is from typing.

larroy · 2021-10-20T18:12:10Z

src/sagemaker/clarify.py

@@ -772,7 +828,8 @@ def run_explainability(
            model_config (:class:`~sagemaker.clarify.ModelConfig`): Config of the model and its
                endpoint to be created.
            explainability_config (:class:`~sagemaker.clarify.ExplainabilityConfig`): Config of the
-                specific explainability method. Currently, only SHAP is supported.
+                specific explainability method or a list of ExplainabilityConfig objects. Currently,


why don't you document that it can be a list also?

larroy · 2021-10-20T18:16:46Z

tests/integ/test_clarify.py

@@ -258,6 +262,57 @@ def test_shap(clarify_processor, data_config, model_config, shap_config, sagemak
        check_analysis_config(data_config, sagemaker_session, "shap")


+def test_pdp(clarify_processor, data_config, model_config, pdp_config, sagemaker_session):
+    with timeout.timeout(minutes=CLARIFY_DEFAULT_TIMEOUT_MINUTES):


is this running a real job?

Yes. I removed the new integration tests added.

xgchena · 2021-10-20T21:35:00Z

tox.ini

@@ -46,7 +46,7 @@ ignore =
 require-code = True

 [doc8]
-ignore-path=.tox,src/sagemaker.egg-info
+ignore-path=.tox,src/sagemaker.egg-info,venv


I feel this change should be removed. I guess you use this venv folder for Python virtual env, it is your own setup, not part of the standard setup recommended by contribution guide. (As a workaround you can use a different path, like ~/.venv)

+1, please remove this change.

I have removed this change. I chose to leave it there because a venv is excluded for other commands in the same file, and in gitignore as well. It would have been a harmless convenience

venv is pretty common, maybe make it a separate commit?

xgchena · 2021-10-20T21:38:19Z

src/sagemaker/clarify.py

@@ -54,7 +56,11 @@ def __init__(
                "ShardedByS3Key".
            s3_compression_type (str): Valid options are "None" or "Gzip".
        """
-        if dataset_type not in ["text/csv", "application/jsonlines", "application/x-parquet"]:
+        if dataset_type not in [


General comment: my suggestion is that you undo code formatting like because they are not necessary and distract the reviewers, ideally you should only make necessary changes for the feature. (If the code is formatted automatically by your IDE then reset the IDE settings)

I removed this for now. There are multiple formatting checks (pylint, black etc) which check on push too. It is hard for developer also to figure out which change was cause by IDE and caused by formatters.

Then try harder not to let your IDE cause trouble.

xgchena · 2021-10-20T23:51:14Z

src/sagemaker/clarify.py

+
+    def get_explainability_config(self):
+        """Returns config."""
+        return {"pdp": copy.deepcopy(self.pdp_config)}


Make a copy of the entire return value. (so that any modification to the return value by the caller won't impact the PDPConfig object.)

copy.deepcopy({"pdp": self.pdp_config})

I have updated this, although I see no difference between the two.

There is. In below case do you expect config_dict2 to have the update made for config_dict?

config_dict = pdp_config.get_explainability_config() config_dict["pdp"]["asdf"] = 1 config_dict2 = pdp_config.get_explainability_config()

You are right. My concern was the caller may update the return value and impact the original class, but in this case there seems no impact. It’s fine as lone as there is no impact.

xgchena · 2021-10-21T00:08:09Z

src/sagemaker/clarify.py

+                if not isinstance(config, ExplainabilityConfig):
+                    raise ValueError(
+                        f"Invalid input: Excepted ExplainabilityConfig, got {type(config)} instead"
+                    )


Although the isinstance(config, ExplainabilityConfig) check allows raising a better error/message to the customer, it is not necessary because if explainability_config is invalid then explainability_config.get_explainability_config() should fail anyway.

xgchena · 2021-10-21T00:08:47Z

src/sagemaker/clarify.py

+                explainability_methods[list(explain_config.keys())[0]] = explain_config[
+                    list(explain_config.keys())[0]
+                ]
+        elif isinstance(explainability_config, ExplainabilityConfig):


I think it should be simply a "else:", then the code fail fast here if an invalid explainability_config parameter is passed to the method. Better than sending an empty "methods" list to server side for validation.

xgchena · 2021-10-21T00:12:44Z

src/sagemaker/clarify.py

+                explainability_methods[list(explain_config.keys())[0]] = explain_config[
+                    list(explain_config.keys())[0]
+                ]


explainability_methods.update(explain_config) should do the job.

And there should be validations after the for loop

validate the length of the result it shall not be zero

validate the length of the result to make sure that there was no overwrite,

assert len(explainability_methods.keys())>0, "Please provide at least one explaianbility config." assert len(explainability_methods.keys())==len(explainability_config) "There are duplicate explainability configs"

xgchena · 2021-10-21T00:20:00Z

tests/unit/test_clarify.py

@@ -574,6 +598,34 @@ def _run_test_shap(
        )


+@patch("sagemaker.utils.name_from_base", return_value=JOB_NAME)
+def test_pdp(


Ideally there should also be test cases for invalid input, for example:

explainability_config is None

explainability_config is an emtpy list

Multiple shap_config (or pdp_config) in explainability_config

Done. Additional tests are added.

xgchena · 2021-10-21T00:35:32Z

tests/integ/test_clarify.py

+def test_shap_and_pdp(
+    clarify_processor, data_config, model_config, shap_config, pdp_config, sagemaker_session
+):
+    with timeout.timeout(minutes=CLARIFY_DEFAULT_TIMEOUT_MINUTES):


As I remember pdp is time consuming as well as shap, if we run both in the same job then would the job runtime be close to the CLARIFY_DEFAULT_TIMEOUT_MINUTES? Considering that the time to launch processing instances and create shadow endpoint is not a fixed value, we should leave enough buffer for them to avoid flaky test. My suggestion is that you run this test case using your indivisual account to see the runtime and make a decision (on keeping the timeout value or increase it).

Thanks for contributing to SM PySDK.

15 Minutes timeout is a very long time for an integ test run. What is the actual runtime expected here?

We should be looking into ways to reduce this runtime and be frugal here. These integ tests currently run for every PR to PySDK and would add that much of a buffer causing our PR's to be queued up.

I did run these tests, and they pass within the timeout, but I also don't think they are super useful. So, I did away with all the new integ tests I added in this CR. Unit tests cover the changes I have made well. The only change is the what analysis_config is to be generated. Beyond that, clarify container integration tests will cover it.

If this is okay, I am also okay with this. However, we cannot reduce the runtime of these integration tests because bringing up a processing job takes time.

sagemaker-bot · 2021-10-21T05:03:23Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-python-sdk-unit-tests
Commit ID: 4bf250b
Result: SUCCEEDED
Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

sagemaker-bot · 2021-10-21T05:35:41Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-python-sdk-slow-tests
Commit ID: 4bf250b
Result: SUCCEEDED
Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

sagemaker-bot · 2021-10-21T05:46:44Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-python-sdk-local-mode-tests
Commit ID: 4bf250b
Result: SUCCEEDED
Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

sagemaker-bot · 2021-10-21T05:53:34Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-python-sdk-unit-tests
Commit ID: 1db1ff8
Result: SUCCEEDED
Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

sagemaker-bot · 2021-10-21T06:21:38Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-python-sdk-slow-tests
Commit ID: 1db1ff8
Result: SUCCEEDED
Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

sagemaker-bot · 2021-10-21T06:31:56Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-python-sdk-local-mode-tests
Commit ID: 1db1ff8
Result: SUCCEEDED
Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

mufaddal-rohawala · 2021-10-21T17:52:17Z

src/sagemaker/clarify.py

+            assert (
+                len(explainability_config) > 0
+            ), "Please provide at least one explaianbility config."
+            for config in explainability_config:
+                explain_config = config.get_explainability_config()
+                explainability_methods.update(explain_config)
+            assert len(explainability_methods.keys()) == len(
+                explainability_config
+            ), "There are duplicate explainability configs"
+        else:


I would prefer to raise ValueErrors or some other meaningful errors here instead of AssertionError. AssertionError belongs to cases where you test inner behaviour and not make user-sanity checks.

Will do. Parking this PR until a fix goes in. Will update again next week.

Updated. This CR now ready.

…larify

sagemaker-bot · 2021-10-28T19:52:53Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-python-sdk-unit-tests
Commit ID: 65bc046
Result: SUCCEEDED
Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

sagemaker-bot · 2021-10-28T20:23:31Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-python-sdk-unit-tests
Commit ID: aaece32
Result: SUCCEEDED
Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

sagemaker-bot · 2021-10-28T20:35:54Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-python-sdk-local-mode-tests
Commit ID: 65bc046
Result: SUCCEEDED
Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

sagemaker-bot · 2021-10-28T20:37:49Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-python-sdk-slow-tests
Commit ID: 65bc046
Result: SUCCEEDED
Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

sagemaker-bot · 2021-10-28T21:03:35Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-python-sdk-local-mode-tests
Commit ID: aaece32
Result: SUCCEEDED
Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

sagemaker-bot · 2021-10-28T21:28:05Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-python-sdk-slow-tests
Commit ID: aaece32
Result: FAILED
Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

sagemaker-bot · 2021-10-29T18:34:43Z

AWS CodeBuild CI Report

CodeBuild project: sagemaker-python-sdk-unit-tests
Commit ID: 8d57329
Result: SUCCEEDED
Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

jeniyat

lgtm

navinsoni

Approving as all comments are addressed

…larify (aws#2716) Co-authored-by: Navin Soni <[email protected]>

keerthanvasist force-pushed the pdp branch from 13c3b7d to 8c40262 Compare October 19, 2021 23:55

karan6181 previously approved these changes Oct 20, 2021

View reviewed changes

jeniyat previously approved these changes Oct 20, 2021

View reviewed changes

keerthanvasist dismissed stale reviews from jeniyat and karan6181 via 261a3a6 October 20, 2021 17:12

keerthanvasist force-pushed the pdp branch from b794c09 to 261a3a6 Compare October 20, 2021 17:12

larroy reviewed Oct 20, 2021

View reviewed changes

xgchena suggested changes Oct 21, 2021

View reviewed changes

keerthanvasist force-pushed the pdp branch from 261a3a6 to 4bf250b Compare October 21, 2021 04:49

keerthanvasist force-pushed the pdp branch from 4bf250b to 1db1ff8 Compare October 21, 2021 05:41

mufaddal-rohawala reviewed Oct 21, 2021

View reviewed changes

keerthanvasist force-pushed the pdp branch from 1db1ff8 to 65bc046 Compare October 28, 2021 19:38

feature: Add support for Partial Dependence Plots(PDP) in SageMaker C…

aaece32

…larify

keerthanvasist force-pushed the pdp branch from 65bc046 to aaece32 Compare October 28, 2021 20:09

Satish615 approved these changes Oct 29, 2021

View reviewed changes

Merge branch 'master' into pdp

8d57329

xgchena approved these changes Oct 29, 2021

View reviewed changes

jeniyat approved these changes Oct 29, 2021

View reviewed changes

navinsoni approved these changes Oct 29, 2021

View reviewed changes

navinsoni merged commit aae4c57 into aws:master Oct 29, 2021

EthanShouhanCheng pushed a commit to SissiChenxy/sagemaker-python-sdk that referenced this pull request Jan 11, 2022

feature: Add support for Partial Dependence Plots(PDP) in SageMaker C…

7ac06db

…larify (aws#2716) Co-authored-by: Navin Soni <[email protected]>

feature: Add support for Partial Dependence Plots(PDP) in SageMaker Clarify #2716

feature: Add support for Partial Dependence Plots(PDP) in SageMaker Clarify #2716

Conversation

keerthanvasist commented Oct 19, 2021

Merge Checklist

General

Tests

sagemaker-bot commented Oct 19, 2021

AWS CodeBuild CI Report

sagemaker-bot commented Oct 19, 2021

AWS CodeBuild CI Report

sagemaker-bot commented Oct 19, 2021

AWS CodeBuild CI Report

sagemaker-bot commented Oct 20, 2021

AWS CodeBuild CI Report

sagemaker-bot commented Oct 20, 2021

AWS CodeBuild CI Report

sagemaker-bot commented Oct 20, 2021

AWS CodeBuild CI Report

sagemaker-bot commented Oct 20, 2021

AWS CodeBuild CI Report

karan6181 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jeniyat left a comment

Choose a reason for hiding this comment

sagemaker-bot commented Oct 20, 2021

AWS CodeBuild CI Report

sagemaker-bot commented Oct 20, 2021

AWS CodeBuild CI Report

sagemaker-bot commented Oct 20, 2021

AWS CodeBuild CI Report

sagemaker-bot commented Oct 20, 2021

AWS CodeBuild CI Report

sagemaker-bot commented Oct 20, 2021

AWS CodeBuild CI Report

sagemaker-bot commented Oct 20, 2021

AWS CodeBuild CI Report

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

keerthanvasist Oct 21, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

sagemaker-bot commented Oct 21, 2021

AWS CodeBuild CI Report

sagemaker-bot commented Oct 21, 2021

AWS CodeBuild CI Report

sagemaker-bot commented Oct 21, 2021

AWS CodeBuild CI Report

sagemaker-bot commented Oct 21, 2021

AWS CodeBuild CI Report

sagemaker-bot commented Oct 21, 2021

AWS CodeBuild CI Report

sagemaker-bot commented Oct 21, 2021

AWS CodeBuild CI Report

keerthanvasist Oct 21, 2021 •

edited

Loading

mufaddal-rohawala Oct 21, 2021 •

edited

Loading