change: make start time, end time and period configurable in sagemaker.analytics.TrainingJobAnalytics #730

icywang86rui · 2019-03-29T04:42:47Z

…ngJobAnalytics

Creating a TrainingJobAnalytics object fails if the training job has too many
data points in the specified metrics. Make start time, end time and period
configurable so the caller can work around the CloudWatch GetMetricStatistics limit:
https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_GetMetricStatistics.html

Original issue:
#701
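
To illustrate the workaround described above, here is a minimal sketch of how a caller might pick a period for a given time window. It assumes the cap is the 1,440 data points per call documented for GetMetricStatistics and that periods must be multiples of 60 seconds; the helper name `smallest_valid_period` is hypothetical and not part of the SDK.

```python
import datetime
import math

# Assumption: per-call data point cap documented for CloudWatch GetMetricStatistics.
MAX_DATAPOINTS_PER_CALL = 1440


def smallest_valid_period(start_time, end_time, max_points=MAX_DATAPOINTS_PER_CALL):
    """Return the smallest period (in seconds, a multiple of 60) that keeps the
    number of data points between start_time and end_time at or below max_points.

    Hypothetical helper for illustration only; it is not part of sagemaker.analytics.
    """
    window_seconds = (end_time - start_time).total_seconds()
    if window_seconds <= 0:
        raise ValueError("end_time must be after start_time")
    # Seconds that each data point has to cover, rounded up to a whole minute.
    raw_period = window_seconds / max_points
    return max(60, math.ceil(raw_period / 60) * 60)


start = datetime.datetime(2018, 5, 16, 0, 0, 0)
end = start + datetime.timedelta(days=3)
period = smallest_valid_period(start, end)
print(period)
```

The resulting values could then be passed to the constructor, presumably as the `start_time`, `end_time` and `period` arguments that this change introduces (names inferred from the diff in this PR).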

Issue #, if available:

Description of changes:

Merge Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.

I have read the CONTRIBUTING doc
I used the commit message format described in CONTRIBUTING
I have added tests that prove my fix is effective or that my feature works (if appropriate)
I have updated any necessary documentation (if appropriate)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

jesterhazy · 2019-03-29T05:58:08Z

AWS CodeBuild CI Report

Result: FAILED
Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

jesterhazy · 2019-03-29T17:09:35Z

AWS CodeBuild CI Report

Result: FAILED
Build Logs (available for 30 days)

jesterhazy · 2019-03-29T17:53:57Z

AWS CodeBuild CI Report

Result: FAILED
Build Logs (available for 30 days)

laurenyu

don't forget to add the correct prefix to your PR title

laurenyu · 2019-03-29T16:18:58Z

src/sagemaker/analytics.py

@@ -216,6 +217,10 @@ def __init__(self, training_job_name, metric_names=None, sagemaker_session=None)
        self._sage_client = sagemaker_session.sagemaker_client
        self._cloudwatch = sagemaker_session.boto_session.client('cloudwatch')
        self._training_job_name = training_job_name
+        self._start_time = start_time
+        self._end_time = end_time
+        self._period = period if period else 60


should we make the default a constant?

laurenyu · 2019-03-29T17:56:12Z

tests/unit/test_analytics.py

+    start_time = datetime.datetime(2018, 5, 16, 1, 3, 4)
+    end_time = datetime.datetime(2018, 5, 16, 5, 1, 1)
+    period = 300
+    trainer = TrainingJobAnalytics("my-training-job", ["metric"],


nit: single quotes

jesterhazy · 2019-04-01T16:26:06Z

AWS CodeBuild CI Report

Result: FAILED
Build Logs (available for 30 days)

jesterhazy · 2019-04-01T19:31:42Z

AWS CodeBuild CI Report

Result: FAILED
Build Logs (available for 30 days)

jesterhazy · 2019-04-01T21:36:21Z

AWS CodeBuild CI Report

Result: SUCCEEDED
Build Logs (available for 30 days)

laurenyu · 2019-04-01T22:41:50Z

src/sagemaker/analytics.py

@@ -216,6 +219,10 @@ def __init__(self, training_job_name, metric_names=None, sagemaker_session=None)
        self._sage_client = sagemaker_session.sagemaker_client
        self._cloudwatch = sagemaker_session.boto_session.client('cloudwatch')
        self._training_job_name = training_job_name
+        self._start_time = start_time
+        self._end_time = end_time
+        self._period = period if period else METRICS_PERIOD_DEFAULT


I think you could also do period or METRICS_PERIOD_DEFAULT?
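
A small sketch of the two spellings discussed in this thread, to show that they behave the same. The constant's value of 60 is taken from the earlier revision of this diff; the two function names are hypothetical and exist only for the comparison.

```python
# Value taken from the earlier revision of the diff (period if period else 60).
METRICS_PERIOD_DEFAULT = 60


def resolve_period_conditional(period=None):
    # Spelling used in the diff under review.
    return period if period else METRICS_PERIOD_DEFAULT


def resolve_period_or(period=None):
    # Shorter spelling suggested in the review comment.
    return period or METRICS_PERIOD_DEFAULT


# Both spellings agree for unset, zero, and explicit values.
for candidate in (None, 0, 60, 300):
    assert resolve_period_conditional(candidate) == resolve_period_or(candidate)

print(resolve_period_or(None), resolve_period_or(300))
```

Note that both forms treat any falsy value, including 0, as "not set" and fall back to the default.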

laurenyu · 2019-04-01T22:42:49Z

tests/integ/test_tf_script_mode.py

@@ -47,6 +47,7 @@ def test_mnist(sagemaker_session, instance_type):
                           sagemaker_session=sagemaker_session,
                           py_version='py3',
                           framework_version=TensorFlow.LATEST_VERSION,
+                           metric_definitions=[{'Name': 'train:global_steps', 'Regex': 'global_step\/sec:\s(.*)'}], # noqa


does flake no longer enforce two spaces before the "#" on an inline comment? also, what's the noqa for?

icywang86rui · 2019-04-02

flake8 doesn't like the \s and /

laurenyu · 2019-04-02

does it go away if you do r'global_step\/sec:\s(.*)'?

icywang86rui · 2019-04-02

let me try that
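
A short sketch of the raw-string suggestion above. In a normal string literal, `\s` and `\/` are not valid Python string escapes, which is most likely what flake8 was flagging (pycodestyle reports this as W605, invalid escape sequence). A raw string carries the same characters to the regex engine without the warning. The log line below is made up for illustration.

```python
import re

# The pattern from the diff, written as a raw string so the backslashes
# reach the regex engine untouched.
pattern = r'global_step\/sec:\s(.*)'

# The same pattern spelled with doubled backslashes in a normal string.
escaped_pattern = 'global_step\\/sec:\\s(.*)'
assert pattern == escaped_pattern

# Hypothetical log line, invented for this example.
log_line = 'INFO:tensorflow:global_step/sec: 12.5'

match = re.search(pattern, log_line)
captured = match.group(1) if match else None
print(captured)
```

With the raw string in place, the `# noqa` marker on that line should no longer be needed for this particular warning.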

jesterhazy · 2019-04-02T05:14:31Z

AWS CodeBuild CI Report

Result: SUCCEEDED
Build Logs (available for 30 days)

laurenyu · 2019-04-02T15:48:18Z

don't forget to make the PR title match the guidelines in https://github.com/aws/sagemaker-python-sdk/blob/master/CONTRIBUTING.md#commit-message-guidlines

laurenyu

I labeled this as a "change" in the PR title, but wasn't sure if you thought "feature" might be more appropriate

jesterhazy · 2019-04-02T21:30:38Z

AWS CodeBuild CI Report

Result: SUCCEEDED
Build Logs (available for 30 days)

Co-authored-by: Dewen Qi <[email protected]>

* feature: Add experiment plus Run class (#691)
* feature: Add Experiment helper classes (#646)
* feature: Add Experiment helper classes feature: Add helper class _RunEnvironment
* change: Change sleep retry to backoff retry for get TC
* minor fixes in backoff retry Co-authored-by: Dewen Qi <[email protected]>
* feature: Add helper classes and methods for Run class (#660)
* feature: Add helper classes and methods for Run class
* Add Parent class to address comment
* fix docstyle check
* Add arg docstrings in _helper Co-authored-by: Dewen Qi <[email protected]>
* feature: Add Experiment Run class (#651) Co-authored-by: Dewen Qi <[email protected]>
* change: Add integ tests for Run (#673) Co-authored-by: Dewen Qi <[email protected]>
* Update run log metric to use MetricsManager (#678)
* Update run.log_metric to use _MetricsManager
* fix several metrics issues
* Add doc strings to metrics.py Co-authored-by: Dana Benson <[email protected]> Co-authored-by: Dana Benson <[email protected]> Co-authored-by: Dewen Qi <[email protected]> Co-authored-by: Dewen Qi <[email protected]> Co-authored-by: Dana Benson <[email protected]> Co-authored-by: Dana Benson <[email protected]>
* change: Simplify exp plus integ test configuration (#694) Co-authored-by: Dewen Qi <[email protected]>
* feature: add RunName to expeirment_config (#696)
* change: Update Run init and add Run load and _RunContext (#707)
* change: Update Run init and add Run load Add exp name and run group name to load and address comments
* Address nit comments Co-authored-by: Dewen Qi <[email protected]>
* fix: Fix run name uniqueness issue (#730) Co-authored-by: Dewen Qi <[email protected]>
* change: Update integ tests for Exp Plus M1 changes (#741) Co-authored-by: Dewen Qi <[email protected]>
* add metrics client to session object (#745) Co-authored-by: Dewen Qi <[email protected]> Co-authored-by: Dana Benson <[email protected]> Co-authored-by: Dana Benson <[email protected]> Co-authored-by: qidewenwhen <[email protected]>
* change: Add integ test for using Run in Transform Job (#749) Co-authored-by: Dewen Qi <[email protected]>
* Add async metrics sink (#739) Co-authored-by: Dewen Qi <[email protected]> Co-authored-by: Dana Benson <[email protected]> Co-authored-by: Dana Benson <[email protected]> Co-authored-by: qidewenwhen <[email protected]>
* use metrics client provided by session (#754)
* fix flaky metrics test (#753)
* change: Change Run.init and Run.load to constructor and module method respectively (#752) Co-authored-by: Dewen Qi <[email protected]>
* feature: Add latest metric service model (#757) Co-authored-by: Dewen Qi <[email protected]> Co-authored-by: qidewenwhen <[email protected]>
* fix: lowercase run name (#767)
* Change: Minimize use of lower case tc name (#769)
* change: Clean up test resources to remove model files (#756)
* change: Clean up test resources to remove model files
* fix: Change experiment enums to upper case
* change: Upgrade boto3 and update test to validate mixed case name
* fix: Update as per latest botocore release and backend change Co-authored-by: Dewen Qi <[email protected]>
* lowercase trial component name (#776)
* change: Expose sagemaker experiment doc strings
* fix: Fix exp name mixed case in issue Co-authored-by: Dewen Qi <[email protected]> Co-authored-by: Dana Benson <[email protected]> Co-authored-by: Dana Benson <[email protected]> Co-authored-by: Yifei Zhu <[email protected]>

icywang86rui requested a review from laurenyu March 29, 2019 04:42

Add analytics integ test to the TensorFlow script mode mnist test

e838cb4

laurenyu reviewed Mar 29, 2019

Merge branch 'master' into fix-analytics

6572452

icywang86rui added 2 commits April 1, 2019 11:11

Minor changes due to PR comments and to make flake8 happy

fa79676

Merge branch 'master' into fix-analytics

520c48b

laurenyu reviewed Apr 1, 2019

icywang86rui added 2 commits April 1, 2019 20:50

More minor changes

896381e

Merge branch 'master' into fix-analytics

b88df7b

icywang86rui changed the title ~~Make start time, end time and period configurable in analytics.Traini…~~ Make start time, end time and period configurable in sagemaker.analytics.TrainingJobAnalytics Apr 2, 2019

One more minor change

c46a1e3

laurenyu changed the title ~~Make start time, end time and period configurable in sagemaker.analytics.TrainingJobAnalytics~~ change: make start time, end time and period configurable in sagemaker.analytics.TrainingJobAnalytics Apr 2, 2019

laurenyu approved these changes Apr 2, 2019

Merge branch 'master' into fix-analytics

bf608a6

icywang86rui merged commit b15c05a into aws:master Apr 2, 2019

qidewenwhen added a commit to qidewenwhen/sagemaker-python-sdk that referenced this pull request Dec 13, 2022

fix: Fix run name uniqueness issue (aws#730)

0cf6d1d

Co-authored-by: Dewen Qi <[email protected]>

qidewenwhen added a commit to qidewenwhen/sagemaker-python-sdk that referenced this pull request Dec 14, 2022

fix: Fix run name uniqueness issue (aws#730)

908f27d

Co-authored-by: Dewen Qi <[email protected]>

qidewenwhen added a commit to qidewenwhen/sagemaker-python-sdk that referenced this pull request Dec 14, 2022

fix: Fix run name uniqueness issue (aws#730)

f0b784a

Co-authored-by: Dewen Qi <[email protected]>

qidewenwhen added a commit to qidewenwhen/sagemaker-python-sdk that referenced this pull request Dec 14, 2022

fix: Fix run name uniqueness issue (aws#730)

176b7b0

Co-authored-by: Dewen Qi <[email protected]>

qidewenwhen added a commit to qidewenwhen/sagemaker-python-sdk that referenced this pull request Dec 14, 2022

fix: Fix run name uniqueness issue (aws#730)

02a37a8

Co-authored-by: Dewen Qi <[email protected]>
