Skip to content

feature: Estimator.fit like logs for transformer #782

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Sep 6, 2019

Conversation

imujjwal96
Copy link
Contributor

@imujjwal96 imujjwal96 commented May 7, 2019

Issue #, if available:

Description of changes:

  • Estimator.fit has a logs flag attribute but that is not the same for Transformer. We need to create the same experience in the Transformer.

Merge Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.

  • I have read the CONTRIBUTING doc
  • I used the commit message format described in CONTRIBUTING
  • I have added tests that prove my fix is effective or that my feature works (if appropriate)
  • I have updated any necessary documentation (if appropriate)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@imujjwal96 imujjwal96 changed the title feature: Estimator.fit like logs for transformer [WIP] feature: Estimator.fit like logs for transformer May 7, 2019
@jesterhazy
Copy link
Contributor

AWS CodeBuild CI Report

  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@jesterhazy
Copy link
Contributor

AWS CodeBuild CI Report

  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@imujjwal96 imujjwal96 changed the title [WIP] feature: Estimator.fit like logs for transformer feature: Estimator.fit like logs for transformer May 9, 2019
@jesterhazy
Copy link
Contributor

AWS CodeBuild CI Report

  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

wait (bool): Whether to keep looking for new log entries until the job completes (default: False).
poll (int): The interval in seconds between polling for new log entries and job completion (default: 5).
Raises:
ValueError: If waiting and the transform job fails.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd remove "waiting and"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I referred the annotation of logs_for_job. Since there isn't much difference in their logic, let me know if its annotation should be updated as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good call - yeah, I'd update both spots

instance_count = description['TransformResources']['InstanceCount']
status = description['TransformJobStatus']

stream_names = [] # The list of log streams
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there enough common logic between this method and logs_for_job that could be refactored into a helper method?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

refactored the two methods...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed with @laurenyu here. It seems that you can refactor this even more, only the describe job call seems to be different here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of this function still is a duplication of logs_for_job. Let's refactored the common logic in a private method.

@@ -79,7 +79,7 @@ def __init__(self, model_name, instance_count, instance_type, strategy=None, ass
self.sagemaker_session = sagemaker_session or Session()

def transform(self, data, data_type='S3Prefix', content_type=None, compression_type=None, split_type=None,
job_name=None):
wait=True, logs=True, job_name=None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generally we try to add new args to the end of the list in case anyone is calling the method by relying on the order of the arguments (e.g. Python lets you call transform(data, 'S3Prefix', 'text/csv') without specifying the kwarg name)

@jesterhazy
Copy link
Contributor

AWS CodeBuild CI Report

  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@jesterhazy
Copy link
Contributor

AWS CodeBuild CI Report

  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

Copy link
Contributor

@laurenyu laurenyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, but with one concern around if this would be too big of a change in behavior with making transform() blocking

@@ -79,7 +79,7 @@ def __init__(self, model_name, instance_count, instance_type, strategy=None, ass
self.sagemaker_session = sagemaker_session or Session()

def transform(self, data, data_type='S3Prefix', content_type=None, compression_type=None, split_type=None,
job_name=None):
job_name=None, wait=True, logs=True):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this might constitute a breaking change since this method used to not be blocking, but I do like mimicking the behavior of fit()...

@mvsusp for a second opinion?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do like mimicking the behavior of fit as well. IMO it is okay to add wait and logs with default values if these defaults keep the original behaviour, i.e. wait=False.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need do add an integration test with wait=true as well.

@@ -79,7 +79,7 @@ def __init__(self, model_name, instance_count, instance_type, strategy=None, ass
self.sagemaker_session = sagemaker_session or Session()

def transform(self, data, data_type='S3Prefix', content_type=None, compression_type=None, split_type=None,
job_name=None):
job_name=None, wait=True, logs=True):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do like mimicking the behavior of fit as well. IMO it is okay to add wait and logs with default values if these defaults keep the original behaviour, i.e. wait=False.

instance_count = description['TransformResources']['InstanceCount']
status = description['TransformJobStatus']

stream_names = [] # The list of log streams
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed with @laurenyu here. It seems that you can refactor this even more, only the describe job call seems to be different here.

print()
# Customers are not billed for hardware provisioning, so billable time is less than total time
billable_time = (description['TransformEndTime'] - description['TransformStartTime']) * instance_count
print('Billable seconds:', int(billable_time.total_seconds()) + 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you should calculate the billable seconds of the transform job here, I am unsure if it is calculated that way as well.

@@ -79,7 +79,7 @@ def __init__(self, model_name, instance_count, instance_type, strategy=None, ass
self.sagemaker_session = sagemaker_session or Session()

def transform(self, data, data_type='S3Prefix', content_type=None, compression_type=None, split_type=None,
job_name=None):
job_name=None, wait=True, logs=True):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need do add an integration test with wait=true as well.

@jesterhazy
Copy link
Contributor

AWS CodeBuild CI Report

  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@jesterhazy
Copy link
Contributor

AWS CodeBuild CI Report

  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@imujjwal96
Copy link
Contributor Author

Hey @mvsusp
I've made the requested changes. Please review them.

Copy link
Contributor

@mvsusp mvsusp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides the common logic between logs_for_transform_job and logs_for_job we are good to go!

instance_count = description['TransformResources']['InstanceCount']
status = description['TransformJobStatus']

stream_names = [] # The list of log streams
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of this function still is a duplication of logs_for_job. Let's refactored the common logic in a private method.

@jesterhazy
Copy link
Contributor

AWS CodeBuild CI Report

  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@imujjwal96
Copy link
Contributor Author

Hi @mvsusp
I had done some refactoring. Can you please review it and see if it makes sense?

@icywang86rui
Copy link
Contributor

Please resolve the conflicts and we will take another look.

@jesterhazy
Copy link
Contributor

AWS CodeBuild CI Report

  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@jesterhazy
Copy link
Contributor

AWS CodeBuild CI Report

  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@jesterhazy
Copy link
Contributor

AWS CodeBuild CI Report

  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@jesterhazy
Copy link
Contributor

AWS CodeBuild CI Report

  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@laurenyu laurenyu merged commit c364fd1 into aws:master Sep 6, 2019
@laurenyu
Copy link
Contributor

laurenyu commented Sep 6, 2019

thanks for the contribution!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants