-
Notifications
You must be signed in to change notification settings - Fork 1.2k
feature: Estimator.fit like logs for transformer #782
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
src/sagemaker/session.py
Outdated
wait (bool): Whether to keep looking for new log entries until the job completes (default: False). | ||
poll (int): The interval in seconds between polling for new log entries and job completion (default: 5). | ||
Raises: | ||
ValueError: If waiting and the transform job fails. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd remove "waiting and"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I referred the annotation of logs_for_job
. Since there isn't much difference in their logic, let me know if its annotation should be updated as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good call - yeah, I'd update both spots
src/sagemaker/session.py
Outdated
instance_count = description['TransformResources']['InstanceCount'] | ||
status = description['TransformJobStatus'] | ||
|
||
stream_names = [] # The list of log streams |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there enough common logic between this method and logs_for_job
that could be refactored into a helper method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
refactored the two methods...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed with @laurenyu here. It seems that you can refactor this even more, only the describe job call seems to be different here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of this function still is a duplication of logs_for_job
. Let's refactored the common logic in a private method.
src/sagemaker/transformer.py
Outdated
@@ -79,7 +79,7 @@ def __init__(self, model_name, instance_count, instance_type, strategy=None, ass | |||
self.sagemaker_session = sagemaker_session or Session() | |||
|
|||
def transform(self, data, data_type='S3Prefix', content_type=None, compression_type=None, split_type=None, | |||
job_name=None): | |||
wait=True, logs=True, job_name=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
generally we try to add new args to the end of the list in case anyone is calling the method by relying on the order of the arguments (e.g. Python lets you call transform(data, 'S3Prefix', 'text/csv')
without specifying the kwarg name)
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, but with one concern around if this would be too big of a change in behavior with making transform() blocking
src/sagemaker/transformer.py
Outdated
@@ -79,7 +79,7 @@ def __init__(self, model_name, instance_count, instance_type, strategy=None, ass | |||
self.sagemaker_session = sagemaker_session or Session() | |||
|
|||
def transform(self, data, data_type='S3Prefix', content_type=None, compression_type=None, split_type=None, | |||
job_name=None): | |||
job_name=None, wait=True, logs=True): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this might constitute a breaking change since this method used to not be blocking, but I do like mimicking the behavior of fit()...
@mvsusp for a second opinion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do like mimicking the behavior of fit as well. IMO it is okay to add wait
and logs
with default values if these defaults keep the original behaviour, i.e. wait=False
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need do add an integration test with wait=true
as well.
src/sagemaker/transformer.py
Outdated
@@ -79,7 +79,7 @@ def __init__(self, model_name, instance_count, instance_type, strategy=None, ass | |||
self.sagemaker_session = sagemaker_session or Session() | |||
|
|||
def transform(self, data, data_type='S3Prefix', content_type=None, compression_type=None, split_type=None, | |||
job_name=None): | |||
job_name=None, wait=True, logs=True): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do like mimicking the behavior of fit as well. IMO it is okay to add wait
and logs
with default values if these defaults keep the original behaviour, i.e. wait=False
.
src/sagemaker/session.py
Outdated
instance_count = description['TransformResources']['InstanceCount'] | ||
status = description['TransformJobStatus'] | ||
|
||
stream_names = [] # The list of log streams |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed with @laurenyu here. It seems that you can refactor this even more, only the describe job call seems to be different here.
src/sagemaker/session.py
Outdated
print() | ||
# Customers are not billed for hardware provisioning, so billable time is less than total time | ||
billable_time = (description['TransformEndTime'] - description['TransformStartTime']) * instance_count | ||
print('Billable seconds:', int(billable_time.total_seconds()) + 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you should calculate the billable seconds of the transform job here, I am unsure if it is calculated that way as well.
src/sagemaker/transformer.py
Outdated
@@ -79,7 +79,7 @@ def __init__(self, model_name, instance_count, instance_type, strategy=None, ass | |||
self.sagemaker_session = sagemaker_session or Session() | |||
|
|||
def transform(self, data, data_type='S3Prefix', content_type=None, compression_type=None, split_type=None, | |||
job_name=None): | |||
job_name=None, wait=True, logs=True): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need do add an integration test with wait=true
as well.
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
Hey @mvsusp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Besides the common logic between logs_for_transform_job
and logs_for_job
we are good to go!
src/sagemaker/session.py
Outdated
instance_count = description['TransformResources']['InstanceCount'] | ||
status = description['TransformJobStatus'] | ||
|
||
stream_names = [] # The list of log streams |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of this function still is a duplication of logs_for_job
. Let's refactored the common logic in a private method.
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
Hi @mvsusp |
Please resolve the conflicts and we will take another look. |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
thanks for the contribution! |
Issue #, if available:
Description of changes:
Merge Checklist
Put an
x
in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.