-
Notifications
You must be signed in to change notification settings - Fork 1.2k
feature: Data Serializer #2956
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feature: Data Serializer #2956
Conversation
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
Codecov Report
@@ Coverage Diff @@
## dev #2956 +/- ##
==========================================
- Coverage 89.82% 89.80% -0.02%
==========================================
Files 196 196
Lines 16548 16563 +15
==========================================
+ Hits 14864 14875 +11
- Misses 1684 1688 +4
Continue to review full report at Codecov.
|
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
src/sagemaker/serializers.py
Outdated
if isinstance(data, str): | ||
if not os.path.exists(data): | ||
raise ValueError(f"{data} is not a valid file path.") | ||
image = open(data, "rb") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please close file here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add Exception handling
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed in current revision.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please close file before returning from function
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
src/sagemaker/serializers.py
Outdated
try: | ||
dataFile = open(data, "rb") | ||
except Exception: | ||
raise ValueError(f"{data} is not a valid file-path.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets generalise the exception and not conclude not a valid file-path
for better debugging.
try:
dataFile = open(data, "rb")
dataFileInfo = dataFile.read()
dataFile.close()
except Exception as e:
raise ValueError(f"Could not open/read file: {data}. {e.message}")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated in the current revision.
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
src/sagemaker/serializers.py
Outdated
dataFile.close() | ||
except Exception as e: | ||
raise ValueError(f"Could not open/read file: {data}. {e}") | ||
return dataFileInfo |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you please move this in try block
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
dataFile = open(data, "rb") | ||
dataFileInfo = dataFile.read() | ||
dataFile.close() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it have made more sense to use the context manager for reading the file, to be safer about error handling when .read()
fails?
Also shouldn't be the naming schema for variables be snake_case
rather than camelCase
? Similar to all other Serializer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As this PR is already merged, added a new PR with the suggested changes here: #2962
* change: update code to get commit_id in codepipeline (#2961) * feature: Data Serializer (#2956) * change: reorganize test files for workflow (#2960) Co-authored-by: Ben Crabtree <[email protected]> Co-authored-by: Navin Soni <[email protected]> Co-authored-by: Jeniya Tabassum <[email protected]> Co-authored-by: Dewen Qi <[email protected]> * feature: TensorFlow 2.4 for Neo (#2861) Co-authored-by: Ben Crabtree <[email protected]> Co-authored-by: Navin Soni <[email protected]> Co-authored-by: Jeniya Tabassum <[email protected]> * fix: Remove sagemaker_job_name from hyperparameters in TrainingStep (#2950) Co-authored-by: Payton Staub <[email protected]> * fix: Style update in DataSerializer (#2962) * documentation: smddp doc update (#2968) * fix: container env generation for S3 URI and add test for the same (#2971) * documentation: update sagemaker training compiler docstring (#2969) * feat: Python 3.9 for readthedocs (#2973) Co-authored-by: Ben Crabtree <[email protected]> Co-authored-by: Navin Soni <[email protected]> Co-authored-by: Jeniya Tabassum <[email protected]> Co-authored-by: Dewen Qi <[email protected]> Co-authored-by: Payton Staub <[email protected]> Co-authored-by: qidewenwhen <[email protected]> Co-authored-by: Qingzi-Lan <[email protected]> Co-authored-by: Payton Staub <[email protected]> * fix doc structure * archive 1.6.0 doc * add new args, refs, and links * fix version number * incorp eng feedback, update docstrings, improve xref * Trigger Build * minor fix, trigger build again * fix typo Co-authored-by: Navin Soni <[email protected]> Co-authored-by: Jeniya Tabassum <[email protected]> Co-authored-by: qidewenwhen <[email protected]> Co-authored-by: Ben Crabtree <[email protected]> Co-authored-by: Dewen Qi <[email protected]> Co-authored-by: Qingzi-Lan <[email protected]> Co-authored-by: Payton Staub <[email protected]> Co-authored-by: Payton Staub <[email protected]> Co-authored-by: Shreya Pandit <[email protected]> Co-authored-by: Ahsan Khan <[email protected]> Co-authored-by: Mufaddal Rohawala <[email protected]>
* change: update code to get commit_id in codepipeline (aws#2961) * feature: Data Serializer (aws#2956) * change: reorganize test files for workflow (aws#2960) Co-authored-by: Ben Crabtree <[email protected]> Co-authored-by: Navin Soni <[email protected]> Co-authored-by: Jeniya Tabassum <[email protected]> Co-authored-by: Dewen Qi <[email protected]> * feature: TensorFlow 2.4 for Neo (aws#2861) Co-authored-by: Ben Crabtree <[email protected]> Co-authored-by: Navin Soni <[email protected]> Co-authored-by: Jeniya Tabassum <[email protected]> * fix: Remove sagemaker_job_name from hyperparameters in TrainingStep (aws#2950) Co-authored-by: Payton Staub <[email protected]> * fix: Style update in DataSerializer (aws#2962) * documentation: smddp doc update (aws#2968) * fix: container env generation for S3 URI and add test for the same (aws#2971) * documentation: update sagemaker training compiler docstring (aws#2969) * feat: Python 3.9 for readthedocs (aws#2973) Co-authored-by: Ben Crabtree <[email protected]> Co-authored-by: Navin Soni <[email protected]> Co-authored-by: Jeniya Tabassum <[email protected]> Co-authored-by: Dewen Qi <[email protected]> Co-authored-by: Payton Staub <[email protected]> Co-authored-by: qidewenwhen <[email protected]> Co-authored-by: Qingzi-Lan <[email protected]> Co-authored-by: Payton Staub <[email protected]> * fix doc structure * archive 1.6.0 doc * add new args, refs, and links * fix version number * incorp eng feedback, update docstrings, improve xref * Trigger Build * minor fix, trigger build again * fix typo Co-authored-by: Navin Soni <[email protected]> Co-authored-by: Jeniya Tabassum <[email protected]> Co-authored-by: qidewenwhen <[email protected]> Co-authored-by: Ben Crabtree <[email protected]> Co-authored-by: Dewen Qi <[email protected]> Co-authored-by: Qingzi-Lan <[email protected]> Co-authored-by: Payton Staub <[email protected]> Co-authored-by: Payton Staub <[email protected]> Co-authored-by: Shreya Pandit <[email protected]> Co-authored-by: Ahsan Khan <[email protected]> Co-authored-by: Mufaddal Rohawala <[email protected]>
* change: update code to get commit_id in codepipeline (aws#2961) * feature: Data Serializer (aws#2956) * change: reorganize test files for workflow (aws#2960) Co-authored-by: Ben Crabtree <[email protected]> Co-authored-by: Navin Soni <[email protected]> Co-authored-by: Jeniya Tabassum <[email protected]> Co-authored-by: Dewen Qi <[email protected]> * feature: TensorFlow 2.4 for Neo (aws#2861) Co-authored-by: Ben Crabtree <[email protected]> Co-authored-by: Navin Soni <[email protected]> Co-authored-by: Jeniya Tabassum <[email protected]> * fix: Remove sagemaker_job_name from hyperparameters in TrainingStep (aws#2950) Co-authored-by: Payton Staub <[email protected]> * fix: Style update in DataSerializer (aws#2962) * documentation: smddp doc update (aws#2968) * fix: container env generation for S3 URI and add test for the same (aws#2971) * documentation: update sagemaker training compiler docstring (aws#2969) * feat: Python 3.9 for readthedocs (aws#2973) Co-authored-by: Ben Crabtree <[email protected]> Co-authored-by: Navin Soni <[email protected]> Co-authored-by: Jeniya Tabassum <[email protected]> Co-authored-by: Dewen Qi <[email protected]> Co-authored-by: Payton Staub <[email protected]> Co-authored-by: qidewenwhen <[email protected]> Co-authored-by: Qingzi-Lan <[email protected]> Co-authored-by: Payton Staub <[email protected]> * fix doc structure * archive 1.6.0 doc * add new args, refs, and links * fix version number * incorp eng feedback, update docstrings, improve xref * Trigger Build * minor fix, trigger build again * fix typo Co-authored-by: Navin Soni <[email protected]> Co-authored-by: Jeniya Tabassum <[email protected]> Co-authored-by: qidewenwhen <[email protected]> Co-authored-by: Ben Crabtree <[email protected]> Co-authored-by: Dewen Qi <[email protected]> Co-authored-by: Qingzi-Lan <[email protected]> Co-authored-by: Payton Staub <[email protected]> Co-authored-by: Payton Staub <[email protected]> Co-authored-by: Shreya Pandit <[email protected]> Co-authored-by: Ahsan Khan <[email protected]> Co-authored-by: Mufaddal Rohawala <[email protected]>
Issue #, if available:
Includes a serializer for multimodal support
Description of changes:
Introduced DataSerializer class that utilizes current SimpleBaseSerializer class to read and serialize data in different formats, i.e, audio/image
Testing done:
Merge Checklist
Put an
x
in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.General
Tests
unique_name_from_base
to create resource names in integ tests (if appropriate)By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.