feature: Data Serializer #2956

Merged
merged 9 commits into from
Feb 28, 2022

Conversation

Contributor

@jeniyat jeniyat commented Feb 23, 2022

Issue #, if available:
Includes a serializer for multimodal support

Description of changes:
Introduces a DataSerializer class that builds on the existing SimpleBaseSerializer class to read and serialize data in different formats, e.g., audio or image.
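
To make the description concrete, here is a minimal, self-contained sketch of the behaviour discussed in this PR. It is not the SDK's actual code: `SketchDataSerializer` is a hypothetical stand-in that does not inherit from SimpleBaseSerializer, and it assumes the serializer accepts either a file path (`str`) or raw `bytes`. The default content type is the one quoted in the testing notes.

```python
import os
import tempfile


class SketchDataSerializer:
    """Hypothetical stand-in for the DataSerializer described in this PR."""

    def __init__(self, content_type="file-path/raw-bytes"):
        # Default content type as quoted in the PR's testing notes.
        self.content_type = content_type

    def serialize(self, data):
        """Return raw bytes for a file path (str); pass bytes through unchanged."""
        if isinstance(data, str):
            try:
                with open(data, "rb") as data_file:
                    return data_file.read()
            except Exception as e:
                raise ValueError(f"Could not open/read file: {data}. {e}") from e
        if isinstance(data, bytes):
            return data
        raise ValueError(f"Object of type {type(data)} is not supported by this sketch.")


# Usage: write a small payload to a temporary file, then serialize it by path.
payload = b"\x00\x01audio-or-image-bytes"
with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as tmp:
    tmp.write(payload)
    sample_path = tmp.name

serializer = SketchDataSerializer()
from_path = serializer.serialize(sample_path)  # bytes read from disk
from_bytes = serializer.serialize(payload)     # bytes passed through
os.remove(sample_path)
```

Both calls should yield the same bytes, so a caller can hand over either a path or an in-memory payload.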

Testing done:

  • Tested the data serializer with different inputs.
  • Tested the data serializer with the default content_type="file-path/raw-bytes"
  • Tested the changes in this PR locally, where it passed all the tests (attached the local output below)
JT:sagemaker-python-sdk jeniyat$ ./.githooks/pre-push
GLOB sdist-make: /Users/jeniyat/Desktop/HuggingFace/source_repo/sagemaker-python-sdk/setup.py
✔ OK black-check in 8.605 seconds
✔ OK twine in 11.449 seconds
✔ OK pylint in 21.92 seconds
✔ OK docstyle in 24.015 seconds
✔ OK flake8 in 41.267 seconds
__________________________________________________________________________________________________________________________________________________________ summary ___________________________________________________________________________________________________________________________________________________________
  flake8: commands succeeded
  pylint: commands succeeded
  docstyle: commands succeeded
  black-check: commands succeeded
  twine: commands succeeded
  congratulations :)
=================== flake8,pylint,docstyle,black-check,twine execution time ===================
44 seconds

GLOB sdist-make: /Users/jeniyat/Desktop/HuggingFace/source_repo/sagemaker-python-sdk/setup.py
✔ OK doc8 in 8.755 seconds
✔ OK sphinx in 4 minutes, 46.133 seconds
__________________________________________________________________________________________________________________________________________________________ summary ___________________________________________________________________________________________________________________________________________________________
  sphinx: commands succeeded
  doc8: commands succeeded
  congratulations :)
=================== sphinx,doc8 execution time ===================
4 minutes and 49 seconds


Merge Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.

General

  • I have read the CONTRIBUTING doc
  • I certify that the changes I am introducing will be backward compatible, and I have discussed concerns about this, if any, with the Python SDK team
  • I used the commit message format described in CONTRIBUTING
  • I have passed the region in to all S3 and STS clients that I've initialized as part of this change.
  • I have updated any necessary documentation, including READMEs and API docs (if appropriate)

Tests

  • I have added tests that prove my fix is effective or that my feature works (if appropriate)
  • I have added unit and/or integration tests as appropriate to ensure backward compatibility of the changes
  • I have tested the tests locally.
  • I have checked that my tests are not configured for a specific region or account (if appropriate)
  • I have used unique_name_from_base to create resource names in integ tests (if appropriate)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@sagemaker-bot
Copy link
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-python-sdk-unit-tests
  • Commit ID: ea7e9a9
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@codecov-commenter
Copy link

codecov-commenter commented Feb 23, 2022

Codecov Report

Merging #2956 (fe0bf46) into dev (4ce6623) will decrease coverage by 0.01%.
The diff coverage is 80.00%.

@@            Coverage Diff             @@
##              dev    #2956      +/-   ##
==========================================
- Coverage   89.82%   89.80%   -0.02%     
==========================================
  Files         196      196              
  Lines       16548    16563      +15     
==========================================
+ Hits        14864    14875      +11     
- Misses       1684     1688       +4     
Impacted Files                  Coverage Δ
src/sagemaker/serializers.py    92.96% <80.00%> (-1.73%) ⬇️
src/sagemaker/session.py        70.39% <0.00%> (-0.08%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4ce6623...fe0bf46. Read the comment docs.

@sagemaker-bot
Collaborator

AWS CodeBuild CI Reports

  • sagemaker-python-sdk-local-mode-tests, commit ea7e9a9: SUCCEEDED
  • sagemaker-python-sdk-slow-tests, commit ea7e9a9: SUCCEEDED

if isinstance(data, str):
    if not os.path.exists(data):
        raise ValueError(f"{data} is not a valid file path.")
    image = open(data, "rb")
Contributor

Please close file here.

Contributor

Add Exception handling

Contributor Author

Addressed in current revision.

Contributor

@navinsoni navinsoni left a comment

Please close file before returning from function
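
One way to satisfy this request is a `try`/`finally` around the read, so the handle is closed on every exit path. The sketch below is illustrative only; `read_file_bytes` is a hypothetical helper name, not a function from the SDK.

```python
import os
import tempfile


def read_file_bytes(path):
    """Read a file in binary mode, closing the handle before returning."""
    data_file = open(path, "rb")
    try:
        return data_file.read()
    finally:
        # Runs whether read() returns or raises, so the handle is not leaked.
        data_file.close()


# Usage with a throwaway file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"image-bytes")
    tmp_path = tmp.name

contents = read_file_bytes(tmp_path)
os.remove(tmp_path)
```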

@sagemaker-bot
Collaborator

AWS CodeBuild CI Reports

  • sagemaker-python-sdk-unit-tests, commit ecb4800: SUCCEEDED
  • sagemaker-python-sdk-unit-tests, commit 1cb9de6: SUCCEEDED
  • sagemaker-python-sdk-local-mode-tests, commit ecb4800: SUCCEEDED
  • sagemaker-python-sdk-local-mode-tests, commit 1cb9de6: SUCCEEDED
  • sagemaker-python-sdk-slow-tests, commit 1cb9de6: SUCCEEDED
  • sagemaker-python-sdk-slow-tests, commit ecb4800: SUCCEEDED
  • sagemaker-python-sdk-notebook-tests, commit 1cb9de6: FAILED
  • sagemaker-python-sdk-unit-tests, commit d79af25: SUCCEEDED

try:
    dataFile = open(data, "rb")
except Exception:
    raise ValueError(f"{data} is not a valid file-path.")
Member

Let's generalize the exception instead of concluding that the file path is invalid; that makes debugging easier.

try:
    dataFile = open(data, "rb")
    dataFileInfo = dataFile.read()
    dataFile.close()
except Exception as e:
    raise ValueError(f"Could not open/read file: {data}. {e}")
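
As a rough illustration of why the generalized message helps, the sketch below wraps any open/read failure in a `ValueError` that carries the underlying error text. It interpolates the exception itself, since Python 3 exceptions have no `message` attribute. `read_file_bytes` and the two sample paths are hypothetical, chosen only to trigger different failures.

```python
import os
import tempfile


def read_file_bytes(path):
    """Read a file, reporting the real cause if opening or reading fails."""
    try:
        with open(path, "rb") as data_file:
            return data_file.read()
    except Exception as e:
        # Keep the original exception as the cause for easier debugging.
        raise ValueError(f"Could not open/read file: {path}. {e}") from e


# Two different failure modes: a path that does not exist, and a directory.
missing_path = os.path.join(tempfile.gettempdir(), "no-such-file-for-this-sketch.bin")
directory_path = tempfile.gettempdir()

causes = {}
for label, bad_path in (("missing", missing_path), ("directory", directory_path)):
    try:
        read_file_bytes(bad_path)
    except ValueError as err:
        causes[label] = err
```

Each stored error keeps the original exception in `__cause__`, so the caller can tell a missing file apart from other I/O problems instead of always seeing "not a valid file-path".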

Contributor Author

updated in the current revision.

@sagemaker-bot
Collaborator

AWS CodeBuild CI Reports

  • sagemaker-python-sdk-local-mode-tests, commit d79af25: SUCCEEDED
  • sagemaker-python-sdk-slow-tests, commit d79af25: SUCCEEDED
  • sagemaker-python-sdk-unit-tests, commit 7e4893c: SUCCEEDED
  • sagemaker-python-sdk-unit-tests, commit 8368584: SUCCEEDED
  • sagemaker-python-sdk-local-mode-tests, commit 8368584: SUCCEEDED
  • sagemaker-python-sdk-local-mode-tests, commit 7e4893c: SUCCEEDED
  • sagemaker-python-sdk-slow-tests, commit 8368584: SUCCEEDED
  • sagemaker-python-sdk-slow-tests, commit 7e4893c: SUCCEEDED

        dataFile.close()
    except Exception as e:
        raise ValueError(f"Could not open/read file: {data}. {e}")
    return dataFileInfo
Contributor

Can you please move this return statement into the try block?

@sagemaker-bot
Collaborator

AWS CodeBuild CI Reports

  • sagemaker-python-sdk-unit-tests, commit bde9af2: SUCCEEDED
  • sagemaker-python-sdk-local-mode-tests, commit bde9af2: SUCCEEDED
  • sagemaker-python-sdk-slow-tests, commit bde9af2: FAILED
  • sagemaker-python-sdk-unit-tests, commit fe0bf46: SUCCEEDED
  • sagemaker-python-sdk-slow-tests, commit fe0bf46: SUCCEEDED
  • sagemaker-python-sdk-slow-tests, commit fe0bf46: SUCCEEDED
  • sagemaker-python-sdk-local-mode-tests, commit fe0bf46: SUCCEEDED

@jeniyat jeniyat merged commit 5c64e6c into aws:dev Feb 28, 2022
@jeniyat jeniyat deleted the jeniyat/data-serializer branch February 28, 2022 16:35
Comment on lines +384 to +386
dataFile = open(data, "rb")
dataFileInfo = dataFile.read()
dataFile.close()
Contributor

@philschmid philschmid Mar 1, 2022

Wouldn't it have made more sense to use a context manager for reading the file, so that the file is still closed when .read() fails?
Also, shouldn't the variable names be snake_case rather than camelCase, as in all the other serializers?
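
The point about the context manager can be sketched as follows. This is only an illustration, not the change made in the follow-up PR: `FailingReader` and `serialize_stream` are hypothetical names, and an in-memory `io.BytesIO` stands in for a real file so that a failing `.read()` can be simulated.

```python
import io


class FailingReader(io.BytesIO):
    """Hypothetical file-like object whose read() always fails."""

    def read(self, *args, **kwargs):
        raise OSError("simulated read failure")


def serialize_stream(data_file):
    """Read all bytes from an open file-like object, closing it on every path."""
    try:
        with data_file:
            data_file_info = data_file.read()
            return data_file_info
    except Exception as e:
        raise ValueError(f"Could not read data. {e}") from e


# Successful read: the stream is closed once the `with` block exits.
good = io.BytesIO(b"payload")
good_result = serialize_stream(good)

# Failing read: the stream is still closed, and the error is wrapped.
bad = FailingReader(b"payload")
try:
    serialize_stream(bad)
    error_message = None
except ValueError as err:
    error_message = str(err)
```

With an explicit `close()` placed after `.read()`, the failing case would skip the close; the `with` block closes the stream in both cases.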

Contributor Author

As this PR is already merged, added a new PR with the suggested changes here: #2962

shreyapandit pushed a commit that referenced this pull request Mar 1, 2022
shreyapandit pushed a commit that referenced this pull request Mar 4, 2022
jeniyat added a commit that referenced this pull request Mar 18, 2022
* change: update code to get commit_id in codepipeline (#2961)

* feature: Data Serializer (#2956)

* change: reorganize test files for workflow (#2960)

Co-authored-by: Ben Crabtree <[email protected]>
Co-authored-by: Navin Soni <[email protected]>
Co-authored-by: Jeniya Tabassum <[email protected]>
Co-authored-by: Dewen Qi <[email protected]>

* feature: TensorFlow 2.4 for Neo (#2861)

Co-authored-by: Ben Crabtree <[email protected]>
Co-authored-by: Navin Soni <[email protected]>
Co-authored-by: Jeniya Tabassum <[email protected]>

* fix: Remove sagemaker_job_name from hyperparameters in TrainingStep (#2950)

Co-authored-by: Payton Staub <[email protected]>

* fix: Style update in DataSerializer (#2962)

* documentation: smddp doc update (#2968)

* fix: container env generation for S3 URI and add test for the same (#2971)

* documentation: update sagemaker training compiler docstring (#2969)

* feat: Python 3.9 for readthedocs (#2973)

Co-authored-by: Ben Crabtree <[email protected]>
Co-authored-by: Navin Soni <[email protected]>
Co-authored-by: Jeniya Tabassum <[email protected]>
Co-authored-by: Dewen Qi <[email protected]>
Co-authored-by: Payton Staub <[email protected]>
Co-authored-by: qidewenwhen <[email protected]>
Co-authored-by: Qingzi-Lan <[email protected]>
Co-authored-by: Payton Staub <[email protected]>

* fix doc structure

* archive 1.6.0 doc

* add new args, refs, and links

* fix version number

* incorp eng feedback, update docstrings, improve xref

* Trigger Build

* minor fix, trigger build again

* fix typo

Co-authored-by: Navin Soni <[email protected]>
Co-authored-by: Jeniya Tabassum <[email protected]>
Co-authored-by: qidewenwhen <[email protected]>
Co-authored-by: Ben Crabtree <[email protected]>
Co-authored-by: Dewen Qi <[email protected]>
Co-authored-by: Qingzi-Lan <[email protected]>
Co-authored-by: Payton Staub <[email protected]>
Co-authored-by: Payton Staub <[email protected]>
Co-authored-by: Shreya Pandit <[email protected]>
Co-authored-by: Ahsan Khan <[email protected]>
Co-authored-by: Mufaddal Rohawala <[email protected]>
ahsan-z-khan added a commit to ahsan-z-khan/sagemaker-python-sdk that referenced this pull request Mar 23, 2022
jerrypeng7773 pushed a commit to jerrypeng7773/sagemaker-python-sdk that referenced this pull request Apr 19, 2022
jerrypeng7773 pushed a commit to jerrypeng7773/sagemaker-python-sdk that referenced this pull request May 13, 2022
6 participants