feature: Hugging Face Transformers 4.12 for Pt1.9/TF2.5 #2752
Codecov Report
@@ Coverage Diff @@
## master #2752 +/- ##
=======================================
Coverage 88.71% 88.71%
=======================================
Files 167 167
Lines 14766 14766
=======================================
Hits 13099 13099
Misses 1667 1667
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
All releases are done. Can we run the pipeline and then merge?
looks good to me.
@@ -158,7 +158,7 @@ def test_huggingface_inference(
     huggingface_pytorch_latest_inference_py_version,
 ):
     env = {
-        "HF_MODEL_ID": "sshleifer/tiny-distilbert-base-uncased-finetuned-sst-2-english",
+        "HF_MODEL_ID": "philschmid/tiny-distilbert-classification",
@philschmid Can we please give this model a generic name?
Not sure what you mean by that. This is not a model that will be created; it is stored on the hf.co/models hub and used to run tests. I changed it to a model that we can control: https://huggingface.co/philschmid/tiny-distilbert-classification
tests/data/huggingface/run_tf.py
Outdated
-    x: train_dataset[x].to_tensor(default_value=0, shape=[None, tokenizer.model_max_length])
-    for x in ["input_ids", "attention_mask"]
-}
+train_features = {x: train_dataset[x] for x in ["input_ids", "attention_mask"]}
Can you detail why we are removing the to_tensor call, especially since the shape's tokenizer.model_max_length parameter is something that we gave explicitly in the past? Is this driven by the change in TF version?
Please make this backwards compatible, i.e. keep an original test case with the previous code, and add a new test case for this requirement where the to_tensor call is not required, so that we test both scenarios.
It has been removed in Datasets since we changed the internal structure. It used to return a RaggedTensor even when the tensors were normal dense tensors.
And the tokenizer.model_max_length is already represented in
train_dataset = train_dataset.map(
lambda e: tokenizer(e["text"], truncation=True, padding="max_length"), batched=True
)
which already creates a shape of max_length.
I added a condition to the test to check the transformers version and added the old code.
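For readers following the thread, here is a minimal sketch of what such a version-gated feature builder could look like. It is not the code from this PR: the helper names `version_tuple` and `build_features`, and the `(4, 12)` threshold, are assumptions made for illustration. The old branch mirrors the removed `to_tensor(default_value=0, shape=[None, max_length])` call, and the new branch mirrors the one-line dict comprehension from the diff.

```python
from typing import Any, Dict, Mapping, Tuple

# Columns taken from the diff under review.
FEATURE_COLUMNS = ("input_ids", "attention_mask")


def version_tuple(version: str) -> Tuple[int, ...]:
    """Parse the leading numeric components of a version string.

    Parsing stops at the first non-numeric component, so a suffix
    such as "dev0" is ignored.
    """
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def build_features(
    train_dataset: Mapping[str, Any],
    transformers_version: str,
    max_length: int,
    threshold: Tuple[int, ...] = (4, 12),  # hypothetical cut-off, not confirmed by the PR
) -> Dict[str, Any]:
    """Return the feature dict, converting ragged columns only on the old code path."""
    if version_tuple(transformers_version) >= threshold:
        # New path: columns are assumed to be dense already, because the
        # tokenizer padded every example with padding="max_length".
        return {x: train_dataset[x] for x in FEATURE_COLUMNS}
    # Old path: columns are assumed to expose to_tensor (as a RaggedTensor does)
    # and need to be padded out to max_length.
    return {
        x: train_dataset[x].to_tensor(default_value=0, shape=[None, max_length])
        for x in FEATURE_COLUMNS
    }
```

In the test script this would be called with something like `build_features(train_dataset, transformers.__version__, tokenizer.model_max_length)`, so both the old and the new behaviour stay covered from a single entry point.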
Please address Shreya's comments
856a949
Can someone restart the pipeline, please? The error is:
if http.status_code >= 300:
    error_code = parsed_response.get("Error", {}).get("Code")
    error_class = self.exceptions.from_code(error_code)
>   raise error_class(parsed_response, operation_name)
E botocore.errorfactory.ResourceLimitExceeded: An error occurred (ResourceLimitExceeded) when calling the CreateMonitoringSchedule operation: The account-level service limit 'ml.c5.xlarge for processing job usage' is 50 Instances, with current utilization of 50 Instances and a request delta of 1 Instances. Please contact AWS support to request an increase for this limit.
Yes, re-running the tests.
Update: Refactored the code and created a function to generate the dataset feature set.
Thank you for the reviews and support! Is there an ETA for the PyPI release?
* added new HuggingFace DLCs Co-authored-by: Navin Soni <[email protected]>
Issue #, if available:
#2751
Description of changes:
Testing done:
Merge Checklist
Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.
General
Tests
unique_name_from_base to create resource names in integ tests (if appropriate)
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.