-
Notifications
You must be signed in to change notification settings - Fork 1.2k
fix: Return ARM XGB/SKLearn tags if image_scope
is inference_graviton
#3449
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
ab2b4f4
to
5bdd3a5
Compare
src/sagemaker/image_uris.py
Outdated
and instance_type_family in GRAVITON_ALLOWED_TARGET_INSTANCE_FAMILY | ||
if framework in (XGBOOST_FRAMEWORK, SKLEARN_FRAMEWORK) and ( | ||
instance_type_family in GRAVITON_ALLOWED_TARGET_INSTANCE_FAMILY | ||
or final_image_scope == INFERENCE_GRAVITON |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be better to leave the "and" statement to make sure that the customer don't accidentally specify a non-Graviton instance(with the scope flag) and get a cryptic error message? We can instead an added condition to check if the scope param was passed and if it is set to INFERENCE_GRAVITON.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh good point. I think what we should do then is:
- If
image_scope == "inference_graviton"
andinstance_type
isNone
or Graviton instance -> return ARM image - If
image_scope == "inference_graviton"
andinstance_type
is x86 instance -> raise ValueError
Let me know if this is what you had in mind and if there are any other conditions to catch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think image_scope
should not be the unique identifier for the Customers intentions to invoke Graviton instances. Instead can we have a check for instance_type
when belonging to GRAVITON_ALLOWED_TARGET_INSTANCE_FAMILY
then invoke the graviton instance and set the scope to INFERENCE_GRAVITON
?
This is how it was implemented for PyTorch and Tensorflow frameworks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can circle back and discuss the implementation. With the existing implementation it is confusing for Customer to use these instances differently for each framework type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The instance type check that updates the image scope to Graviton inference is already in place for XGB and SKLearn in the same way it is for PyTorch and TensorFlow and this PR doesn't change that behavior.
The rational for this change is that our examples in public documentation (not just for XGB and SKLearn) specifies image_scope
when it isn't strictly necessary to get the image URI. Search "image_scope" on this page: https://docs.aws.amazon.com/sagemaker/latest/dg/ecr-us-west-2.html#sklearn-us-west-2.title Various example notebooks also explicitly specify image_scope
as a parameter.
Specifying inference_graviton
as the image scope and not getting back the graviton image even if the instance type isn't explicitly specified seems like an inconsistency to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be clear, this is the behavior in this PR. My understanding is that the first two cases are exactly how TF and PT behave as well.
# Graviton instance type specified, ARM image returned
In [1]: sagemaker.image_uris.retrieve("xgboost", "us-west-2", version="1.5-1", instance_type="ml.m6g.xlarge")
Out[1]: '246618743249.dkr.ecr.us-west-2.amazonaws.com/sagemaker-xgboost:1.5-1-arm64'
In [2]: sagemaker.image_uris.retrieve("xgboost", "us-west-2", version="1.5-1", instance_type="ml.m6g.xlarge", image_scope="inference")
Out[2]: '246618743249.dkr.ecr.us-west-2.amazonaws.com/sagemaker-xgboost:1.5-1-arm64'
In [3]: sagemaker.image_uris.retrieve("xgboost", "us-west-2", version="1.5-1", instance_type="ml.m6g.xlarge", image_scope="graviton_inference")
Out[3]: '246618743249.dkr.ecr.us-west-2.amazonaws.com/sagemaker-xgboost:1.5-1-arm64'
# This is new in this PR
In [4]: sagemaker.image_uris.retrieve("xgboost", "us-west-2", version="1.5-1", image_scope="inference_graviton")
Out[4]: '246618743249.dkr.ecr.us-west-2.amazonaws.com/sagemaker-xgboost:1.5-1-arm64'
In [5]: sagemaker.image_uris.retrieve("xgboost", "us-west-2", version="1.5-1", instance_type="ml.m5.xlarge", image_scope="inference_graviton")
ValueError: Unsupported instance type: m5. You may need to upgrade your SDK version (pip install -U sagemaker) for newer instance types. Supported instance type(s): m6g, m6gd, c6g, c6gd, c6gn, c7g, r6g, r6gd.
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
Codecov Report
@@ Coverage Diff @@
## master #3449 +/- ##
==========================================
- Coverage 89.17% 88.96% -0.21%
==========================================
Files 204 212 +8
Lines 18979 20441 +1462
==========================================
+ Hits 16924 18186 +1262
- Misses 2055 2255 +200
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
5bdd3a5
to
a839f9f
Compare
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
a839f9f
to
66d208e
Compare
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
Failed slow-test passed when I ran it locally with |
66d208e
to
32dbaf1
Compare
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
@@ -266,6 +269,7 @@ def _get_instance_type_family(instance_type): | |||
def _get_image_tag( | |||
container_version, | |||
distribution, | |||
final_image_scope, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you also update the docs in line 79? I'll ping the SDK team on Slack about updating the public docs.
32dbaf1
to
f7af490
Compare
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/bot run pr, slow-tests
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
f7af490
to
7bfffc8
Compare
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
…ton` (aws#3449) Co-authored-by: Mark Bunday <[email protected]>
…ton` (aws#3449) Co-authored-by: Mark Bunday <[email protected]>
…ton` (aws#3449) Co-authored-by: Mark Bunday <[email protected]>
…ton` (aws#3449) Co-authored-by: Mark Bunday <[email protected]>
Issue #, if available:
Description of changes:
Return ARM XGB/SKLearn tags if
image_scope
isinference_graviton
.Old behavior:
New behavior:
Also, a validation error is thrown if
image_scope
is set toinference_graviton
with a non-arm instance.Valid:
Invalid:
Testing done:
Merge Checklist
Put an
x
in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.General
Tests
unique_name_from_base
to create resource names in integ tests (if appropriate)By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.