
Commit b9d1485

Merge branch 'master' into patch-1

2 parents d3779dd + 19115a2


43 files changed: +1753 −174 lines changed

CHANGELOG.md

Lines changed: 31 additions & 0 deletions
@@ -1,5 +1,36 @@
 # Changelog

+## v2.145.0 (2023-04-06)
+
+### Features
+
+ * add support for async inline error notifications
+ * Add methods for feature group to list feature metadata parameters and tags
+ * Support huggingface hub model_id for DJL Models
+
+### Bug Fixes and Other Changes
+
+ * load_sagemaker_config should lazy initialize a default S3 resource
+
+## v2.144.0 (2023-04-05)
+
+### Features
+
+ * support create Clarify explainer enabled endpoint for Clarify Online Explainability
+ * Combined inference and training script artifact
+ * jumpstart instance types
+ * Deprecation warning for framework profiling for TF 2.12 and on, PT 2.0 and on
+
+### Bug Fixes and Other Changes
+
+ * always delete temporary directory even during exception
+ * Fixes the completion_criteria_config dict in the to_input_req method
+ * Update CHANGELOG.md
+
+### Documentation Changes
+
+ * Update SageMaker Debugger doc
+
 ## v2.143.0 (2023-03-29)

 ### Features

VERSION

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-2.143.1.dev0
+2.145.1.dev0

doc/api/inference/explainer.rst

Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
+Online Explainability
+---------------------
+
+This module contains classes related to Amazon Sagemaker Clarify Online Explainability
+
+.. automodule:: sagemaker.explainer.explainer_config
+    :members:
+    :undoc-members:
+    :show-inheritance:
+
+.. automodule:: sagemaker.explainer.clarify_explainer_config
+    :members:
+    :undoc-members:
+    :show-inheritance:
+
+
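
These two automodules cover the configuration objects introduced for Clarify Online Explainability. A minimal sketch of how they are intended to be combined when deploying an endpoint follows; the ``ClarifyShapConfig``/``ClarifyShapBaselineConfig`` names, their keyword arguments, and the ``explainer_config`` argument to ``deploy()`` are assumptions to verify against the generated API reference, and ``model`` stands for an existing ``sagemaker.model.Model``.

    # Sketch only: enable online explainability at deployment time.
    # Class and argument names below are assumptions based on the modules
    # documented above; check the v2.145.0 API reference before relying on them.
    from sagemaker.explainer.explainer_config import ExplainerConfig
    from sagemaker.explainer.clarify_explainer_config import (
        ClarifyExplainerConfig,
        ClarifyShapBaselineConfig,
        ClarifyShapConfig,
    )

    shap_config = ClarifyShapConfig(
        shap_baseline_config=ClarifyShapBaselineConfig(shap_baseline="1,2,3,4")
    )
    explainer_config = ExplainerConfig(
        clarify_explainer_config=ClarifyExplainerConfig(shap_config=shap_config)
    )

    # `model` is an existing sagemaker.model.Model (hypothetical here).
    predictor = model.deploy(
        initial_instance_count=1,
        instance_type="ml.m5.xlarge",
        explainer_config=explainer_config,
    )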

doc/api/prep_data/feature_store.rst

Lines changed: 38 additions & 2 deletions
@@ -1,7 +1,7 @@
 Feature Store APIs
 ------------------

-Feature group
+Feature Group
 *************

 .. autoclass:: sagemaker.feature_store.feature_group.FeatureGroup
@@ -18,7 +18,7 @@ Feature group
     :show-inheritance:


-Feature definition
+Feature Definition
 ******************

 .. autoclass:: sagemaker.feature_store.feature_definition.FeatureDefinition
@@ -77,10 +77,46 @@ Inputs
     :members:
     :show-inheritance:

+.. autoclass:: sagemaker.feature_store.inputs.ResourceEnum
+    :members:
+    :show-inheritance:
+
+.. autoclass:: sagemaker.feature_store.inputs.SearchOperatorEnum
+    :members:
+    :show-inheritance:
+
+.. autoclass:: sagemaker.feature_store.inputs.SortOrderEnum
+    :members:
+    :show-inheritance:
+
+.. autoclass:: sagemaker.feature_store.inputs.FilterOperatorEnum
+    :members:
+    :show-inheritance:
+
+.. autoclass:: sagemaker.feature_store.inputs.Filter
+    :members:
+    :show-inheritance:
+
+.. autoclass:: sagemaker.feature_store.inputs.Identifier
+    :members:
+    :show-inheritance:
+
+.. autoclass:: sagemaker.feature_store.inputs.FeatureParameter
+    :members:
+    :show-inheritance:
+

 Dataset Builder
 ***************

 .. autoclass:: sagemaker.feature_store.dataset_builder.DatasetBuilder
     :members:
     :show-inheritance:
+
+
+Feature Store
+*************
+
+.. autoclass:: sagemaker.feature_store.feature_store.FeatureStore
+    :members:
+    :show-inheritance:
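
The newly documented ``inputs`` classes and the ``FeatureStore`` entry point are meant to be used together for search and batch reads. A rough sketch follows; the exact keyword names (for example ``record_identifiers_value_as_string``) are assumptions to check against the class docs above.

    # Sketch only: search feature groups and batch-read records.
    # Keyword argument names are assumptions; verify against the classes above.
    from sagemaker import Session
    from sagemaker.feature_store.feature_store import FeatureStore
    from sagemaker.feature_store.inputs import (
        Filter,
        FilterOperatorEnum,
        Identifier,
        ResourceEnum,
    )

    feature_store = FeatureStore(sagemaker_session=Session())

    # Search for feature groups whose name contains a substring.
    search_results = feature_store.search(
        resource=ResourceEnum.FEATURE_GROUP,
        filters=[
            Filter(
                name="FeatureGroupName",
                value="my-feature-group",
                operator=FilterOperatorEnum.CONTAINS,
            )
        ],
    )

    # Read several records from the online store in one call.
    records = feature_store.batch_get_record(
        identifiers=[
            Identifier(
                feature_group_name="my-feature-group",
                record_identifiers_value_as_string=["record-1", "record-2"],
            )
        ]
    )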

doc/frameworks/djl/using_djl.rst

Lines changed: 44 additions & 5 deletions
@@ -29,7 +29,7 @@ You can either deploy your model using DeepSpeed or HuggingFace Accelerate, or l

     # Create a DJL Model, backend is chosen automatically
     djl_model = DJLModel(
-        "s3://my_bucket/my_saved_model_artifacts/",
+        "s3://my_bucket/my_saved_model_artifacts/", # This can also be a HuggingFace Hub model id
         "my_sagemaker_role",
         data_type="fp16",
         task="text-generation",
@@ -46,7 +46,7 @@ If you want to use a specific backend, then you can create an instance of the co

     # Create a model using the DeepSpeed backend
     deepspeed_model = DeepSpeedModel(
-        "s3://my_bucket/my_saved_model_artifacts/",
+        "s3://my_bucket/my_saved_model_artifacts/", # This can also be a HuggingFace Hub model id
         "my_sagemaker_role",
         data_type="bf16",
         task="text-generation",
@@ -56,7 +56,7 @@ If you want to use a specific backend, then you can create an instance of the co
     # Create a model using the HuggingFace Accelerate backend

     hf_accelerate_model = HuggingFaceAccelerateModel(
-        "s3://my_bucket/my_saved_model_artifacts/",
+        "s3://my_bucket/my_saved_model_artifacts/", # This can also be a HuggingFace Hub model id
         "my_sagemaker_role",
         data_type="fp16",
         task="text-generation",
@@ -91,9 +91,37 @@ model server configuration.
 Model Artifacts
 ---------------

+DJL Serving supports two ways to load models for inference.
+1. A HuggingFace Hub model id.
+2. Uncompressed model artifacts stored in a S3 bucket.
+
+HuggingFace Hub model id
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+Using a HuggingFace Hub model id is the easiest way to get started with deploying Large Models via DJL Serving on SageMaker.
+DJL Serving will use this model id to download the model at runtime via the HuggingFace Transformers ``from_pretrained`` API.
+This method makes it easy to deploy models quickly, but for very large models the download time can become unreasonable.
+
+For example, you can deploy the EleutherAI gpt-j-6B model like this:
+
+.. code::
+
+    model = DJLModel(
+        "EleutherAI/gpt-j-6B",
+        "my_sagemaker_role",
+        data_type="fp16",
+        number_of_partitions=2
+    )
+
+    predictor = model.deploy("ml.g5.12xlarge")
+
+Uncompressed Model Artifacts stored in a S3 bucket
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+For models that are larger than 20GB (total checkpoint size), we recommend that you store the model in S3.
+Download times will be much faster compared to downloading from the HuggingFace Hub at runtime.
 DJL Serving Models expect a different model structure than most of the other frameworks in the SageMaker Python SDK.
 Specifically, DJLModels do not support loading models stored in tar.gz format.
-You must provide an Amazon S3 url pointing to uncompressed model artifacts (bucket and prefix).
 This is because DJL Serving is optimized for large models, and it implements a fast downloading mechanism for large models that require the artifacts be uncompressed.

 For example, lets say you want to deploy the EleutherAI/gpt-j-6B model available on the HuggingFace Hub.
@@ -107,7 +135,18 @@ You can download the model and upload to S3 like this:
     # Upload to S3
     aws s3 sync gpt-j-6B s3://my_bucket/gpt-j-6B

-You would then pass "s3://my_bucket/gpt-j-6B" as ``model_s3_uri`` to the ``DJLModel``.
+You would then pass "s3://my_bucket/gpt-j-6B" as ``model_id`` to the ``DJLModel`` like this:
+
+.. code::
+
+    model = DJLModel(
+        "s3://my_bucket/gpt-j-6B",
+        "my_sagemaker_role",
+        data_type="fp16",
+        number_of_partitions=2
+    )
+
+    predictor = model.deploy("ml.g5.12xlarge")

 For language models we expect that the model weights, model config, and tokenizer config are provided in S3. The model
 should be loadable from the HuggingFace Transformers AutoModelFor<Task>.from_pretrained API, where task

doc/overview.rst

Lines changed: 7 additions & 3 deletions
@@ -1164,7 +1164,8 @@ More information about SageMaker Asynchronous Inference can be found in the `AWS

 To deploy asynchronous inference endpoint, you will need to create a ``AsyncInferenceConfig`` object.
 If you create ``AsyncInferenceConfig`` without specifying its arguments, the default ``S3OutputPath`` will
-be ``s3://sagemaker-{REGION}-{ACCOUNTID}/async-endpoint-outputs/{UNIQUE-JOB-NAME}``. (example shown below):
+be ``s3://sagemaker-{REGION}-{ACCOUNTID}/async-endpoint-outputs/{UNIQUE-JOB-NAME}``, ``S3FailurePath`` will
+be ``s3://sagemaker-{REGION}-{ACCOUNTID}/async-endpoint-failures/{UNIQUE-JOB-NAME}`` (example shown below):

 .. code:: python

@@ -1174,18 +1175,21 @@ be ``s3://sagemaker-{REGION}-{ACCOUNTID}/async-endpoint-outputs/{UNIQUE-JOB-NAME
     async_config = AsyncInferenceConfig()

 Or you can specify configurations in ``AsyncInferenceConfig`` as you like. All of those configuration parameters
-are optional but if you don’t specify the ``output_path``, Amazon SageMaker will use the default ``S3OutputPath``
+are optional but if you don’t specify the ``output_path`` or ``failure_path``, Amazon SageMaker will use the
+default ``S3OutputPath`` or ``S3FailurePath``
 mentioned above (example shown below):

 .. code:: python

-    # Specify S3OutputPath, MaxConcurrentInvocationsPerInstance and NotificationConfig in the async config object
+    # Specify S3OutputPath, S3FailurePath, MaxConcurrentInvocationsPerInstance and NotificationConfig
+    # in the async config object
     async_config = AsyncInferenceConfig(
         output_path="s3://{s3_bucket}/{bucket_prefix}/output",
         max_concurrent_invocations_per_instance=10,
         notification_config = {
             "SuccessTopic": "arn:aws:sns:aws-region:account-id:topic-name",
             "ErrorTopic": "arn:aws:sns:aws-region:account-id:topic-name",
+            "IncludeInferenceResponseIn": ["SUCCESS_NOTIFICATION_TOPIC","ERROR_NOTIFICATION_TOPIC"],
         }
     )
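
Because this commit also adds a ``failure_path`` keyword to ``AsyncInferenceConfig`` (see the Python change below), the failure location can be pinned explicitly as well. A minimal sketch, assuming ``failure_path`` mirrors ``output_path`` and ``model`` is an already-constructed SageMaker ``Model``:

    from sagemaker.async_inference import AsyncInferenceConfig

    # Sketch only: set both the success and failure S3 locations explicitly.
    async_config = AsyncInferenceConfig(
        output_path="s3://my-bucket/async-outputs",
        failure_path="s3://my-bucket/async-failures",
    )

    predictor = model.deploy(
        initial_instance_count=1,
        instance_type="ml.m5.xlarge",
        async_inference_config=async_config,
    )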

src/sagemaker/async_inference/async_inference_config.py

Lines changed: 11 additions & 0 deletions
@@ -31,6 +31,7 @@ def __init__(
         max_concurrent_invocations_per_instance=None,
         kms_key_id=None,
         notification_config=None,
+        failure_path=None,
     ):
         """Initialize an AsyncInferenceConfig object for async inference configuration.

@@ -45,6 +46,9 @@ def __init__(
             kms_key_id (str): Optional. The Amazon Web Services Key Management Service
                 (Amazon Web Services KMS) key that Amazon SageMaker uses to encrypt the
                 asynchronous inference output in Amazon S3. (Default: None)
+            failure_path (str): Optional. The Amazon S3 location that endpoints upload model
+                responses for failed requests. If no value is provided, Amazon SageMaker will
+                use default Amazon S3 Async Inference failure path. (Default: None)
             notification_config (dict): Optional. Specifies the configuration for notifications
                 of inference results for asynchronous inference. Only one notification is generated
                 per invocation request (Default: None):
@@ -54,17 +58,24 @@ def __init__(
             * error_topic (str): Amazon SNS topic to post a notification to when inference
                 fails. If no topic is provided, no notification is sent on failure.
                 The key in notification_config is 'ErrorTopic'.
+            * include_inference_response_in (list): Optional. When provided the inference
+                response will be included in the notification topics. If not provided,
+                a notification will still be generated on success/error, but will not
+                contain the inference response.
+                Valid options are SUCCESS_NOTIFICATION_TOPIC, ERROR_NOTIFICATION_TOPIC
         """
         self.output_path = output_path
         self.max_concurrent_invocations_per_instance = max_concurrent_invocations_per_instance
         self.kms_key_id = kms_key_id
         self.notification_config = notification_config
+        self.failure_path = failure_path

     def _to_request_dict(self):
         """Generates a request dictionary using the parameters provided to the class."""
         request_dict = {
             "OutputConfig": {
                 "S3OutputPath": self.output_path,
+                "S3FailurePath": self.failure_path,
             },
         }
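
For context, the hunk above means the request dictionary built by ``_to_request_dict`` now carries both S3 locations under ``OutputConfig``. A rough sketch of the round trip; the keys outside ``OutputConfig`` are assumptions about fields not shown in this hunk:

    from sagemaker.async_inference import AsyncInferenceConfig

    config = AsyncInferenceConfig(
        output_path="s3://my-bucket/async-outputs",
        failure_path="s3://my-bucket/async-failures",
        max_concurrent_invocations_per_instance=4,
    )

    # Expected shape, per the hunk above (other keys are assumptions):
    # {
    #     "OutputConfig": {
    #         "S3OutputPath": "s3://my-bucket/async-outputs",
    #         "S3FailurePath": "s3://my-bucket/async-failures",
    #     },
    #     ...
    # }
    request_dict = config._to_request_dict()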

src/sagemaker/async_inference/async_inference_response.py

Lines changed: 30 additions & 16 deletions
@@ -17,7 +17,11 @@
 from botocore.exceptions import ClientError
 from sagemaker.s3 import parse_s3_url
 from sagemaker.async_inference import WaiterConfig
-from sagemaker.exceptions import ObjectNotExistedError, UnexpectedClientError
+from sagemaker.exceptions import (
+    ObjectNotExistedError,
+    UnexpectedClientError,
+    AsyncInferenceModelError,
+)


 class AsyncInferenceResponse(object):
@@ -32,6 +36,7 @@ def __init__(
         self,
         predictor_async,
         output_path,
+        failure_path,
     ):
         """Initialize an AsyncInferenceResponse object.

@@ -43,10 +48,13 @@ def __init__(
                 that return this response.
             output_path (str): The Amazon S3 location that endpoints upload inference responses
                 to.
+            failure_path (str): The Amazon S3 location that endpoints upload model errors
+                for failed requests.
         """
         self.predictor_async = predictor_async
         self.output_path = output_path
         self._result = None
+        self.failure_path = failure_path

     def get_result(
         self,
@@ -71,28 +79,34 @@ def get_result(

         if self._result is None:
             if waiter_config is None:
-                self._result = self._get_result_from_s3(self.output_path)
+                self._result = self._get_result_from_s3(self.output_path, self.failure_path)
             else:
                 self._result = self.predictor_async._wait_for_output(
-                    self.output_path, waiter_config
+                    self.output_path, self.failure_path, waiter_config
                 )
         return self._result

-    def _get_result_from_s3(
-        self,
-        output_path,
-    ):
+    def _get_result_from_s3(self, output_path, failure_path):
         """Get inference result from the output Amazon S3 path"""
         bucket, key = parse_s3_url(output_path)
         try:
             response = self.predictor_async.s3_client.get_object(Bucket=bucket, Key=key)
             return self.predictor_async.predictor._handle_response(response)
-        except ClientError as ex:
-            if ex.response["Error"]["Code"] == "NoSuchKey":
-                raise ObjectNotExistedError(
-                    message="Inference could still be running",
-                    output_path=output_path,
-                )
-            raise UnexpectedClientError(
-                message=ex.response["Error"]["Message"],
-            )
+        except ClientError as e:
+            if e.response["Error"]["Code"] == "NoSuchKey":
+                try:
+                    failure_bucket, failure_key = parse_s3_url(failure_path)
+                    failure_response = self.predictor_async.s3_client.get_object(
+                        Bucket=failure_bucket, Key=failure_key
+                    )
+                    failure_response = self.predictor_async.predictor._handle_response(
+                        failure_response
+                    )
+                    raise AsyncInferenceModelError(message=failure_response)
+                except ClientError as ex:
+                    if ex.response["Error"]["Code"] == "NoSuchKey":
+                        raise ObjectNotExistedError(
+                            message="Inference could still be running", output_path=output_path
+                        )
+                    raise UnexpectedClientError(message=ex.response["Error"]["Message"])
+            raise UnexpectedClientError(message=e.response["Error"]["Message"])
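
From the caller's perspective, the effect of this change is that a failed asynchronous request now surfaces as ``AsyncInferenceModelError`` (read back from the failure path) instead of looking identical to a result that is still pending. A usage sketch, assuming ``predictor_async`` is an existing ``AsyncPredictor`` and ``payload`` is a valid input for the endpoint:

    from sagemaker.exceptions import AsyncInferenceModelError, ObjectNotExistedError

    # Sketch only: predictor_async and payload are assumed to exist already.
    response = predictor_async.predict_async(data=payload)

    try:
        result = response.get_result()
    except AsyncInferenceModelError as err:
        # New in this commit: the object uploaded to S3FailurePath is read back
        # and raised as a model error.
        print("Inference failed:", err)
    except ObjectNotExistedError:
        # Neither the output nor the failure object exists yet;
        # the inference could still be running.
        print("Result not ready yet")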
