Amazon SageMaker Debugger
#########################

.. warning::

   This page is no longer maintained. The live documentation is at `Debug and Profile Training Jobs Using Amazon SageMaker Debugger <https://docs.aws.amazon.com/sagemaker/latest/dg/train-debugger.html>`_
   and `Debugger API <https://sagemaker.readthedocs.io/en/stable/api/training/debugger.html>`_.

Amazon SageMaker Debugger allows you to detect anomalies while training your machine learning model by emitting relevant data during training, storing the data and then analyzing it.
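
The emit, store, and analyze cycle can be illustrated with a small, library-free Python sketch. It does not use the SageMaker Debugger or ``smdebug`` APIs: ``TensorStore`` and ``vanishing_gradient_rule`` are hypothetical names, the in-memory dictionary stands in for the S3 location where Debugger stores the emitted data, and the threshold and window values are arbitrary assumptions.

.. code:: python

    # Hypothetical in-memory stand-in for the location where emitted tensors are stored.
    class TensorStore:
        def __init__(self):
            self._data = {}  # maps tensor name -> list of (step, value)

        def record(self, name, step, value):
            """Emit: save a scalar summary of a tensor at a given training step."""
            self._data.setdefault(name, []).append((step, value))

        def values(self, name):
            """Return the stored values for a tensor, in the order they were recorded."""
            return [value for _, value in self._data.get(name, [])]


    def vanishing_gradient_rule(store, name="gradient_norm", threshold=1e-4, window=3):
        """Analyze: flag the run if the last `window` gradient norms are all below `threshold`."""
        recent = store.values(name)[-window:]
        return len(recent) == window and all(v < threshold for v in recent)


    store = TensorStore()
    # Simulated gradient norms that shrink toward zero during training.
    for step, norm in enumerate([0.5, 0.05, 1e-5, 1e-6, 1e-7]):
        store.record("gradient_norm", step, norm)

    print(vanishing_gradient_rule(store))  # True

In a real training job, Debugger's built-in rules run this kind of analysis on the data it saves, so you do not implement the store or the rule yourself.
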

SageMaker Debugger deprecates the framework profiling feature starting from TensorFlow 2.11 and PyTorch 2.0. You can still use the feature with earlier versions of the frameworks and the SDK, as follows:

* SageMaker Python SDK <= v2.130.0
* PyTorch >= v1.6.0, < v2.0
* TensorFlow >= v2.3.1, < v2.11
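
The version constraints above can be written as a small compatibility check. ``framework_profiling_supported`` is a hypothetical helper, not part of the SageMaker Python SDK, and it assumes plain numeric version strings (suffixes such as ``+cu117`` or ``rc1`` are not handled).

.. code:: python

    def parse_version(text):
        """Parse a plain numeric version such as 'v2.130.0' or '2.11' into a 3-tuple of ints."""
        parts = [int(part) for part in text.lstrip("v").split(".")[:3]]
        return tuple(parts + [0] * (3 - len(parts)))


    # (inclusive lower bound, exclusive upper bound) per framework, taken from the list above.
    FRAMEWORK_RANGES = {
        "pytorch": ("1.6.0", "2.0"),
        "tensorflow": ("2.3.1", "2.11"),
    }
    MAX_SDK_VERSION = "2.130.0"  # inclusive upper bound for the SageMaker Python SDK


    def framework_profiling_supported(sdk_version, framework, framework_version):
        """Return True if this combination can still use the deprecated framework profiling feature."""
        if parse_version(sdk_version) > parse_version(MAX_SDK_VERSION):
            return False
        low, high = FRAMEWORK_RANGES[framework]
        return parse_version(low) <= parse_version(framework_version) < parse_version(high)


    print(framework_profiling_supported("2.120.0", "pytorch", "1.13.1"))     # True
    print(framework_profiling_supported("2.120.0", "tensorflow", "2.11.0"))  # False

Versions are padded to three components so that, for example, ``2.11`` and ``2.11.0`` compare as equal.
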

With the deprecation, SageMaker Debugger discontinues support for the APIs below this note.

See also `Amazon SageMaker Debugger Release Notes: March 16, 2023 <https://docs.aws.amazon.com/sagemaker/latest/dg/debugger-release-notes.html#debugger-release-notes-20230315>`_.

For models larger than 20 GB (total checkpoint size), we recommend that you store the model in S3.
Download times will be much faster than downloading from the HuggingFace Hub at runtime.
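
The 20 GB guideline can be applied with a quick size check on a local copy of the checkpoint. This is a hypothetical sketch: ``total_checkpoint_size`` and ``should_stage_in_s3`` are illustrative names, and reading 20 GB as ``20 * 10**9`` bytes is an assumption.

.. code:: python

    import os

    # Assumption: "20 GB" is read as 20 * 10**9 bytes (decimal gigabytes).
    SIZE_THRESHOLD_BYTES = 20 * 10**9


    def total_checkpoint_size(model_dir):
        """Sum the sizes of every file under model_dir (weights, configs, tokenizer files)."""
        total = 0
        for root, _dirs, files in os.walk(model_dir):
            for name in files:
                total += os.path.getsize(os.path.join(root, name))
        return total


    def should_stage_in_s3(model_dir, threshold=SIZE_THRESHOLD_BYTES):
        """Return True when the checkpoint is larger than the threshold and belongs in S3."""
        return total_checkpoint_size(model_dir) > threshold


    # Example: should_stage_in_s3("gpt-j-6B") for a local copy of the checkpoint.
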

DJL Serving models expect a different model structure than most of the other frameworks in the SageMaker Python SDK.
Specifically, ``DJLModel`` does not support loading models stored in tar.gz format.
You must provide an Amazon S3 URL pointing to uncompressed model artifacts (bucket and prefix).
This is because DJL Serving is optimized for large models, and it implements a fast download mechanism for large models that requires the artifacts to be uncompressed.
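
A pre-flight check can catch the most common mistake, passing a ``.tar.gz`` object instead of an uncompressed prefix, before you create the model. ``check_djl_model_uri`` is a hypothetical helper, not an SDK function, and treating ``.tgz`` the same as ``.tar.gz`` is an assumption.

.. code:: python

    from urllib.parse import urlparse

    # Archive suffixes treated as unsupported (assumption: .tgz is just another tar.gz name).
    ARCHIVE_SUFFIXES = (".tar.gz", ".tgz")


    def check_djl_model_uri(uri):
        """Return (bucket, prefix) for an uncompressed S3 prefix, or raise ValueError."""
        parsed = urlparse(uri)
        if parsed.scheme != "s3" or not parsed.netloc:
            raise ValueError(f"expected an s3://bucket/prefix URI, got {uri!r}")
        prefix = parsed.path.lstrip("/")
        if not prefix:
            raise ValueError(f"{uri!r} has no prefix; point it at the model's folder")
        if prefix.endswith(ARCHIVE_SUFFIXES):
            raise ValueError(f"{uri!r} looks like a compressed archive; upload uncompressed files")
        return parsed.netloc, prefix


    print(check_djl_model_uri("s3://my_bucket/gpt-j-6B"))  # ('my_bucket', 'gpt-j-6B')
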

For example, let's say you want to deploy the EleutherAI/gpt-j-6B model available on the HuggingFace Hub.
You can download the model and upload it to S3 like this:

.. code::

    # Upload to S3
    aws s3 sync gpt-j-6B s3://my_bucket/gpt-j-6B
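
To see the layout that the upload produces, this sketch lists the S3 key each local file would be stored under: every file is uploaded individually and uncompressed, with its relative path preserved under the prefix. ``plan_s3_sync`` is a hypothetical helper; unlike ``aws s3 sync``, it does not compare sizes or timestamps to skip unchanged files.

.. code:: python

    import os


    def plan_s3_sync(local_dir, prefix):
        """Return sorted (local_path, s3_key) pairs for uploading local_dir under prefix.

        Each file keeps its path relative to local_dir, so the artifacts stay
        uncompressed and laid out exactly as on disk.
        """
        plan = []
        for root, _dirs, files in os.walk(local_dir):
            for name in files:
                local_path = os.path.join(root, name)
                # S3 keys always use forward slashes, whatever the local path separator is.
                relative = os.path.relpath(local_path, local_dir).replace(os.sep, "/")
                plan.append((local_path, f"{prefix.rstrip('/')}/{relative}"))
        return sorted(plan, key=lambda pair: pair[1])


    # Example: plan_s3_sync("gpt-j-6B", "gpt-j-6B") for the model directory synced above.

Each ``(local_path, key)`` pair could be passed to boto3's S3 client as ``upload_file(local_path, "my_bucket", key)`` if you prefer to upload from Python.
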
109
137
110
-
You would then pass "s3://my_bucket/gpt-j-6B" as ``model_s3_uri`` to the ``DJLModel``.
138
+
You would then pass "s3://my_bucket/gpt-j-6B" as ``model_id`` to the ``DJLModel`` like this:
139
+
140
+
.. code::
141
+
142
+
model = DJLModel(
143
+
"s3://my_bucket/gpt-j-6B",
144
+
"my_sagemaker_role",
145
+
data_type="fp16",
146
+
number_of_partitions=2
147
+
)
148
+
149
+
predictor = model.deploy("ml.g5.12xlarge")

For language models, we expect the model weights, model config, and tokenizer config to be provided in S3. The model
should be loadable from the HuggingFace Transformers ``AutoModelFor<Task>.from_pretrained`` API, where task