Commit 8c98424

siddvenk authored and Namrata Madan committed
feature: Djl Large Model Support (aws#3628)
1 parent cf0f569 commit 8c98424

File tree

8 files changed: +1602 -0 lines changed


doc/frameworks/djl/index.rst

+16
@@ -0,0 +1,16 @@
########################
Deep Java Library (DJL)
########################

A managed environment for inference using Deep Java Library (DJL) on Amazon SageMaker.
For general information about using the SageMaker Python SDK, see :ref:`overview:Using the SageMaker Python SDK`.

.. toctree::
    :maxdepth: 1

    using_djl

.. toctree::
    :maxdepth: 2

    sagemaker.djl_inference
doc/frameworks/djl/sagemaker.djl_inference.rst

+35

@@ -0,0 +1,35 @@
DJL Classes
=================

DJLModel
---------------------------

.. autoclass:: sagemaker.djl_inference.model.DJLModel
    :members:
    :undoc-members:
    :show-inheritance:

DeepSpeedModel
---------------------------

.. autoclass:: sagemaker.djl_inference.model.DeepSpeedModel
    :members:
    :undoc-members:
    :show-inheritance:

HuggingFaceAccelerateModel
---------------------------

.. autoclass:: sagemaker.djl_inference.model.HuggingFaceAccelerateModel
    :members:
    :undoc-members:
    :show-inheritance:

DJLPredictor
---------------------------

.. autoclass:: sagemaker.djl_inference.model.DJLPredictor
    :members:
    :undoc-members:
    :show-inheritance:
doc/frameworks/djl/using_djl.rst

+185
@@ -0,0 +1,185 @@
#######################################
Use DJL with the SageMaker Python SDK
#######################################

With the SageMaker Python SDK, you can use Deep Java Library (DJL) to host models on Amazon SageMaker.

`Deep Java Library (DJL) Serving <https://docs.djl.ai/docs/serving/index.html>`_ is a high-performance, universal, stand-alone model serving solution powered by `DJL <https://docs.djl.ai/index.html>`_.
DJL Serving supports loading models trained with a variety of frameworks. With the SageMaker Python SDK, you can
use DJL Serving to host large models with backends such as DeepSpeed and HuggingFace Accelerate.

For information about supported versions of DJL Serving, see the `AWS documentation <https://docs.aws.amazon.com/deep-learning-containers/latest/devguide/deep-learning-containers-images.html>`_.
We recommend that you use the latest supported version because that's where we focus our development efforts.

For general information about using the SageMaker Python SDK, see :ref:`overview:Using the SageMaker Python SDK`.

.. contents::
*******************
Deploy DJL models
*******************

With the SageMaker Python SDK, you can use DJL Serving to host models that have been saved in the HuggingFace pretrained format.
These can be models you have trained or fine-tuned yourself, or models available publicly on the HuggingFace Hub.
DJL Serving in the SageMaker Python SDK supports hosting models for the popular HuggingFace NLP tasks, as well as Stable Diffusion.

You can either deploy your model using DeepSpeed or HuggingFace Accelerate, or let DJL Serving determine the best backend based on your model architecture and configuration.

.. code:: python

    from sagemaker.djl_inference import DJLModel

    # Create a DJL Model; the backend is chosen automatically
    djl_model = DJLModel(
        "s3://my_bucket/my_saved_model_artifacts/",
        "my_sagemaker_role",
        data_type="fp16",
        task="text-generation",
        number_of_partitions=2,  # number of GPUs to partition the model across
    )

    # Deploy the model to an Amazon SageMaker endpoint and get a Predictor
    predictor = djl_model.deploy("ml.g5.12xlarge",
                                 initial_instance_count=1)
If you want to use a specific backend, you can create an instance of the corresponding model class directly.

.. code:: python

    from sagemaker.djl_inference import DeepSpeedModel, HuggingFaceAccelerateModel

    # Create a model using the DeepSpeed backend
    deepspeed_model = DeepSpeedModel(
        "s3://my_bucket/my_saved_model_artifacts/",
        "my_sagemaker_role",
        data_type="bf16",
        task="text-generation",
        tensor_parallel_degree=2,  # number of GPUs to partition the model across using tensor parallelism
    )

    # Create a model using the HuggingFace Accelerate backend
    hf_accelerate_model = HuggingFaceAccelerateModel(
        "s3://my_bucket/my_saved_model_artifacts/",
        "my_sagemaker_role",
        data_type="fp16",
        task="text-generation",
        number_of_partitions=2,  # number of GPUs to partition the model across
    )

    # Deploy the models to Amazon SageMaker endpoints and get Predictors
    deepspeed_predictor = deepspeed_model.deploy("ml.g5.12xlarge",
                                                 initial_instance_count=1)
    hf_accelerate_predictor = hf_accelerate_model.deploy("ml.g5.12xlarge",
                                                         initial_instance_count=1)

Regardless of which way you choose to create your model, a ``Predictor`` object is returned. You can use this ``Predictor``
to do inference against the endpoint hosting your ``DJLModel``.

Each ``Predictor`` provides a ``predict`` method, which can do inference with JSON data, numpy arrays, or Python lists.
Inference data is serialized and sent to the DJL Serving model server by an ``InvokeEndpoint`` SageMaker operation, and the
``predict`` method returns the result of inference against your model.

By default, the inference data is serialized to a JSON string, and the inference result is a Python dictionary.
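For illustration, here is a minimal sketch of calling ``predict`` with the default JSON serialization. The payload shape below is an assumption for a ``text-generation`` model; the exact schema your endpoint expects depends on the task and the inference handler.

.. code:: python

    # Hypothetical request payload for a text-generation endpoint
    data = {"inputs": "Deploying large models on SageMaker is"}

    # The input is serialized to JSON and sent via InvokeEndpoint;
    # the result comes back deserialized as a Python dictionary.
    result = predictor.predict(data)
    print(result)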
Model Directory Structure
=========================

Two components are needed to deploy DJL Serving models on SageMaker:

1. Model artifacts (required)
2. Inference code and model server properties (optional)

These are stored and handled separately. Model artifacts should not be stored with the custom inference code and
model server configuration.
Model Artifacts
---------------

DJL Serving models expect a different model structure than most of the other frameworks in the SageMaker Python SDK.
Specifically, DJLModels do not support loading models stored in tar.gz format.
You must provide an Amazon S3 URL pointing to uncompressed model artifacts (bucket and prefix).
This is because DJL Serving is optimized for large models, and it implements a fast downloading mechanism for large models that requires the artifacts to be uncompressed.

For example, let's say you want to deploy the EleutherAI/gpt-j-6B model available on the HuggingFace Hub.
You can download the model and upload it to S3 like this:

.. code::

    # Requires Git LFS
    git clone https://huggingface.co/EleutherAI/gpt-j-6B

    # Upload to S3
    aws s3 sync gpt-j-6B s3://my_bucket/gpt-j-6B

You would then pass "s3://my_bucket/gpt-j-6B" as ``model_s3_uri`` to the ``DJLModel``.
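For instance, a minimal sketch (the bucket name and role are placeholders carried over from the example above):

.. code:: python

    from sagemaker.djl_inference import DJLModel

    # Point the model at the uncompressed artifacts uploaded above
    djl_model = DJLModel(
        "s3://my_bucket/gpt-j-6B",  # model_s3_uri: bucket and prefix, not a tar.gz
        "my_sagemaker_role",
        task="text-generation",
    )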
For language models, we expect the model weights, model config, and tokenizer config to be provided in S3. The model
should be loadable from the HuggingFace Transformers ``AutoModelFor<Task>.from_pretrained`` API, where ``<Task>``
is the NLP task you want to host the model for. The weights must be stored as PyTorch compatible checkpoints.

Example:

.. code::

    my_bucket/my_model/
    |- config.json
    |- added_tokens.json
    |- pytorch_model-*-of-*.bin  # model weights can be partitioned into multiple checkpoints
    |- tokenizer.json
    |- tokenizer_config.json
    |- vocab.json

For Stable Diffusion models, the model should be loadable from the HuggingFace Diffusers ``DiffusionPipeline.from_pretrained`` API.
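As an optional sanity check before uploading, you can verify that the artifacts load with the corresponding ``AutoModelFor<Task>`` class. A minimal sketch, assuming a text-generation model and the local ``gpt-j-6B`` clone from above:

.. code:: python

    # Optional local check that the artifacts are loadable (requires transformers)
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model = AutoModelForCausalLM.from_pretrained("gpt-j-6B")
    tokenizer = AutoTokenizer.from_pretrained("gpt-j-6B")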
Inference code and Model Server Properties
------------------------------------------

You can provide custom inference code and model server configuration by specifying the ``source_dir`` and
``entry_point`` arguments of the ``DJLModel``. These are not required: the model server configuration can be generated
based on the arguments passed to the constructor, and we provide default inference handler code for DeepSpeed,
HuggingFace Accelerate, and Stable Diffusion. You can find these handler implementations in the `DJL Serving GitHub repository <https://github.com/deepjavalibrary/djl-serving/tree/master/engines/python/setup/djl_python>`_.

You can find documentation for the model server configurations on the `DJL Serving Docs website <https://docs.djl.ai/docs/serving/serving/docs/configurations.html>`_.

The code and configuration you want to deploy can be stored locally or in S3. These files will be bundled into
a tar.gz file that is uploaded to SageMaker. For example:

.. code::

    sourcedir/
    |- script.py           # Inference handler code
    |- serving.properties  # Model server configuration file
    |- requirements.txt    # Additional Python requirements, installed at runtime via PyPI

In the above example, ``sourcedir`` will be bundled and compressed into a tar.gz file and uploaded as part of creating the inference endpoint.
The DJL Serving Model Server
============================

The endpoint you create with ``deploy`` runs the DJL Serving model server.
The model server loads the model from S3 and performs inference on it in response to SageMaker ``InvokeEndpoint`` API calls.

DJL Serving is highly customizable: you can control aspects of both model loading and model serving. Most of the model server
configurations are exposed through the ``DJLModel`` API. The SageMaker Python SDK uses the values it is passed to
create the proper configuration file when creating the inference endpoint. You can optionally provide your own
``serving.properties`` file via the ``source_dir`` argument. You can find documentation about serving.properties in the
`DJL Serving documentation for model specific settings <https://docs.djl.ai/docs/serving/serving/docs/configurations.html#model-specific-settings>`_.
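As a hedged illustration only, a hand-written ``serving.properties`` along these lines selects the backend engine and degree of tensor parallelism; treat the option names as assumptions to verify against the configuration docs linked above:

.. code::

    # serving.properties -- hypothetical sketch, not an authoritative template
    engine=DeepSpeed
    option.tensor_parallel_degree=2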
Within the SageMaker Python SDK, DJL Serving is used in Python mode. This allows users to provide their inference
and data processing scripts in Python. For details on how to write custom inference and data processing code, see
the `DJL Serving documentation on Python mode <https://docs.djl.ai/docs/serving/serving/docs/modes.html#python-mode>`_.
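For orientation, a minimal Python-mode handler skeleton. The ``Input``/``Output`` types come from the ``djl_python`` package shipped with the DJL Serving Python engine; treat the exact signature and the ``load_model`` helper as assumptions and check the linked documentation:

.. code:: python

    # script.py -- hypothetical minimal handler for DJL Serving Python mode
    from typing import Optional

    from djl_python import Input, Output


    def load_model():
        """Hypothetical placeholder: load your model from the model directory."""
        return lambda data: {"echo": data}


    _model = None


    def handle(inputs: Input) -> Optional[Output]:
        """Entry point that DJL Serving invokes for each request."""
        global _model
        if _model is None:
            _model = load_model()
        if inputs.is_empty():
            # DJL Serving sends an empty request at model load time (warm-up)
            return None
        data = inputs.get_as_json()
        prediction = _model(data)
        return Output().add_as_json(prediction)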
For more information about DJL Serving, see the `DJL Serving documentation <https://docs.djl.ai/docs/serving/index.html>`_.

***********************
SageMaker DJL Classes
***********************

For information about the different DJL Serving related classes in the SageMaker Python SDK, see https://sagemaker.readthedocs.io/en/stable/sagemaker.djl_inference.html.

********************************
SageMaker DJL Serving Containers
********************************

For information about the SageMaker DJL Serving containers, see:

- `Deep Learning Container (DLC) Images <https://docs.aws.amazon.com/deep-learning-containers/latest/devguide/deep-learning-containers-images.html>`_ and `release notes <https://docs.aws.amazon.com/deep-learning-containers/latest/devguide/dlc-release-notes.html>`_

doc/frameworks/index.rst

+1
@@ -16,3 +16,4 @@ The SageMaker Python SDK supports managed training and inference for a variety o
     sparkml/index
     tensorflow/index
     xgboost/index
+    djl/index
src/sagemaker/djl_inference/__init__.py

+19

@@ -0,0 +1,19 @@
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"). You
# may not use this file except in compliance with the License. A copy of
# the License is located at
#
#     http://aws.amazon.com/apache2.0/
#
# or in the "license" file accompanying this file. This file is
# distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF
# ANY KIND, either express or implied. See the License for the specific
# language governing permissions and limitations under the License.
"""Placeholder docstring"""
from __future__ import absolute_import

from sagemaker.djl_inference.model import DJLPredictor  # noqa: F401
from sagemaker.djl_inference.model import DJLModel  # noqa: F401
from sagemaker.djl_inference.model import DeepSpeedModel  # noqa: F401
from sagemaker.djl_inference.model import HuggingFaceAccelerateModel  # noqa: F401
src/sagemaker/djl_inference/defaults.py

+50

@@ -0,0 +1,50 @@
# Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License"). You
# may not use this file except in compliance with the License. A copy of
# the License is located at
#
#     http://aws.amazon.com/apache2.0/
#
# or in the "license" file accompanying this file. This file is
# distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF
# ANY KIND, either express or implied. See the License for the specific
# language governing permissions and limitations under the License.
"""Placeholder docstring"""
from __future__ import absolute_import

STABLE_DIFFUSION_MODEL_TYPE = "stable-diffusion"

# Model architectures for which DeepSpeed is the recommended backend
DEEPSPEED_RECOMMENDED_ARCHITECTURES = {
    "bloom",
    "opt",
    "gpt_neox",
    "gptj",
    "gpt_neo",
    "gpt2",
    "xlm-roberta",
    "roberta",
    "bert",
    STABLE_DIFFUSION_MODEL_TYPE,
}

# Model architectures that the DeepSpeed backend can serve
DEEPSPEED_SUPPORTED_ARCHITECTURES = {
    "bloom",
    "opt",
    "gpt_neox",
    "gptj",
    "gpt_neo",
    "gpt2",
    "xlm-roberta",
    "roberta",
    "bert",
    STABLE_DIFFUSION_MODEL_TYPE,
}

# GPU instance families (and local GPU mode) on which DJL models may be deployed
ALLOWED_INSTANCE_FAMILIES = {
    "ml.g4dn",
    "ml.g5",
    "ml.p3",
    "ml.p4",
    "local_gpu",
}
