
Commit 72b2af5

feature: Partition support for DJLModel using SM Training job
1 parent 1caab81

File tree

3 files changed: +390, −23 lines


doc/frameworks/djl/using_djl.rst

+25
@@ -221,6 +221,31 @@ see the `DJL Serving Documentation on Python Mode. <https://docs.djl.ai/docs/ser
For more information about DJL Serving, see the `DJL Serving documentation. <https://docs.djl.ai/docs/serving/index.html>`_

**************************
Ahead of time partitioning
**************************
To optimize the deployment of large models that do not fit in a single GPU, the model's tensor weights are partitioned at
runtime and each partition is loaded onto an individual GPU. Runtime partitioning, however, adds a significant amount of time
and memory overhead to model loading. To avoid this, ``DJLModel`` offers an ahead-of-time partitioning capability for the
DeepSpeed and FasterTransformer engines, which lets you partition your model weights and save them before deployment. The
HuggingFace engine does not support tensor parallelism, so ahead-of-time partitioning is not available for it. In our
experiment with the GPT-J model, loading the model from partitioned checkpoints decreased the model loading time by 40%.
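As a toy illustration of what tensor-parallel partitioning means conceptually, the sketch below splits a weight matrix column-wise so each "device" holds only a fraction of the parameters. This is purely illustrative and is not how the DeepSpeed or FasterTransformer engines partition weights internally; the ``partition_columns`` helper is a hypothetical name introduced here for the example.

```python
# Toy illustration of tensor-parallel partitioning: a weight matrix is split
# column-wise so each "device" holds only a shard of the parameters.
# Conceptual sketch only; not the DeepSpeed/FasterTransformer implementation.

def partition_columns(matrix, num_devices):
    """Split each row of `matrix` into `num_devices` contiguous column shards."""
    cols = len(matrix[0])
    base, extra = divmod(cols, num_devices)
    shards = []
    start = 0
    for d in range(num_devices):
        width = base + (1 if d < extra else 0)
        shards.append([row[start:start + width] for row in matrix])
        start += width
    return shards

weights = [[1, 2, 3, 4], [5, 6, 7, 8]]  # a 2x4 weight matrix
shards = partition_columns(weights, 2)
# Each of the two shards now holds half of the columns:
# shards[0] == [[1, 2], [5, 6]] and shards[1] == [[3, 4], [7, 8]]
```

At serving time, each shard would live on a different GPU, so no single device ever has to hold the full weight matrix in memory.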

The ``partition`` method invokes an Amazon SageMaker training job to partition the model and upload the partitioned
checkpoints to an S3 bucket. You can either provide your desired S3 bucket for the partitioned checkpoints, or they will be
uploaded to the default SageMaker S3 bucket. Note that this S3 location is saved with the model and used at deployment: when
you call the ``deploy`` method after ``partition``, DJLServing downloads the partitioned model checkpoints directly from the
uploaded S3 URL, if available.

.. code::

    # Partition the model using an Amazon SageMaker training job.
    djl_model.partition("ml.g5.12xlarge")

    # Deploy the partitioned model; the saved checkpoints are fetched from S3.
    predictor = djl_model.deploy("ml.g5.12xlarge",
                                 initial_instance_count=1)
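Under the hood, DJL Serving is configured through a ``serving.properties`` file. The fragment below is a hedged sketch of the kind of settings involved in tensor-parallel serving; the exact keys and values written for a partitioned model depend on your model and SDK version, and the engine, degree, and checkpoint path shown here are illustrative assumptions rather than output copied from the tool.

.. code::

    # serving.properties -- illustrative sketch, not generated output
    engine=DeepSpeed
    # Number of GPUs across which the tensor weights are split (assumed value)
    option.tensor_parallel_degree=4
    # Location of the ahead-of-time partitioned checkpoints (hypothetical path)
    option.save_mp_checkpoint_path=/opt/ml/model/partitioned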
***********************
SageMaker DJL Classes
***********************
