feature: Partition support for DJLModel using SM Training job #3820

Merged: 1 commit, May 16, 2023
25 changes: 25 additions & 0 deletions doc/frameworks/djl/using_djl.rst
@@ -221,6 +221,31 @@ see the `DJL Serving Documentation on Python Mode. <https://docs.djl.ai/docs/ser

For more information about DJL Serving, see the `DJL Serving documentation. <https://docs.djl.ai/docs/serving/index.html>`_

**************************
Ahead of time partitioning
**************************

To optimize the deployment of large models that do not fit in a single GPU's memory, the model's tensor weights are
partitioned at runtime and each partition is loaded onto an individual GPU. However, runtime partitioning adds
significant time and memory overhead to model loading. To avoid this cost, DJLModel offers an ahead-of-time
partitioning capability for the DeepSpeed and FasterTransformer engines, which lets you partition your model weights
and save them before deployment. The HuggingFace Accelerate engine does not support tensor parallelism, so
ahead-of-time partitioning is not available for it. In our experiment with the GPT-J model, loading the model from
partitioned checkpoints reduced the model loading time by 40%.
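
For example, tensor parallelism can be configured when the model object is created. The following is a minimal
sketch, not a definitive recipe: the model ID, IAM role, and partition count are placeholder values, and the
`number_of_partitions` argument assumes the `DJLModel` constructor in this SDK version.

.. code::

    from sagemaker.djl_inference import DJLModel

    # Sketch: split the model weights across 4 GPUs at load time.
    # "my_sagemaker_role" is a placeholder IAM role; replace with your own.
    djl_model = DJLModel(
        "EleutherAI/gpt-j-6B",   # HuggingFace Hub model ID or S3 URI
        "my_sagemaker_role",
        dtype="fp16",
        number_of_partitions=4,  # number of GPUs to partition across
    )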

The `partition` method invokes an Amazon SageMaker training job to partition the model and upload the partitioned
checkpoints to an S3 bucket. You can provide your desired S3 bucket for the partitioned checkpoints; otherwise they
are uploaded to the default SageMaker S3 bucket. Note that this S3 location is remembered for deployment: when you
call the `deploy` method after partitioning, DJL Serving downloads the partitioned model checkpoints directly from
the uploaded S3 URL, if available.

.. code::

    # Partition the model using an Amazon SageMaker training job.
    djl_model.partition("ml.g5.12xlarge")

    predictor = djl_model.deploy("ml.g5.12xlarge",
                                 initial_instance_count=1)
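
You can also pass your own S3 destination for the partitioned checkpoints. A sketch, assuming `partition` accepts an
`s3_output_uri` argument for the destination (the parameter name and bucket are placeholders, not confirmed API):

.. code::

    # Sketch: upload partitioned checkpoints to a bucket you own instead of
    # the default SageMaker bucket. "s3_output_uri" is an assumed parameter name.
    djl_model.partition(
        "ml.g5.12xlarge",
        s3_output_uri="s3://my-bucket/partitioned-checkpoints/",
    )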

***********************
SageMaker DJL Classes
***********************