doc/frameworks/djl/using_djl.rst
+13 -1
@@ -23,7 +23,7 @@ With the SageMaker Python SDK, you can use DJL Serving to host models that have
 These can either be models you have trained/fine-tuned yourself, or models available publicly from the HuggingFace Hub.
 DJL Serving in the SageMaker Python SDK supports hosting models for the popular HuggingFace NLP tasks, as well as Stable Diffusion.

-You can either deploy your model using DeepSpeed or HuggingFace Accelerate, or let DJL Serving determine the best backend based on your model architecture and configuration.
+You can either deploy your model using DeepSpeed, FasterTransformer, or HuggingFace Accelerate, or let DJL Serving determine the best backend based on your model architecture and configuration.

 .. code:: python
@@ -63,11 +63,23 @@ If you want to use a specific backend, then you can create an instance of the co
         number_of_partitions=2, # number of gpus to partition the model across
     )

+    # Create a model using the FasterTransformer backend
+
+    fastertransformer_model = FasterTransformerModel(
+        "s3://my_bucket/my_saved_model_artifacts/", # This can also be a HuggingFace Hub model id
+        "my_sagemaker_role",
+        data_type="fp16",
+        task="text-generation",
+        tensor_parallel_degree=2, # number of gpus to partition the model across
+    )
+
     # Deploy the model to an Amazon SageMaker Endpoint and get a Predictor
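For readers following along, here is a minimal end-to-end sketch of how the ``FasterTransformerModel`` example added in this diff would typically be deployed and invoked. The instance type, deploy arguments, and prompt below are illustrative assumptions and are not part of this change:

.. code:: python

    # Sketch only: deploy the FasterTransformerModel from the example above
    # and run a prediction. Instance type and prompt are assumptions.
    from sagemaker.djl_inference import FasterTransformerModel

    fastertransformer_model = FasterTransformerModel(
        "s3://my_bucket/my_saved_model_artifacts/",  # or a HuggingFace Hub model id
        "my_sagemaker_role",
        data_type="fp16",
        task="text-generation",
        tensor_parallel_degree=2,  # number of gpus to partition the model across
    )

    # Use a multi-GPU instance so the model can be partitioned across 2 GPUs
    predictor = fastertransformer_model.deploy(instance_type="ml.g5.12xlarge")

    # DJL predictors exchange JSON payloads by default
    outputs = predictor.predict({"inputs": "Large language models are"})

Since ``tensor_parallel_degree=2`` splits the model across two GPUs, the chosen instance type must provide at least that many GPUs.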