This project sets up an Envoy gateway with a custom external processor (ext-proc) that implements advanced routing logic tailored for LoRA (Low-Rank Adaptation) adapters. The routing algorithm selects a backend based on the model specified in the request (using the OpenAI API format) and balances load across model servers using their reported metrics.
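
To make the routing input concrete: in the OpenAI API format, the target model is named in the `model` field of the JSON request body, and that is the value the routing logic keys on. The sketch below only builds such a body. The adapter name `example-lora-adapter` and the helper `completion_body` are hypothetical, and the naive `printf` templating assumes neither argument contains double quotes or backslashes.

```bash
# Build an OpenAI-style completions request body.
# $1: model name (here, a hypothetical LoRA adapter name)
# $2: prompt text
# Assumption: neither value contains double quotes or backslashes.
completion_body() {
  local model="$1" prompt="$2"
  printf '{"model": "%s", "prompt": "%s", "max_tokens": 16}\n' "$model" "$prompt"
}

# Example: print the body for a hypothetical adapter.
completion_body "example-lora-adapter" "Hello"
```

Changing only the `model` value is what would steer a request toward a different adapter; the rest of the body stays the same.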

## Requirements
- Kubernetes cluster
- Envoy Gateway v1.1 installed on your cluster: https://gateway.envoyproxy.io/v1.1/tasks/quickstart/
- `kubectl` command-line tool
- Go (for local development)
- A vLLM-based deployment using a custom fork, with LoRA adapters. ***This PoC uses a modified vLLM [fork](https://github.com/kaushikmitr/vllm); the public image of the fork is `ghcr.io/tomatillo-and-multiverse/vllm:demo`.*** A sample deployment is provided in `./manifests/samples/vllm-lora-deployment.yaml`.

## Quickstart
### Steps

1. **Deploy Sample vLLM Application**

NOTE: Create a Hugging Face API token and store it in a secret named `hf-token` with key `hf_api_token`. The sample deployment exposes it through the `HUGGING_FACE_HUB_TOKEN` and `HF_TOKEN` environment variables in `./manifests/samples/vllm-lora-deployment.yaml`.
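
The secret described in the note can be created with `kubectl create secret generic`. This is a minimal sketch, assuming the secret belongs in the current `kubectl` namespace (the note does not say which namespace the sample deployment uses). The wrapper function `create_hf_secret` is a hypothetical helper, not part of this project.

```bash
# Create the `hf-token` secret with key `hf_api_token`.
# $1: your Hugging Face API token
# Assumption: the sample deployment runs in the current kubectl namespace.
create_hf_secret() {
  local token="$1"
  if [ -z "$token" ]; then
    echo "usage: create_hf_secret <token>" >&2
    return 1
  fi
  kubectl create secret generic hf-token \
    --from-literal=hf_api_token="$token"
}
```

Call it as `create_hf_secret "$YOUR_TOKEN"`, where `YOUR_TOKEN` is a placeholder for a variable holding the token. Reading the token from a variable rather than typing it inline keeps it out of shell history.
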
A custom GatewayClass `llm-gateway`, configured with the LLM routing ext-proc, will be installed into the `llm-gateway` namespace. It is configured to listen on port 8081 for traffic routed through the ext-proc (in addition to the default 8080); see the `EnvoyProxy` configuration in `installation.yaml`. When you create Gateways, make sure they use the `llm-gateway` GatewayClass.
```bash
kubectl apply -f ./manifests/installation.yaml
```

3. **Deploy Gateway**

```bash
kubectl apply -f ./manifests/samples/gateway.yaml
```

4. **Try it out**

Wait until the gateway is ready.
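
One way to block until the gateway is ready is `kubectl wait` on the Gateway's `Programmed` condition, which the Gateway API defines for Gateway resources. This is a sketch, assuming the Gateway is named `llm-gateway` (as in the command that reads its address) and lives in the current namespace. The helper name `wait_for_gateway` and the 5-minute default timeout are illustrative choices.

```bash
# Block until the named Gateway reports the Programmed condition.
# $1: Gateway name (default: llm-gateway)
# $2: timeout accepted by `kubectl wait` (default: 5m)
wait_for_gateway() {
  local name="${1:-llm-gateway}" timeout="${2:-5m}"
  kubectl wait --for=condition=Programmed "gateway/${name}" --timeout="${timeout}"
}
```

Running `wait_for_gateway` with no arguments waits on `gateway/llm-gateway`; it returns non-zero if the timeout expires first.
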
```bash
IP=$(kubectl get gateway/llm-gateway -o jsonpath='{.status.addresses[0].value}')
PORT=8081
```