
Commit 2097027

Simplify POC installation
1 parent 0598654 commit 2097027

File tree

6 files changed (+101, -131 lines)


examples/poc/README.md

+31-44
@@ -1,68 +1,55 @@
 # Envoy Ext Proc Gateway with LoRA Integration
 
-This project sets up an Envoy gateway to handle gRPC calls with integration of LoRA (Low-Rank Adaptation). The configuration aims to manage gRPC traffic through Envoy's external processing and custom routing based on headers and load balancing rules. The setup includes Kubernetes services and deployments for both the gRPC server and the vllm-lora application.
+This project sets up an Envoy gateway with a custom external processor that implements advanced routing logic tailored for LoRA (Low-Rank Adaptation) adapters. Routing is based on the model specified in the request (using the OpenAI API format) and ensures efficient load balancing based on model server metrics.
+
+![alt text](./doc/envoy-gateway-bootstrap.png)
 
 ## Requirements
-- A vLLM based deployment (using the custom image provided below), with LoRA Adapters
 - Kubernetes cluster
 - Envoy Gateway v1.1 installed on your cluster: https://gateway.envoyproxy.io/v1.1/tasks/quickstart/
 - `kubectl` command-line tool
 - Go (for local development)
-
-## vLLM
-***This PoC uses a modified vLLM fork, the public image of the fork is here: `ghcr.io/tomatillo-and-multiverse/vllm:demo`***
-
-The fork is here: https://github.com/kaushikmitr/vllm.
-
-The summary of changes from standard vLLM are:
-- Active/Registered LoRA adapters are returned as a response header (used for lora-aware routing)
-- Queue size is returned as a response header
-- Active/Registered LoRA adapters are emitted as metrics (for out-of-band scraping during low traffic periods)
-
-
-## Overview
-
-This project contains the necessary configurations and code to set up and deploy a service using Kubernetes, Envoy, and Go. The service involves routing based on the model specified (using Open AI API format), collecting metrics, and ensuring efficient load balancing.
-
-![alt text](./envoy-gateway-bootstrap.png)
-
+- A vLLM-based deployment using a custom fork, with LoRA adapters. ***This PoC uses a modified vLLM [fork](https://github.com/kaushikmitr/vllm); the public image of the fork is `ghcr.io/tomatillo-and-multiverse/vllm:demo`***. A sample deployment is provided under `./manifests/samples/vllm-lora-deployment.yaml`.
 
 ## Quickstart
 
 ### Steps
+1. **Deploy Sample vLLM Application**
+   NOTE: Create a HuggingFace API token and store it in a secret named `hf-token` with key `hf_api_token`. This is configured in the `HUGGING_FACE_HUB_TOKEN` and `HF_TOKEN` environment variables in `./manifests/samples/vllm-lora-deployment.yaml`.
 
-1. **Apply Kubernetes Manifests**
 ```bash
-cd manifests
-kubectl apply -f ext_proc.yaml
-kubectl apply -f vllm/vllm-lora-service.yaml
-kubectl apply -f vllm/vllm-lora-deployment.yaml
+kubectl apply -f ./manifests/samples/vllm-lora-deployment.yaml
+kubectl apply -f ./manifests/samples/vllm-lora-service.yaml
 ```
+2. **Install GatewayClass with Ext Proc**
+   A custom GatewayClass `llm-gateway`, configured with the LLM routing ext proc, will be installed into the `llm-gateway` namespace. When you create Gateways, make sure the `llm-gateway` GatewayClass is used.
 
-2. **Update `ext_proc.yaml`**
-   - Ensure the `ext_proc.yaml` is updated with the pod names and internal IP addresses of the vLLM replicas. This step is crucial for the correct routing of requests based on headers.
+   NOTE: Ensure the `llm-route-ext-proc` deployment is updated with the pod names and internal IP addresses of the vLLM replicas. This step is crucial for the correct routing of requests based on headers. It won't be needed once the ext proc reads the pods dynamically.
 
-2. **Update and apply `gateway.yaml`**
-   - Ensure the `gateway.yaml` is updated with the internal IP addresses of the ExtProc service. This step is also crucial for the correct routing of requests based on headers.
-```bash
-cd manifests
-kubectl apply -f gateway.yaml
+```bash
+kubectl apply -f ./manifests/installation.yaml
+```
+3. **Deploy Gateway**
+
+```bash
+kubectl apply -f ./manifests/samples/gateway.yaml
 ```
 
-### Monitoring and Metrics
-
-- The Go application collects metrics and saves the latest response headers in memory.
-- Ensure Envoy is configured to route based on the metrics collected from the `/metric` endpoint of different service pods.
-
-## Contributing
+4. **Try it out**
+   Wait until the gateway is ready.
+```bash
+IP=$(kubectl get gateway/llm-gateway -o jsonpath='{.status.addresses[0].value}')
+PORT=8081
+
+curl -i ${IP}:${PORT}/v1/completions -H 'Content-Type: application/json' -d '{
+"model": "tweet-summary",
+"prompt": "Write as if you were a critic: San Francisco",
+"max_tokens": 100,
+"temperature": 0
+}'
+```
 
-1. Fork the repository.
-2. Create a new branch.
-3. Make your changes.
-4. Open a pull request.
 
 ## License
 
 This project is licensed under the MIT License.
-
----
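The quickstart's token and readiness prerequisites can be sketched with standard `kubectl` commands. This is a hypothetical helper, not part of the repo; `<HF_TOKEN>` is a placeholder, and the `llm-gateway` Gateway name comes from the sample manifest:

```bash
# Store a HuggingFace API token in the secret the sample deployment expects
# (secret name hf-token, key hf_api_token). Substitute your own token.
kubectl create secret generic hf-token --from-literal=hf_api_token=<HF_TOKEN>

# Block until the Gateway reports it is programmed and has an address.
kubectl wait gateway/llm-gateway --for=condition=Programmed --timeout=120s
```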

examples/poc/manifests/ext-proc.yaml

-68
This file was deleted.

examples/poc/manifests/gateway.yaml renamed to examples/poc/manifests/installation.yaml

+58-18
@@ -1,16 +1,18 @@
+apiVersion: v1
+kind: Namespace
+metadata:
+  name: llm-gateway
+
 ---
 apiVersion: gateway.envoyproxy.io/v1alpha1
 kind: EnvoyProxy
 metadata:
-  name: custom-proxy-config
-  namespace: envoy-gateway-system
+  name: llm-route-envoy-config
+  namespace: llm-gateway
 spec:
   provider:
     type: Kubernetes
     kubernetes:
-      envoyDeployment:
-        container:
-          image: envoyproxy/envoy:v1.31-latest
       envoyService:
         patch:
           type: StrategicMerge
@@ -78,7 +80,7 @@ spec:
       dns_lookup_family: V4_ONLY
     - name: ext_proc_cluster
       connect_timeout: 1000s
-      type: STATIC
+      type: LOGICAL_DNS
       http2_protocol_options: {}
       lb_policy: ROUND_ROBIN
       load_assignment:
@@ -88,28 +90,66 @@ spec:
         - endpoint:
             address:
               socket_address:
-                address: 34.118.231.147
+                address: llm-route-ext-proc.llm-gateway.svc.cluster.local
                 port_value: 9002
 ---
 apiVersion: gateway.networking.k8s.io/v1
 kind: GatewayClass
 metadata:
-  name: inference-gateway
+  name: llm-gateway
 spec:
   controllerName: gateway.envoyproxy.io/gatewayclass-controller
   parametersRef:
     group: gateway.envoyproxy.io
     kind: EnvoyProxy
-    name: custom-proxy-config
-    namespace: envoy-gateway-system
+    name: llm-route-envoy-config
+    namespace: llm-gateway
+
 ---
-apiVersion: gateway.networking.k8s.io/v1
-kind: Gateway
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: llm-route-ext-proc
+  namespace: llm-gateway
+  labels:
+    app: llm-route-ext-proc
+spec:
+  replicas: 1
+  selector:
+    matchLabels:
+      app: llm-route-ext-proc
+  template:
+    metadata:
+      labels:
+        app: llm-route-ext-proc
+    spec:
+      containers:
+      - name: llm-route-ext-proc
+        image: ghcr.io/tomatillo-and-multiverse/ext-proc:demo
+        args:
+        # TODO: specify a label selector and dynamically update pods
+        - -pods
+        - "vllm-78665f78c4-h4kx4,vllm-78665f78c4-hnz84"
+        - -podIPs
+        - "10.24.11.6:8000,10.24.5.7:8000"
+        - -enable-fairness
+        - "false"
+        ports:
+        - containerPort: 9002
+      - name: curl
+        image: curlimages/curl
+        command: ["sleep", "3600"]
+---
+apiVersion: v1
+kind: Service
 metadata:
-  name: inference-gateway
+  name: llm-route-ext-proc
+  namespace: llm-gateway
 spec:
-  gatewayClassName: inference-gateway
-  listeners:
-  - name: http
-    protocol: HTTP
-    port: 8080
+  selector:
+    app: llm-route-ext-proc
+  ports:
+  - protocol: TCP
+    port: 9002
+    targetPort: 9002
+  type: ClusterIP
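The `ext_proc_cluster` above now targets the ext-proc Service by DNS name (`LOGICAL_DNS`) instead of a hard-coded IP, so Envoy depends on cluster DNS. A quick in-cluster resolution check can be sketched as follows (a hypothetical smoke test, not part of the manifests):

```bash
# Resolve the ext-proc Service name from a throwaway pod inside the cluster.
kubectl run dns-check -n llm-gateway --rm -i --restart=Never --image=busybox -- \
  nslookup llm-route-ext-proc.llm-gateway.svc.cluster.local
```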
examples/poc/manifests/samples/gateway.yaml

+12
@@ -0,0 +1,12 @@
+
+---
+apiVersion: gateway.networking.k8s.io/v1
+kind: Gateway
+metadata:
+  name: llm-gateway
+spec:
+  gatewayClassName: llm-gateway
+  listeners:
+  - name: http
+    protocol: HTTP
+    port: 8080

examples/poc/manifests/vllm/vllm-lora-service.yaml renamed to examples/poc/manifests/samples/vllm-lora-service.yaml

-1
@@ -4,7 +4,6 @@ metadata:
   name: vllm-lora
   namespace: default
 spec:
-  clusterIP: None
   selector:
     app: vllm
   ports:
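Dropping `clusterIP: None` converts the formerly headless `vllm-lora` Service into a regular ClusterIP Service, so its DNS name resolves to one virtual IP that load-balances across pods rather than to individual pod IPs. A sketch of verifying the change after applying (hypothetical check):

```bash
# A non-empty result (not "None") confirms the Service now has a virtual IP.
kubectl get service vllm-lora -n default -o jsonpath='{.spec.clusterIP}'
```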
