
Commit 54ee6d7

coolkp and ahg-g authored
Dynamic lora load/unload sidecar (#31)
* Dynamic lora load/unload sidecar
* Formatting
* Resolve README comments
* Address comments on sidecar, store updates in memory, rename base field
* Address comments in example deployment
* base model is optional
* Check health of server before querying
* Docstrings
* Mock health check in tests
* Refactor configmap, switch to watchfiles to detect symbolic link target changes, pull dynamically from configmap
* Modify unittests
* Change example host and port to be explicit
* Change example sidecar name
* Add warning about using subPath
* Add screenshots
* Add testing results
* Add config validation
* Add config documentation
* Add config validation
* Make reconciling non blocking
* Move under tools
* Document usage of sidecar, available by default from 1.29
* Update tools/dynamic-lora-sidecar/README.md

---------

Signed-off-by: Kunjan Patel <[email protected]>
Signed-off-by: Kunjan <[email protected]>
Co-authored-by: Abdullah Gharaibeh <[email protected]>
1 parent 078d684 commit 54ee6d7

11 files changed

+747 -0 lines changed

tools/dynamic-lora-sidecar/.gitignore

+1
@@ -0,0 +1 @@
sidecar/__pycache__/

tools/dynamic-lora-sidecar/Dockerfile

+23
@@ -0,0 +1,23 @@
FROM python:3.9-slim-buster AS test

WORKDIR /dynamic-lora-reconciler-test
COPY requirements.txt .
COPY sidecar/* .
RUN pip install -r requirements.txt
RUN python -m unittest discover || exit 1

FROM python:3.10-slim-buster

WORKDIR /dynamic-lora-reconciler

RUN python3 -m venv /opt/venv

ENV PATH="/opt/venv/bin:$PATH"

RUN pip install --upgrade pip
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY sidecar/* .

CMD ["python", "sidecar.py"]

tools/dynamic-lora-sidecar/README.md

+71
@@ -0,0 +1,71 @@
# Dynamic LoRA Adapter Sidecar for vLLM

This is a sidecar-based tool for rolling out new LoRA adapters to a set of running vLLM model servers. The user deploys the sidecar alongside a vLLM server and uses a ConfigMap to declare which LoRA adapters the running vLLM servers should be configured with. The sidecar watches the ConfigMap and sends load/unload requests to the vLLM container to act on that intent.

## Overview

The sidecar continuously monitors a ConfigMap mounted as a YAML configuration file. This file defines the desired state of LoRA adapters, including:

- **Adapter ID:** Unique identifier for the adapter.
- **Source:** Path to the adapter's source files.
- **Base Model:** (Optional) The base model to which the adapter should be applied.
- **Desired state:** Whether the adapter should be loaded (listed under `ensureExist`) or unloaded (listed under `ensureNotExist`).

The sidecar uses the vLLM server's API to load or unload adapters based on the configuration. It also periodically reconciles the adapters registered on the vLLM server with the desired state defined in the ConfigMap, so the two stay consistent.
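To make the load/unload calls concrete, here is a minimal sketch that builds the HTTP requests for vLLM's documented runtime LoRA endpoints (`/v1/load_lora_adapter` and `/v1/unload_lora_adapter`, available when `VLLM_ALLOW_RUNTIME_LORA_UPDATING` is enabled). The helper name `build_adapter_request` is hypothetical, and whether `sidecar.py` issues its requests exactly this way is an assumption, since that file is not shown here.

```python
import json
import urllib.request
from typing import Optional


def build_adapter_request(base_url: str, adapter_id: str,
                          source: Optional[str] = None) -> urllib.request.Request:
    """Build a load request when `source` is given, otherwise an unload request.

    Hypothetical helper for illustration; it only constructs the request
    and does not send it.
    """
    if source is not None:
        path = "/v1/load_lora_adapter"
        payload = {"lora_name": adapter_id, "lora_path": source}
    else:
        path = "/v1/unload_lora_adapter"
        payload = {"lora_name": adapter_id}
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

A caller would pass the result to `urllib.request.urlopen(request, timeout=...)` once the server is reachable. Keeping request construction separate from sending makes the payloads easy to unit test without a running model server.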

## Features

- **Dynamic Loading and Unloading:** Load and unload LoRA adapters without restarting the vLLM server.
- **Continuous Reconciliation:** Ensures the vLLM server's state matches the desired configuration.
- **ConfigMap Integration:** Leverages Kubernetes ConfigMaps for easy configuration management.
- **Easy Deployment:** Provides a sample deployment YAML for quick setup.
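Reconciliation can be thought of as a set difference between the desired state and what the server currently reports. The sketch below is an illustration under assumptions, not the sidecar's actual code: `plan_reconcile` is a hypothetical name, and the rule that an adapter listed in both sections is treated as "to unload" is a design choice made here for the example.

```python
from typing import Dict, Iterable, List, Tuple


def plan_reconcile(ensure_exist: Dict[str, str],
                   ensure_not_exist: Iterable[str],
                   registered: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (ids_to_load, ids_to_unload), each sorted for stable output.

    ensure_exist maps adapter id -> source; registered is the set of
    adapter ids the model server currently reports.
    """
    registered_ids = set(registered)
    unwanted = set(ensure_not_exist)
    # Load what is desired but missing; assumption: removal wins on conflict.
    to_load = sorted(set(ensure_exist) - registered_ids - unwanted)
    # Unload only adapters that are actually registered.
    to_unload = sorted(unwanted & registered_ids)
    return to_load, to_unload
```

Because the plan is computed from the observed state each time, running it repeatedly is harmless: once the server matches the configuration, both lists come back empty.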
22+
23+
## Repository Contents
24+
25+
- **`sidecar.py`:** Python script for the sidecar container.
26+
- **`Dockerfile`:** Dockerfile to build the sidecar image.
27+
- **`configmap.yaml`:** Example ConfigMap YAML file.
28+
- **`deployment.yaml`:** Example Kubernetes deployment YAML.

## Usage

1. **Build the Docker image:**
   ```bash
   docker build -t <your-image-name> .
   ```
2. **Create a ConfigMap:**
   ```bash
   kubectl create configmap name-of-your-configmap --from-file=your-file.yaml
   ```
3. **Mount the ConfigMap and configure the sidecar in your pod:**
   ```yaml
   volumeMounts: # DO NOT USE subPath
   - name: config-volume
     mountPath: /config
   ```
   Do not use `subPath`: ConfigMap updates are not propagated to files mounted with `subPath`, so the sidecar would never see changes.
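The reason a regular (non-`subPath`) mount picks up changes is that Kubernetes exposes ConfigMap files as symbolic links through a `..data` link and swaps that link atomically on update. The commit notes say the sidecar uses `watchfiles` to detect symbolic link target changes; the sketch below swaps in plain standard-library polling instead, purely to illustrate the mechanism. The function names and the simulated directory layout are assumptions for this example, and it expects a platform with symlink support (such as Linux).

```python
import os
import tempfile


def resolved_target(path: str) -> str:
    """Return the real file that `path` currently points to."""
    return os.path.realpath(path)


def has_changed(path: str, last_target: str) -> bool:
    """True if the symlink chain behind `path` now resolves elsewhere."""
    return resolved_target(path) != last_target


def simulate_configmap_update() -> bool:
    """Mimic a ConfigMap volume update in a temp dir and detect it."""
    with tempfile.TemporaryDirectory() as root:
        for version in ("v1", "v2"):
            os.mkdir(os.path.join(root, version))
            with open(os.path.join(root, version, "configmap.yaml"), "w") as f:
                f.write(f"name: {version}\n")
        data_link = os.path.join(root, "..data")
        os.symlink("v1", data_link)
        config_path = os.path.join(root, "configmap.yaml")
        os.symlink(os.path.join("..data", "configmap.yaml"), config_path)
        before = resolved_target(config_path)
        # Atomic swap: create a new link, then rename it over the old one.
        tmp_link = os.path.join(root, "..data_tmp")
        os.symlink("v2", tmp_link)
        os.replace(tmp_link, data_link)
        return has_changed(config_path, before)
```

With `subPath`, the file is mounted directly rather than through the swapped link, so a check like `has_changed` would never fire.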

The example [deployment](deployment.yaml) uses a [sidecar container](https://kubernetes.io/docs/concepts/workloads/pods/sidecar-containers/) (an `initContainer` with `restartPolicy` set to `Always`), a beta feature that is enabled by default since Kubernetes 1.29. In 1.28 the feature must be enabled explicitly, and before 1.28 sidecar containers are not officially supported.

## Configuration Fields
- `vLLMLoRAConfig` [**required**] Base key.
  - `host` [*optional*] Model server's host. Defaults to `localhost`.
  - `port` [*optional*] Model server's port. Defaults to `8000`.
  - `name` [*optional*] Name of this config.
  - `ensureExist` [*optional*] List of models to ensure exist on the specified model server.
    - `models` [**required**] [list]
      - `base-model` [*optional*] Base model for the LoRA adapter.
      - `id` [**required**] Unique ID of the LoRA adapter.
      - `source` [**required**] Path (remote or local) to the LoRA adapter.
  - `ensureNotExist` [*optional*] List of models to ensure do not exist on the specified model server.
    - `models` [**required**] [list]
      - `id` [**required**] Unique ID of the LoRA adapter.
      - `source` [**required**] Path (remote or local) to the LoRA adapter.
      - `base-model` [*optional*] Base model for the LoRA adapter.
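The required and optional fields above can be checked with a few lines of plain Python once the YAML has been parsed into a dictionary. This is a rough sketch of that idea: `validate_config` and `server_address` are hypothetical names, and the real sidecar may validate differently (its `requirements.txt` lists `jsonschema`, which this example deliberately does not use).

```python
from typing import Any, Dict, List, Tuple


def validate_config(doc: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the config looks valid."""
    root = doc.get("vLLMLoRAConfig")
    if not isinstance(root, dict):
        return ["missing required key: vLLMLoRAConfig"]
    errors: List[str] = []
    for section in ("ensureExist", "ensureNotExist"):
        if section not in root:
            continue  # both sections are optional
        block = root[section]
        models = block.get("models") if isinstance(block, dict) else None
        if not isinstance(models, list):
            errors.append(f"{section}.models is required and must be a list")
            continue
        for i, model in enumerate(models):
            for field in ("id", "source"):
                if not isinstance(model, dict) or field not in model:
                    errors.append(
                        f"{section}.models[{i}] is missing required field: {field}")
    return errors


def server_address(doc: Dict[str, Any]) -> Tuple[Any, Any]:
    """Apply the documented defaults for host and port."""
    root = doc["vLLMLoRAConfig"]
    return root.get("host", "localhost"), root.get("port", 8000)
```

Returning a list of messages rather than raising on the first problem lets a caller log every issue in a ConfigMap update at once and keep serving the last known-good configuration.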

## Screenshots & Testing
The sidecar was tested with the Deployment and ConfigMap specified in this repo. Below are screenshots of the logs from the sidecar and the vLLM server. You can verify that the adapters were loaded by querying `/v1/models` and by checking the vLLM logs.
![lora-adapter-syncer](screenshots/lora-syncer-sidecar.png)
![config map change](screenshots/configmap-change.png)
![vllm-logs](screenshots/vllm-logs.png)
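
As a small illustration of the `/v1/models` check, the sketch below extracts model IDs from the OpenAI-compatible list response that vLLM returns. The function names and the default URL are assumptions for this example; adjust the host and port to match your Service.

```python
import json
import urllib.request
from typing import List


def model_ids(models_response: dict) -> List[str]:
    """Extract the `id` of every entry in a /v1/models response body."""
    return [entry["id"] for entry in models_response.get("data", [])]


def fetch_model_ids(base_url: str = "http://localhost:8000",
                    timeout: float = 5.0) -> List[str]:
    """Query the model server and return the IDs it currently serves."""
    url = base_url.rstrip("/") + "/v1/models"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return model_ids(json.load(resp))
```

After a ConfigMap change has been reconciled, an adapter listed under `ensureExist` (for example `sql-lora-v1`) should appear in the result of `fetch_model_ids()`, and one listed under `ensureNotExist` should not.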

tools/dynamic-lora-sidecar/deployment.yaml

+127
@@ -0,0 +1,127 @@
apiVersion: apps/v1
kind: Deployment
metadata:
  name: llama-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: llama-server
  template:
    metadata:
      labels:
        app: llama-server
        ai.gke.io/model: LLaMA2_7B
        ai.gke.io/inference-server: vllm
        examples.ai.gke.io/source: model-garden
    spec:
      shareProcessNamespace: true
      containers:
      - name: inference-server
        image: vllm/vllm-openai:v0.6.3.post1
        resources:
          requests:
            cpu: 5
            memory: 20Gi
            ephemeral-storage: 40Gi
            nvidia.com/gpu: 1
          limits:
            cpu: 5
            memory: 20Gi
            ephemeral-storage: 40Gi
            nvidia.com/gpu: 1
        command: ["/bin/sh", "-c"]
        args:
        - vllm serve meta-llama/Llama-2-7b-hf
        - --host=0.0.0.0
        - --port=8000
        - --tensor-parallel-size=1
        - --swap-space=16
        - --gpu-memory-utilization=0.95
        - --max-model-len=2048
        - --max-num-batched-tokens=4096
        - --disable-log-stats
        - --enable-lora
        - --max-loras=5
        env:
        - name: DEPLOY_SOURCE
          value: UI_NATIVE_MODEL
        - name: MODEL_ID
          value: "Llama2-7B"
        - name: AIP_STORAGE_URI
          value: "gs://vertex-model-garden-public-us/llama2/llama2-7b-hf"
        - name: VLLM_ALLOW_RUNTIME_LORA_UPDATING
          value: "true"
        - name: HF_TOKEN
          valueFrom:
            secretKeyRef:
              name: hf-token # The name of your Kubernetes Secret
              key: token # The specific key within the Secret
        - name: DYNAMIC_LORA_ROLLOUT_CONFIG
          value: "/config/configmap.yaml"
        volumeMounts:
        - mountPath: /dev/shm
          name: dshm
      initContainers:
      - name: lora-adapter-syncer
        tty: true
        stdin: true
        image: <SIDECAR_IMAGE>
        restartPolicy: Always
        imagePullPolicy: Always
        env:
        - name: DYNAMIC_LORA_ROLLOUT_CONFIG
          value: "/config/configmap.yaml"
        volumeMounts: # DO NOT USE subPath
        - name: config-volume
          mountPath: /config
      volumes:
      - name: dshm
        emptyDir:
          medium: Memory
      - name: config-volume
        configMap:
          name: dynamic-lora-config

---
apiVersion: v1
kind: Service
metadata:
  name: llama-service
spec:
  selector:
    app: llama-server
  type: ClusterIP
  ports:
  - protocol: TCP
    port: 8000
    targetPort: 8000

---
apiVersion: v1
kind: ConfigMap
metadata:
  name: dynamic-lora-config
data:
  configmap.yaml: |
    vLLMLoRAConfig:
      host: modelServerHost
      name: sql-loras-llama
      port: modelServerPort
      ensureExist:
        models:
        - base-model: meta-llama/Llama-2-7b-hf
          id: sql-lora-v1
          source: yard1/llama-2-7b-sql-lora-test
        - base-model: meta-llama/Llama-2-7b-hf
          id: sql-lora-v3
          source: yard1/llama-2-7b-sql-lora-test
        - base-model: meta-llama/Llama-2-7b-hf
          id: sql-lora-v4
          source: yard1/llama-2-7b-sql-lora-test
      ensureNotExist:
        models:
        - base-model: meta-llama/Llama-2-7b-hf
          id: sql-lora-v2
          source: yard1/llama-2-7b-sql-lora-test

tools/dynamic-lora-sidecar/requirements.txt

+6
@@ -0,0 +1,6 @@
aiohttp
jsonschema
pyyaml
requests
watchfiles
watchdog