
Commit 298b318

kaushikmitr authored and kfswain committed
make dynamic lora sidecar health check parameters configurable and force reconcile (kubernetes-sigs#605)
* update benchmarking guide with latest results with vllm v1
* update graph
* make dynamic lora sidecar health check parameters configurable and force reconcile
* update screenshots
* make the health and refresh params in sidecar cmd line argument
1 parent 42a0ba0 commit 298b318

7 files changed (+174, -48 lines)

tools/dynamic-lora-sidecar/Dockerfile

+2 -2
@@ -2,7 +2,7 @@ FROM python:3.9-slim-buster AS test
 
 WORKDIR /dynamic-lora-reconciler-test
 COPY requirements.txt .
-COPY sidecar/* .
+COPY sidecar/* ./
 RUN pip install -r requirements.txt
 RUN python -m unittest discover || exit 1
 
@@ -18,6 +18,6 @@ RUN pip install --upgrade pip
 COPY requirements.txt .
 RUN pip install --no-cache-dir -r requirements.txt
 
-COPY sidecar/* .
+COPY sidecar/* ./
 
 CMD ["python", "sidecar.py"]

tools/dynamic-lora-sidecar/README.md

+56 -13
@@ -29,21 +29,34 @@ The sidecar uses the vLLM server's API to load or unload adapters based on the c
 
 ## Usage
 
+
 1. **Build the Docker Image:**
 ```bash
 docker build -t <your-image-name> .
+```
+
 2. **Create a configmap:**
-```bash
-kubectl create configmap name-of-your-configmap --from-file=your-file.yaml
+```bash
+kubectl create configmap name-of-your-configmap --from-file=your-file.yaml
+```
+
 3. **Mount the configmap and configure sidecar in your pod**
-```yaml
-volumeMounts: # DO NOT USE subPath
-- name: config-volume
-  mountPath: /config
-```
-Do not use subPath, since configmap updates are not reflected in the file
+```yaml
+volumeMounts: # DO NOT USE subPath
+- name: config-volume
+  mountPath: /config
+```
+Do not use subPath, since configmap updates are not reflected in the file
 
-[deployment]: deployment.yaml it uses [sidecar](https://kubernetes.io/docs/concepts/workloads/pods/sidecar-containers/)(`initContainer` with `restartPolicy` set to `always`) which is beta feature enabled by default since k8s version 1.29. They need to be enabled in 1.28 and prior to 1.28 sidecar are not officially supported.
+## Command Line Arguments
+
+The sidecar supports the following command-line arguments:
+
+- `--health-check-timeout`: Maximum time in seconds to wait for the vLLM server health check (default: 300)
+- `--health-check-interval`: Interval in seconds between health check attempts (default: 2)
+- `--reconcile-trigger`: Time in seconds between forced reconciliation runs (default: 5)
+- `--config`: Path to the config map file (default: value from DYNAMIC_LORA_ROLLOUT_CONFIG env var or "/config/configmap.yaml")
+- `--config-validation`: Enable config validation (default: True)
 
 ## Configuration Fields
 - `vLLMLoRAConfig`[**required**] base key
@@ -61,11 +74,41 @@ The sidecar uses the vLLM server's API to load or unload adapters based on the c
 - `source`[**required**] path (remote or local) to lora adapter
 - `base-model`[*optional*] Base model for lora adapter
 
-
-
+## Example Deployment
+
+The [deployment.yaml](deployment.yaml) file shows an example of deploying the sidecar with custom parameters:
+
+```yaml
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: dynamic-lora-reconciler
+spec:
+  replicas: 1
+  selector:
+    matchLabels:
+      app: dynamic-lora-reconciler
+  template:
+    metadata:
+      labels:
+        app: dynamic-lora-reconciler
+    spec:
+      containers:
+      - name: reconciler
+        image: your-image:tag
+        command: ["python", "sidecar.py", "--health-check-timeout", "600", "--health-check-interval", "5", "--reconcile-trigger", "10"] #optional if overriding default values
+        volumeMounts:
+        - name: config-volume
+          mountPath: /config
+      volumes:
+      - name: config-volume
+        configMap:
+          name: name-of-your-configmap
+```
+
+Note: This uses [sidecar](https://kubernetes.io/docs/concepts/workloads/pods/sidecar-containers/)(`initContainer` with `restartPolicy` set to `always`) which is beta feature enabled by default since k8s version 1.29. They need to be enabled in 1.28 and prior to 1.28 sidecar are not officially supported.
 
 ## Screenshots & Testing
 The sidecar was tested with the Deployment and ConfigMap specified in this repo. Here are screen grabs of the logs from the sidecar and vllm server. One can verify that the adapters were loaded by querying `v1/models` and looking at vllm logs.
-![lora-adapter-syncer](screenshots/lora-syncer-sidecar.png)
-![config map change](screenshots/configmap-change.png)
+![lora-adapter-syncer](screenshots/lora-syncer-logs.png)
 ![vllm-logs](screenshots/vllm-logs.png)
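
Review note: the README suggests verifying loaded adapters by querying `v1/models`. A minimal sketch of that check follows. It assumes the server returns an OpenAI-style model list (a `data` array whose entries carry an `id`); the helper names `adapter_ids` and `fetch_models` and the sample payload are hypothetical and not part of this commit.

```python
import json
from urllib.request import urlopen


def adapter_ids(models_response: dict) -> set:
    """Collect the ids listed in a /v1/models style payload (assumed shape)."""
    return {entry["id"] for entry in models_response.get("data", [])}


def fetch_models(base_url: str) -> dict:
    """GET <base_url>/v1/models and decode the JSON body (needs a running server)."""
    with urlopen(f"{base_url}/v1/models", timeout=10) as resp:
        return json.load(resp)


# Hypothetical payload, used here so the sketch runs without a server.
sample = {
    "object": "list",
    "data": [
        {"id": "base-model", "object": "model"},
        {"id": "sql-lora-v1", "object": "model"},
    ],
}

print("sql-lora-v1" in adapter_ids(sample))
```

Against a live server one would call `adapter_ids(fetch_models("http://localhost:8000"))`, where the host and port are placeholders for wherever the vLLM server listens.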
Binary file not shown.
Binary file not shown.

tools/dynamic-lora-sidecar/sidecar/sidecar.py

+62 -17
@@ -1,6 +1,7 @@
 import requests
 import yaml
 import time
+import argparse
 from jsonschema import validate
 from watchfiles import awatch
 from dataclasses import dataclass
@@ -30,18 +31,35 @@ def current_time_human() -> str:
     return now.strftime("%Y-%m-%d %H:%M:%S %Z%z")
 
 
+def parse_arguments():
+    """Parse command line arguments."""
+    parser = argparse.ArgumentParser(description='vLLM LoRA Adapter Reconciler')
+    parser.add_argument('--health-check-timeout', type=int, default=300,
+                        help='Health check timeout in seconds (default: 300)')
+    parser.add_argument('--health-check-interval', type=int, default=2,
+                        help='Health check interval in seconds (default: 2)')
+    parser.add_argument('--reconcile-trigger', type=int, default=5,
+                        help='Reconciliation trigger interval in seconds (default: 5)')
+    parser.add_argument('--config', type=str, default=CONFIG_MAP_FILE,
+                        help=f'Path to config map file (default: {CONFIG_MAP_FILE})')
+    parser.add_argument('--config-validation', action='store_true', default=True,
+                        help='Enable config validation (default: True)')
+    return parser.parse_args()
+
+
 class FileChangeHandler(FileSystemEventHandler):
     """Custom event handler that handles file modifications."""
 
-    def __init__(self, reconciler):
+    def __init__(self, reconciler, config_file):
         super().__init__()
         self.reconciler = reconciler
+        self.config_file = config_file
 
     def on_modified(self, event):
         logging.info("modified!")
-        logging.info(f"Config '{CONFIG_MAP_FILE}' modified!")
+        logging.info(f"Config '{self.config_file}' modified!")
         self.reconciler.reconcile()
-        logging.info(f"model server reconcile to Config '{CONFIG_MAP_FILE}' !")
+        logging.info(f"model server reconcile to Config '{self.config_file}' !")
 
 
 @dataclass
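
Review note: in `parse_arguments`, `--config-validation` combines `action='store_true'` with `default=True`, so the parsed value is `True` whether or not the flag is passed. One possible alternative, sketched below under the assumption that Python 3.9+ is available (the Dockerfile in this commit uses `python:3.9-slim-buster`), is `argparse.BooleanOptionalAction`, which also registers a `--no-config-validation` form. The function name `build_parser` is hypothetical, and this is a suggestion rather than what the commit does.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical variant of the sidecar parser with a switchable boolean flag."""
    parser = argparse.ArgumentParser(description="vLLM LoRA Adapter Reconciler")
    parser.add_argument("--health-check-timeout", type=int, default=300)
    parser.add_argument("--health-check-interval", type=int, default=2)
    parser.add_argument("--reconcile-trigger", type=int, default=5)
    # BooleanOptionalAction (Python 3.9+) accepts both --config-validation
    # and --no-config-validation, so the default of True can be overridden.
    parser.add_argument(
        "--config-validation",
        action=argparse.BooleanOptionalAction,
        default=True,
    )
    return parser


defaults = build_parser().parse_args([])
overridden = build_parser().parse_args(
    ["--health-check-timeout", "600", "--no-config-validation"]
)
print(defaults.config_validation, overridden.config_validation)
```

argparse converts the dashes in option names to underscores, which is why the values are read back as `args.health_check_timeout` and `args.config_validation`.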
@@ -65,10 +83,17 @@ class LoraReconciler:
     Reconciles adapters registered on vllm server with adapters listed in configmap in current state
     """
 
-    def __init__(self, config_validation=True):
-        self.health_check_timeout = datetime.timedelta(seconds=300)
-        self.health_check_interval = datetime.timedelta(seconds=15)
+    def __init__(self, config_file, health_check_timeout, health_check_interval,
+                 reconcile_trigger_seconds, config_validation=True):
+        self.config_file = config_file
         self.config_validation = config_validation
+        self.health_check_timeout = datetime.timedelta(seconds=health_check_timeout)
+        self.health_check_interval = datetime.timedelta(seconds=health_check_interval)
+        self.reconcile_trigger_seconds = reconcile_trigger_seconds
+
+        logging.info(f"Settings initialized: health check timeout={health_check_timeout}s, "
+                     f"interval={health_check_interval}s, "
+                     f"reconcile trigger={self.reconcile_trigger_seconds}s")
 
     def validate_config(self, c) -> bool:
         try:
@@ -77,14 +102,14 @@ def validate_config(self, c) -> bool:
             validate(instance=c, schema=schema)
             return True
         except Exception as e:
-            logging.error(f"Cannot load config {CONFIG_MAP_FILE} validation error: {e}")
+            logging.error(f"Cannot load config {self.config_file} validation error: {e}")
             return False
 
     @property
     def config(self):
         """Load configmap into memory"""
         try:
-            with open(CONFIG_MAP_FILE, "r") as f:
+            with open(self.config_file, "r") as f:
                 c = yaml.safe_load(f)
                 if self.config_validation and not self.validate_config(c):
                     return {}
@@ -93,7 +118,7 @@ def config(self):
                 c = c.get("vLLMLoRAConfig", {})
                 return c
         except Exception as e:
-            logging.error(f"cannot load config {CONFIG_MAP_FILE} {e}")
+            logging.error(f"cannot load config {self.config_file} {e}")
             return {}
 
     @property
@@ -215,8 +240,9 @@ def unload_adapter(self, adapter: LoraAdapter):
     def reconcile(self):
         """Reconciles model server with current version of configmap"""
         logging.info(
-            f"reconciling model server {self.model_server} with config stored at {CONFIG_MAP_FILE}"
+            f"reconciling model server {self.model_server} with config stored at {self.config_file}"
         )
+
         if not self.is_server_healthy:
             logging.error(f"vllm server at {self.model_server} not healthy")
             return
@@ -240,26 +266,45 @@ def reconcile(self):
 
 
 async def main():
-    reconciler_instance = LoraReconciler()
-    logging.info(f"Running initial reconcile for config map {CONFIG_MAP_FILE}")
+    args = parse_arguments()
+
+    # Update CONFIG_MAP_FILE with argument value
+    config_file = args.config
+
+    reconciler_instance = LoraReconciler(
+        config_file=config_file,
+        health_check_timeout=args.health_check_timeout,
+        health_check_interval=args.health_check_interval,
+        reconcile_trigger_seconds=args.reconcile_trigger,
+        config_validation=args.config_validation
+    )
+
+    logging.info(f"Running initial reconcile for config map {config_file}")
     reconciler_instance.reconcile()
 
-    event_handler = FileChangeHandler(reconciler_instance)
+    event_handler = FileChangeHandler(reconciler_instance, config_file)
     observer = Observer()
     observer.schedule(
-        event_handler, path=os.path.dirname(CONFIG_MAP_FILE), recursive=False
+        event_handler, path=os.path.dirname(config_file), recursive=False
     )
     observer.start()
 
     try:
-        logging.info(f"Starting to watch {CONFIG_MAP_FILE} for changes...")
+        logging.info(f"Starting to watch {config_file} for changes and performing periodic reconciliation...")
         while True:
-            await asyncio.sleep(1)
+            # Get current trigger interval from reconciler
+            trigger_seconds = reconciler_instance.reconcile_trigger_seconds
+            logging.info(f"Waiting {trigger_seconds}s before next reconciliation...")
+            # Wait for configured trigger interval
+            await asyncio.sleep(trigger_seconds)
+            # Force trigger reconciliation
+            logging.info("Periodic reconciliation triggered")
+            reconciler_instance.reconcile()
     except KeyboardInterrupt:
         logging.info("Stopped by user.")
         observer.stop()
     observer.join()
 
 
 if __name__ == "__main__":
-    asyncio.run(main())
+    asyncio.run(main())
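
Review note: the new loop in `main()` sleeps for `reconcile_trigger_seconds` and then calls `reconcile()` unconditionally, in addition to the file-watch callback. The pattern can be sketched in isolation as below. `StubReconciler`, `run_periodic`, and the `max_runs` bound are hypothetical names added so the sketch terminates; the real loop runs until interrupted.

```python
import asyncio


class StubReconciler:
    """Hypothetical stand-in for LoraReconciler; it only counts reconcile() calls."""

    def __init__(self, reconcile_trigger_seconds: float):
        self.reconcile_trigger_seconds = reconcile_trigger_seconds
        self.calls = 0

    def reconcile(self) -> None:
        self.calls += 1


async def run_periodic(reconciler: StubReconciler, max_runs: int) -> None:
    """Wait for the trigger interval, then force a reconcile; stop after max_runs."""
    for _ in range(max_runs):
        await asyncio.sleep(reconciler.reconcile_trigger_seconds)
        reconciler.reconcile()


stub = StubReconciler(reconcile_trigger_seconds=0.01)
asyncio.run(run_periodic(stub, max_runs=3))
print(stub.calls)
```

Because `reconcile()` is a plain synchronous call in both the sketch and the diff, the event loop is blocked for as long as each reconcile takes, so the effective period is the trigger interval plus the reconcile duration.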

tools/dynamic-lora-sidecar/sidecar/test_sidecar.py

+54 -16
@@ -2,8 +2,10 @@
 from unittest.mock import patch, Mock, mock_open, call
 import yaml
 import os
-from sidecar import LoraReconciler, CONFIG_MAP_FILE, BASE_FIELD, LoraAdapter
+import datetime
+from sidecar import LoraReconciler, LoraAdapter, CONFIG_MAP_FILE, BASE_FIELD
 
+# Update TEST_CONFIG_DATA to include the new configuration parameters
 TEST_CONFIG_DATA = {
     BASE_FIELD: {
         "host": "localhost",
@@ -49,13 +51,14 @@
         },
     }
 }
+
 EXIST_ADAPTERS = [
-    LoraAdapter(a["id"], a["base-model"], a["source"])
+    LoraAdapter(a["id"], a["source"], a["base-model"])
     for a in TEST_CONFIG_DATA[BASE_FIELD]["ensureExist"]["models"]
 ]
 
 NOT_EXIST_ADAPTERS = [
-    LoraAdapter(a["id"], a["base-model"], a["source"])
+    LoraAdapter(a["id"], a["source"], a["base-model"])
     for a in TEST_CONFIG_DATA[BASE_FIELD]["ensureNotExist"]["models"]
 ]
 RESPONSES = {
@@ -101,7 +104,15 @@ def setUp(self, mock_get, mock_file):
         mock_response = getMockResponse()
         mock_response.json.return_value = RESPONSES["v1/models"]
         mock_get.return_value = mock_response
-        self.reconciler = LoraReconciler(False)
+
+        # Create reconciler with command line argument values instead of config file values
+        self.reconciler = LoraReconciler(
+            config_file=CONFIG_MAP_FILE,
+            health_check_timeout=180,
+            health_check_interval=10,
+            reconcile_trigger_seconds=30,
+            config_validation=False
+        )
         self.maxDiff = None
 
     @patch("sidecar.requests.get")
@@ -167,20 +178,47 @@ def test_reconcile(self, mock_post, mock_get, mock_file):
             mock_get_response.json.return_value = RESPONSES["v1/models"]
             mock_get.return_value = mock_get_response
             mock_post.return_value = getMockResponse()
-            self.reconciler = LoraReconciler()
-            self.reconciler.reconcile()
 
-            # 1 adapter is in both exist and not exist list, only 2 are expected to be loaded
-            mock_load.assert_has_calls(
-                calls=[call(EXIST_ADAPTERS[0]), call(EXIST_ADAPTERS[2])]
+            # Create reconciler with command line argument values
+            self.reconciler = LoraReconciler(
+                config_file=CONFIG_MAP_FILE,
+                health_check_timeout=180,
+                health_check_interval=10,
+                reconcile_trigger_seconds=30,
+                config_validation=False
             )
-            assert mock_load.call_count == 2
+            self.reconciler.reconcile()
 
-            # 1 adapter is in both exist and not exist list, only 2 are expected to be unloaded
-            mock_unload.assert_has_calls(
-                calls=[call(NOT_EXIST_ADAPTERS[0]), call(NOT_EXIST_ADAPTERS[2])]
-            )
-            assert mock_unload.call_count == 2
+            # First check the call count
+            self.assertEqual(mock_load.call_count, 2, "Expected 2 load adapter calls")
+            self.assertEqual(mock_unload.call_count, 2, "Expected 2 unload adapter calls")
+
+            # Check that the adapters with the correct IDs were loaded
+            loaded_ids = [call.args[0].id for call in mock_load.call_args_list]
+            self.assertIn("sql-lora-v1", loaded_ids, "sql-lora-v1 should have been loaded")
+            self.assertIn("already_exists", loaded_ids, "already_exists should have been loaded")
+
+            # Check that the adapters with the correct IDs were unloaded
+            unloaded_ids = [call.args[0].id for call in mock_unload.call_args_list]
+            self.assertIn("sql-lora-v2", unloaded_ids, "sql-lora-v2 should have been unloaded")
+            self.assertIn("to_remove", unloaded_ids, "to_remove should have been unloaded")
+
+    def test_health_check_settings(self):
+        """Test that health check settings are properly initialized from command line args"""
+        # Create reconciler with specific values
+        reconciler = LoraReconciler(
+            config_file=CONFIG_MAP_FILE,
+            health_check_timeout=240,
+            health_check_interval=15,
+            reconcile_trigger_seconds=45,
+            config_validation=False
+        )
+
+        # Check that values are properly set
+        self.assertEqual(reconciler.health_check_timeout, datetime.timedelta(seconds=240))
+        self.assertEqual(reconciler.health_check_interval, datetime.timedelta(seconds=15))
+        self.assertEqual(reconciler.reconcile_trigger_seconds, 45)
+
 
 
 if __name__ == "__main__":
-    unittest.main()
+    unittest.main()
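
Review note: the updated `test_reconcile` replaces `assert_has_calls` with membership checks on adapter ids pulled from `call_args_list`, which makes the assertions independent of call order and of the other `LoraAdapter` fields. A minimal, self-contained sketch of that pattern follows. The `Adapter` dataclass and the `load_adapter` mock are hypothetical stand-ins, and `call.args` assumes Python 3.8 or newer.

```python
from dataclasses import dataclass
from unittest.mock import Mock


@dataclass
class Adapter:
    """Hypothetical stand-in for LoraAdapter with only the fields used here."""
    id: str
    source: str = ""


# A bare Mock records every call made to it.
load_adapter = Mock()
load_adapter(Adapter("sql-lora-v1"))
load_adapter(Adapter("already_exists"))

# Each entry in call_args_list exposes the positional arguments as .args.
loaded_ids = [c.args[0].id for c in load_adapter.call_args_list]
print(sorted(loaded_ids))
```

The loop variable is named `c` here rather than `call`. In the diff the comprehension variable `call` has the same name as the `call` helper imported from `unittest.mock`; that is harmless inside a comprehension in Python 3, but a distinct name may read more clearly.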
