Allow defining a default base model in the lora syncer configuration #609

Merged: 1 commit, Mar 29, 2025
config/manifests/vllm/gpu-deployment.yaml (3 additions, 4 deletions)

@@ -246,11 +246,10 @@ data:
 vLLMLoRAConfig:
   name: vllm-llama3.1-8b-instruct
   port: 8000
+  defaultBaseModel: meta-llama/Llama-3.1-8B-Instruct
   ensureExist:
     models:
-    - base-model: meta-llama/Llama-3.1-8B-Instruct
-      id: food-review
+    - id: food-review
       source: Kawon/llama3.1-food-finetune_v14_r8
-    - base-model: meta-llama/Llama-3.1-8B-Instruct
-      id: cad-fabricator
+    - id: cad-fabricator
       source: redcathode/fabricator

site-src/guides/adapter-rollout.md (6 additions, 8 deletions)

@@ -33,13 +33,12 @@ Change the ConfigMap to match the following (note the new entry under models):
 vLLMLoRAConfig:
   name: vllm-llama3-8b-instruct-adapters
   port: 8000
+  defaultBaseModel: meta-llama/Llama-3.1-8B-Instruct
   ensureExist:
     models:
-    - base-model: meta-llama/Llama-3.1-8B-Instruct
-      id: food-review-1
+    - id: food-review-1
       source: vineetsharma/qlora-adapter-Llama-2-7b-hf-TweetSumm
-    - base-model: meta-llama/Llama-3.1-8B-Instruct
-      id: food-review-2
+    - id: food-review-2
       source: mahimairaja/tweet-summarization-llama-2-finetuned
 ```

@@ -118,15 +117,14 @@ Unload the older versions from the servers by updating the LoRA syncer ConfigMap
 vLLMLoRAConfig:
   name: sql-loras-llama
   port: 8000
+  defaultBaseModel: meta-llama/Llama-3.1-8B-Instruct
   ensureExist:
     models:
-    - base-model: meta-llama/Llama-3.1-8B-Instruct
-      id: food-review-2
+    - id: food-review-2
       source: mahimairaja/tweet-summarization-llama-2-finetuned
   ensureNotExist:
     models:
-    - base-model: meta-llama/Llama-3.1-8B-Instruct
-      id: food-review-1
+    - id: food-review-1
       source: vineetsharma/qlora-adapter-Llama-2-7b-hf-TweetSumm
 ```
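
To connect the two ConfigMap revisions above to what happens on the model server, here is a minimal sketch of the reconcile step, assuming the server exposes vLLM's dynamic LoRA endpoints (`/v1/load_lora_adapter` and `/v1/unload_lora_adapter`, available when runtime LoRA updating is enabled). The `reconcile` helper, the hard-coded host, and the use of `requests`/PyYAML are illustrative assumptions rather than code from this PR; the point is that `defaultBaseModel` fills in `base-model` for any adapter entry that omits it.

```python
# A sketch only (not the sidecar's actual implementation), assuming the model
# server exposes vLLM's dynamic LoRA endpoints. Adapters listed under
# ensureExist are loaded, adapters under ensureNotExist are unloaded, and
# defaultBaseModel stands in for any entry that omits base-model.
import requests
import yaml

CONFIG = """
vLLMLoRAConfig:
  name: sql-loras-llama
  port: 8000
  defaultBaseModel: meta-llama/Llama-3.1-8B-Instruct
  ensureExist:
    models:
    - id: food-review-2
      source: mahimairaja/tweet-summarization-llama-2-finetuned
  ensureNotExist:
    models:
    - id: food-review-1
      source: vineetsharma/qlora-adapter-Llama-2-7b-hf-TweetSumm
"""


def reconcile(config: dict, host: str = "localhost") -> None:
    cfg = config["vLLMLoRAConfig"]
    base_url = f"http://{host}:{cfg.get('port', 8000)}"
    default_base = cfg.get("defaultBaseModel", "")

    for adapter in cfg.get("ensureExist", {}).get("models", []):
        # base-model falls back to defaultBaseModel when the entry omits it
        base = adapter.get("base-model", default_base)
        print(f"load {adapter['id']} (base model: {base})")
        requests.post(
            f"{base_url}/v1/load_lora_adapter",
            json={"lora_name": adapter["id"], "lora_path": adapter["source"]},
        )

    for adapter in cfg.get("ensureNotExist", {}).get("models", []):
        print(f"unload {adapter['id']}")
        requests.post(
            f"{base_url}/v1/unload_lora_adapter",
            json={"lora_name": adapter["id"]},
        )


if __name__ == "__main__":
    reconcile(yaml.safe_load(CONFIG))
```
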
tools/dynamic-lora-sidecar/README.md (58 additions, 11 deletions)

@@ -60,20 +60,67 @@ The sidecar supports the following command-line arguments:
 
 ## Configuration Fields
 - `vLLMLoRAConfig`[**required**] base key
-  - `host` [*optional*]Model server's host. defaults to localhost
+  - `host` [*optional*] Model server's host. defaults to localhost
   - `port` [*optional*] Model server's port. defaults to 8000
-  - `name`[*optional*] Name of this config
-  - `ensureExist`[*optional*] List of models to ensure existence on specified model server.
-    - `models`[**required**] [list]
-      - `base-model`[*optional*] Base model for lora adapter
-      - `id`[**required**] unique id of lora adapter
-      - `source`[**required**] path (remote or local) to lora adapter
+  - `name` [*optional*] Name of this config
+  - `defaultBaseModel` [*optional*] Default base model to use for all adapters when not specified individually
+  - `ensureExist` [*optional*] List of models to ensure existence on specified model server.
+    - `models` [**required**] [list]
+      - `id` [**required**] unique id of lora adapter
+      - `source` [**required**] path (remote or local) to lora adapter
+      - `base-model` [*optional*] Base model for lora adapter (overrides defaultBaseModel)
   - `ensureNotExist` [*optional*]
-    - `models`[**required**] [list]
-      - `id`[**required**] unique id of lora adapter
-      - `source`[**required**] path (remote or local) to lora adapter
-      - `base-model`[*optional*] Base model for lora adapter
+    - `models` [**required**] [list]
+      - `id` [**required**] unique id of lora adapter
+      - `source` [**required**] path (remote or local) to lora adapter
+      - `base-model` [*optional*] Base model for lora adapter (overrides defaultBaseModel)
 
+## Example Configuration
+
+Here's an example of using the `defaultBaseModel` field to avoid repetition in your configuration:
+
+```yaml
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: vllm-llama2-7b-adapters
+data:
+  configmap.yaml: |
+    vLLMLoRAConfig:
+      name: vllm-llama2-7b
+      port: 8000
+      defaultBaseModel: meta-llama/Llama-2-7b-hf
+      ensureExist:
+        models:
+        - id: tweet-summary-1
+          source: vineetsharma/qlora-adapter-Llama-2-7b-hf-TweetSumm
+        - id: tweet-summary-2
+          source: mahimairaja/tweet-summarization-llama-2-finetuned
+```
+
+In this example, both adapters will use `meta-llama/Llama-2-7b-hf` as their base model without needing to specify it for each adapter individually.
+
+You can still override the default base model for specific adapters when needed:
+
+```yaml
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: vllm-mixed-adapters
+data:
+  configmap.yaml: |
+    vLLMLoRAConfig:
+      name: vllm-mixed
+      port: 8000
+      defaultBaseModel: meta-llama/Llama-2-7b-hf
+      ensureExist:
+        models:
+        - id: tweet-summary-1
+          source: vineetsharma/qlora-adapter-Llama-2-7b-hf-TweetSumm
+        - id: code-assistant
+          source: huggingface/code-assistant-lora
+          base-model: meta-llama/Llama-2-13b-hf  # Override for this specific adapter
+```
 ## Example Deployment
 
 The [deployment.yaml](deployment.yaml) file shows an example of deploying the sidecar with custom parameters:

tools/dynamic-lora-sidecar/deployment.yaml (6 additions, 11 deletions)

@@ -66,7 +66,7 @@ spec:
 - name: lora-adapter-syncer
   tty: true
   stdin: true
-  image: <SIDECAR_IMAGE>
+  image: us-central1-docker.pkg.dev/k8s-staging-images/gateway-api-inference-extension/lora-syncer:main
   restartPolicy: Always
   imagePullPolicy: Always
   env:

@@ -106,22 +106,17 @@ metadata:
 data:
   configmap.yaml: |
     vLLMLoRAConfig:
-      host: modelServerHost
       name: sql-loras-llama
-      port: modelServerPort
+      defaultBaseModel: meta-llama/Llama-2-7b-hf
       ensureExist:
         models:
-        - base-model: meta-llama/Llama-3.1-8B-Instruct
-          id: sql-lora-v1
+        - id: sql-lora-v1
           source: yard1/llama-2-7b-sql-lora-test
-        - base-model: meta-llama/Llama-3.1-8B-Instruct
-          id: sql-lora-v3
+        - id: sql-lora-v3
           source: yard1/llama-2-7b-sql-lora-test
-        - base-model: meta-llama/Llama-3.1-8B-Instruct
-          id: sql-lora-v4
+        - id: sql-lora-v4
           source: yard1/llama-2-7b-sql-lora-test
       ensureNotExist:
         models:
-        - base-model: meta-llama/Llama-3.1-8B-Instruct
-          id: sql-lora-v2
+        - id: sql-lora-v2
           source: yard1/llama-2-7b-sql-lora-test

tools/dynamic-lora-sidecar/sidecar/sidecar.py (15 additions, 2 deletions)

@@ -135,15 +135,24 @@ def port(self):
     def model_server(self):
         """Model server {host}:{port}"""
         return f"{self.host}:{self.port}"
 
+    @property
+    def default_base_model(self):
+        """Default base model to use when not specified at adapter level"""
+        return self.config.get("defaultBaseModel", "")
+
     @property
     def ensure_exist_adapters(self):
         """Lora adapters in config under key `ensureExist` in set"""
         adapters = self.config.get("ensureExist", {}).get("models", set())
+        default_model = self.default_base_model
+
         return set(
             [
                 LoraAdapter(
-                    adapter["id"], adapter["source"], adapter.get("base-model", "")
+                    adapter["id"],
+                    adapter["source"],
+                    adapter.get("base-model", default_model)
                 )
                 for adapter in adapters
             ]

@@ -153,10 +162,14 @@ def ensure_exist_adapters(self):
     def ensure_not_exist_adapters(self):
         """Lora adapters in config under key `ensureNotExist` in set"""
         adapters = self.config.get("ensureNotExist", {}).get("models", set())
+        default_model = self.default_base_model
+
         return set(
             [
                 LoraAdapter(
-                    adapter["id"], adapter["source"], adapter.get("base-model", "")
+                    adapter["id"],
+                    adapter["source"],
+                    adapter.get("base-model", default_model)
                 )
                 for adapter in adapters
             ]
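
To make the new fallback order concrete, the self-contained snippet below mimics the two properties changed above without importing the sidecar (the `LoraAdapter` dataclass and `resolve_adapters` helper here are stand-ins, not the sidecar's own classes): an explicit `base-model` on an adapter wins, otherwise `defaultBaseModel` applies, and the previous behavior of an empty base model only remains when neither is set.

```python
# Illustrative stand-in for the two properties above (it does not import the
# sidecar). It shows the resolution order the diff implements: an explicit
# base-model on the adapter wins, otherwise defaultBaseModel, otherwise "".
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraAdapter:
    id: str
    source: str
    base_model: str


def resolve_adapters(config: dict, key: str) -> set:
    default_model = config.get("defaultBaseModel", "")
    models = config.get(key, {}).get("models", [])
    return {
        LoraAdapter(m["id"], m["source"], m.get("base-model", default_model))
        for m in models
    }


config = {
    "defaultBaseModel": "meta-llama/Llama-2-7b-hf",
    "ensureExist": {
        "models": [
            {
                "id": "tweet-summary-1",
                "source": "vineetsharma/qlora-adapter-Llama-2-7b-hf-TweetSumm",
            },
            {
                "id": "code-assistant",
                "source": "huggingface/code-assistant-lora",
                "base-model": "meta-llama/Llama-2-13b-hf",
            },
        ]
    },
}

for adapter in sorted(resolve_adapters(config, "ensureExist"), key=lambda a: a.id):
    print(adapter.id, "->", adapter.base_model)
# code-assistant -> meta-llama/Llama-2-13b-hf  (explicit override kept)
# tweet-summary-1 -> meta-llama/Llama-2-7b-hf  (inherits defaultBaseModel)
```
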
tools/dynamic-lora-sidecar/sidecar/validation.yaml (7 additions, 4 deletions)

@@ -16,6 +16,9 @@ properties:
       name:
         type: string
         description: Name of this config
+      defaultBaseModel:
+        type: string
+        description: Default base model to use when not specified at adapter level
       ensureExist:
         type: object
         description: List of models to ensure existence on specified model server

@@ -26,9 +29,9 @@ properties:
         items:
           type: object
           properties:
-            base_model:
+            base-model:
               type: string
-              description: Base model for LoRA adapter
+              description: Base model for LoRA adapter (overrides defaultBaseModel)
             id:
               type: string
               description: Unique ID of LoRA adapter

@@ -50,9 +53,9 @@ properties:
         items:
          type: object
           properties:
-            base_model:
+            base-model:
               type: string
-              description: Base model for LoRA adapter
+              description: Base model for LoRA adapter (overrides defaultBaseModel)
             id:
               type: string
               description: Unique ID of LoRA adapter
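
Because validation.yaml is a JSON Schema written as YAML, a config that relies on the new `defaultBaseModel` field can be sanity-checked before rollout. The snippet below is a sketch under stated assumptions: it uses the third-party `jsonschema` package, is run from the repository root, and is not necessarily how the sidecar itself validates its config.

```python
# Sketch of a pre-flight check: validate a ConfigMap payload against
# validation.yaml with the third-party `jsonschema` package. This is an
# assumption about tooling, not necessarily how the sidecar validates.
import jsonschema
import yaml

with open("tools/dynamic-lora-sidecar/sidecar/validation.yaml") as f:
    schema = yaml.safe_load(f)

config = yaml.safe_load("""
vLLMLoRAConfig:
  name: vllm-llama2-7b
  defaultBaseModel: meta-llama/Llama-2-7b-hf
  ensureExist:
    models:
    - id: tweet-summary-1
      source: vineetsharma/qlora-adapter-Llama-2-7b-hf-TweetSumm
""")

# Raises jsonschema.exceptions.ValidationError if the config does not match
# the schema, for example if defaultBaseModel is not a string.
jsonschema.validate(instance=config, schema=schema)
print("config is valid against validation.yaml")
```
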