Dynamic LORA Adapter Sidecar for vLLM

This directory contains a script for a sidecar container and an example ConfigMap holding LORA adapter configurations. The sidecar dynamically manages LORA adapters for a vLLM server running in the same Kubernetes pod by reconciling the server's adapters with those listed in the ConfigMap.

Overview

The sidecar continuously monitors a ConfigMap mounted as a YAML configuration file. This file defines the desired state of LORA adapters, including:

Adapter ID: Unique identifier for the adapter.
Source: Path to the adapter's source files.
Base Model: The base model to which the adapter should be applied.
toRemove: (Optional) Indicates whether the adapter should be unloaded.

The sidecar uses the vLLM server's API to load or unload adapters based on the configuration. It also periodically reconciles the registered adapters on the vLLM server with the desired state defined in the ConfigMap, ensuring consistency.
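The reconciliation step described above can be sketched as a pure function that compares the desired state with what the server reports. This is a minimal illustration, not the code in sidecar.py: the function name `plan_reconcile` and the field names `id`, `source`, and `toRemove` are assumptions made for this example, and the actual calls to the vLLM API are left out.

```python
from typing import Dict, List, Set, Tuple


def plan_reconcile(
    desired: List[dict], registered: Set[str]
) -> Tuple[Dict[str, str], List[str]]:
    """Return (adapters to load as {id: source}, adapter ids to unload).

    `desired` is the adapter list parsed from the ConfigMap file (hypothetical
    field names); `registered` is the set of adapter ids the server reports.
    """
    to_load: Dict[str, str] = {}
    to_unload: List[str] = []
    for adapter in desired:
        adapter_id = adapter["id"]
        if adapter.get("toRemove", False):
            # Unload only if the server still has the adapter.
            if adapter_id in registered:
                to_unload.append(adapter_id)
        elif adapter_id not in registered:
            # Load only adapters the server does not have yet.
            to_load[adapter_id] = adapter["source"]
    return to_load, to_unload


# Hypothetical desired state: one adapter to add, one marked for removal.
desired = [
    {"id": "sql-lora", "source": "/adapters/sql-lora"},
    {"id": "old-lora", "source": "/adapters/old-lora", "toRemove": True},
]
print(plan_reconcile(desired, registered={"old-lora"}))
```

Because the plan depends only on the two inputs, running it again once the server matches the configuration yields nothing to load or unload, which is what makes periodic reconciliation safe to repeat.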

Features

Dynamic Loading and Unloading: Load and unload LORA adapters without restarting the vLLM server.
Continuous Reconciliation: Ensures the vLLM server's state matches the desired configuration.
ConfigMap Integration: Leverages Kubernetes ConfigMaps for easy configuration management.
Easy Deployment: Provides a sample deployment YAML for quick setup.

Repository Contents

sidecar.py: Python script for the sidecar container.
Dockerfile: Dockerfile to build the sidecar image.
configmap.yaml: Example ConfigMap YAML file.
deployment.yaml: Example Kubernetes deployment YAML.

Usage

Build the Docker Image:
```
docker build -t <your-image-name> .
```

Create a configmap:

kubectl create configmap name-of-your-configmap --from-file=your-file.yaml

Mount the ConfigMap and configure the sidecar in your pod:
```
volumeMounts: # DO NOT USE subPath
  - name: config-volume
    mountPath: /config
```
Do not use subPath: Kubernetes does not propagate ConfigMap updates to files mounted through subPath, so the sidecar would never see configuration changes.
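One simple way for a sidecar to notice that the mounted file has been updated is to poll it and compare a content hash. The sketch below illustrates that idea under stated assumptions; it is not necessarily how sidecar.py detects changes, and the helper names `file_digest` and `has_changed` are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Optional


def file_digest(path: Path) -> Optional[str]:
    """Return the SHA-256 hex digest of the file, or None if it is missing."""
    try:
        return hashlib.sha256(path.read_bytes()).hexdigest()
    except FileNotFoundError:
        return None


def has_changed(path: Path, last_digest: Optional[str]) -> bool:
    """True when the file content differs from the last digest we saw."""
    return file_digest(path) != last_digest


# Demonstration on a temporary file standing in for the mounted config.
with tempfile.TemporaryDirectory() as tmp:
    config = Path(tmp) / "adapters.yaml"
    config.write_text("adapters: []\n")
    seen = file_digest(config)
    unchanged = has_changed(config, seen)   # same content as before
    config.write_text("adapters: [sql-lora]\n")
    changed = has_changed(config, seen)     # content was rewritten
```

Comparing content rather than modification times keeps the check independent of how the volume is refreshed on disk.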

Screenshots & Testing

I tested the sidecar in my cluster with the deployment and ConfigMap specified in this repo. Here are the screen grabs of the logs from the sidecar and the vLLM server. Using the specified ConfigMap, I verified that the adapters were loaded by querying v1/models and checking the vLLM logs. I then changed the ConfigMap and confirmed that the change was applied on the vLLM server. Note: there is a slight lag between a ConfigMap update and the change appearing on the server.
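The v1/models check can also be scripted. The sketch below assumes the server returns the OpenAI-style list format (`{"data": [{"id": ...}, ...]}`) and that loaded adapters appear there by their IDs; the base URL `http://localhost:8000`, the helper names, and the sample adapter IDs are assumptions for illustration only.

```python
import json
from typing import Iterable, List, Set
from urllib.request import urlopen


def model_ids(models_response: dict) -> Set[str]:
    """Extract the model ids from a /v1/models style response body."""
    return {entry["id"] for entry in models_response.get("data", [])}


def fetch_models(base_url: str, timeout: float = 5.0) -> dict:
    """Fetch and decode the /v1/models response from a running server."""
    with urlopen(f"{base_url}/v1/models", timeout=timeout) as resp:
        return json.load(resp)


def missing_adapters(expected: Iterable[str], models_response: dict) -> List[str]:
    """Return the expected adapter ids that the server does not list, sorted."""
    return sorted(set(expected) - model_ids(models_response))


# Offline example with a hand-written response; against a live server,
# replace `sample` with fetch_models("http://localhost:8000").
sample = {"object": "list", "data": [{"id": "base-model"}, {"id": "sql-lora"}]}
missing = missing_adapters(["sql-lora", "tweet-lora"], sample)
print(missing)
```

Given the lag noted above, a check like this is best retried for a short while after a ConfigMap change before treating a missing adapter as a failure.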
