add regression testing docs #755

Merged · 1 commit · May 15, 2025
config/manifests/regression-testing/inferencemodel.yaml (237 additions, 0 deletions)
@@ -0,0 +1,237 @@
apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferenceModel
metadata:
  name: adapter-0
spec:
  modelName: adapter-0
  criticality: Critical
  poolRef:
    name: vllm-llama3-8b-instruct
  targetModels:
  - name: adapter-0
    weight: 100

---

apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferenceModel
metadata:
  name: adapter-1
spec:
  modelName: adapter-1
  criticality: Critical
  poolRef:
    name: vllm-llama3-8b-instruct
  targetModels:
  - name: adapter-1
    weight: 100

---

apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferenceModel
metadata:
  name: adapter-2
spec:
  modelName: adapter-2
  criticality: Critical
  poolRef:
    name: vllm-llama3-8b-instruct
  targetModels:
  - name: adapter-2
    weight: 100

---

apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferenceModel
metadata:
  name: adapter-3
spec:
  modelName: adapter-3
  criticality: Critical
  poolRef:
    name: vllm-llama3-8b-instruct
  targetModels:
  - name: adapter-3
    weight: 100

---

apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferenceModel
metadata:
  name: adapter-4
spec:
  modelName: adapter-4
  criticality: Critical
  poolRef:
    name: vllm-llama3-8b-instruct
  targetModels:
  - name: adapter-4
    weight: 100

---

apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferenceModel
metadata:
  name: adapter-5
spec:
  modelName: adapter-5
  criticality: Critical
  poolRef:
    name: vllm-llama3-8b-instruct
  targetModels:
  - name: adapter-5
    weight: 100

---

apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferenceModel
metadata:
  name: adapter-6
spec:
  modelName: adapter-6
  criticality: Critical
  poolRef:
    name: vllm-llama3-8b-instruct
  targetModels:
  - name: adapter-6
    weight: 100

---

apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferenceModel
metadata:
  name: adapter-7
spec:
  modelName: adapter-7
  criticality: Critical
  poolRef:
    name: vllm-llama3-8b-instruct
  targetModels:
  - name: adapter-7
    weight: 100

---

apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferenceModel
metadata:
  name: adapter-8
spec:
  modelName: adapter-8
  criticality: Critical
  poolRef:
    name: vllm-llama3-8b-instruct
  targetModels:
  - name: adapter-8
    weight: 100

---

apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferenceModel
metadata:
  name: adapter-9
spec:
  modelName: adapter-9
  criticality: Critical
  poolRef:
    name: vllm-llama3-8b-instruct
  targetModels:
  - name: adapter-9
    weight: 100

---

apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferenceModel
metadata:
  name: adapter-10
spec:
  modelName: adapter-10
  criticality: Critical
  poolRef:
    name: vllm-llama3-8b-instruct
  targetModels:
  - name: adapter-10
    weight: 100

---

apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferenceModel
metadata:
  name: adapter-11
spec:
  modelName: adapter-11
  criticality: Critical
  poolRef:
    name: vllm-llama3-8b-instruct
  targetModels:
  - name: adapter-11
    weight: 100

---

apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferenceModel
metadata:
  name: adapter-12
spec:
  modelName: adapter-12
  criticality: Critical
  poolRef:
    name: vllm-llama3-8b-instruct
  targetModels:
  - name: adapter-12
    weight: 100

---

apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferenceModel
metadata:
  name: adapter-13
spec:
  modelName: adapter-13
  criticality: Critical
  poolRef:
    name: vllm-llama3-8b-instruct
  targetModels:
  - name: adapter-13
    weight: 100

---

apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferenceModel
metadata:
  name: adapter-14
spec:
  modelName: adapter-14
  criticality: Critical
  poolRef:
    name: vllm-llama3-8b-instruct
  targetModels:
  - name: adapter-14
    weight: 100

---

apiVersion: inference.networking.x-k8s.io/v1alpha2
kind: InferenceModel
metadata:
  name: base-model
spec:
  modelName: meta-llama/Llama-3.1-8B-Instruct
  criticality: Critical
  poolRef:
    name: vllm-llama3-8b-instruct
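A minimal usage sketch (not part of this diff), assuming a cluster where the InferenceModel CRD is installed and the vllm-llama3-8b-instruct InferencePool is already serving:

```bash
# Register the fifteen adapters and the base model with the pool.
kubectl apply -f config/manifests/regression-testing/inferencemodel.yaml

# Confirm all sixteen InferenceModel objects were created.
kubectl get inferencemodels
```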
config/manifests/regression-testing/multi-lora-regression.yaml (62 additions, 0 deletions)
@@ -0,0 +1,62 @@
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: benchmark-tool
  name: benchmark-tool
spec:
  replicas: 1
  selector:
    matchLabels:
      app: benchmark-tool
  template:
    metadata:
      labels:
        app: benchmark-tool
    spec:
      containers:
      # Build image from this source https://github.com/AI-Hypercomputer/inference-benchmark/tree/46d638262650a1928e47699d78ab2da062d4422d
      - image: '<DOCKER_IMAGE>'
        imagePullPolicy: Always
        name: benchmark-tool
        command:
        - bash
        - -c
        - ./latency_throughput_curve.sh
        env:
        - name: IP
          value: '<target-ip>'
        - name: REQUEST_RATES
          value: '20,40,60,80,100,120,140,160,180,200'
        - name: BENCHMARK_TIME_SECONDS
          value: '300'
        - name: TOKENIZER
          value: 'meta-llama/Llama-3.1-8B-Instruct'
        - name: MODELS
          value: 'adapter-0,adapter-1,adapter-2,adapter-3,adapter-4,adapter-5,adapter-6,adapter-7,adapter-8,adapter-9,adapter-10,adapter-11,adapter-12,adapter-13,adapter-14'
        - name: TRAFFIC_SPLIT
          value: '0.12,0.12,0.12,0.12,0.12,0.06,0.06,0.06,0.06,0.06,0.02,0.02,0.02,0.02,0.02'
        - name: BACKEND
          value: vllm
        - name: PORT
          value: "80"
        - name: INPUT_LENGTH
          value: "1024"
        - name: OUTPUT_LENGTH
          value: '1024'
        - name: FILE_PREFIX
          value: benchmark
        - name: PROMPT_DATASET_FILE
          value: Infinity-Instruct_conversations.json
Review thread on PROMPT_DATASET_FILE:

Contributor: Is this the file generated by import_datasets.py? We would need to build this file into the LPG image in order to use it, right? Can we provide an LPG image with the datasets built in?

@kaushikmitr (author), Apr 30, 2025: I thought about it, but both of these datasets require users to sign in to Hugging Face and accept an agreement, so I did not put them in the public image. We can have them internally, though.

Contributor: OK, got it. Then please replace the image with a placeholder <lpg_image>, and explain how to build a new image with this dataset.

@kaushikmitr (author): Added. It seemed easier to update the LPG script and point to it here, so I pushed my changes to import the datasets and create a Docker image in the LPG repo. Please try it out.

(A sketch of the secret setup and image-build step follows this manifest.)
        - name: HF_TOKEN
          valueFrom:
            secretKeyRef:
              key: token
              name: hf-token
        resources:
          limits:
            cpu: "2"
            memory: 20Gi
          requests:
            cpu: "2"
            memory: 20Gi
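Two pieces referenced by this manifest have to exist before deploying; the commands below are an illustrative sketch, not part of the PR. The registry name, tag, and gateway IP are placeholder assumptions, and the image build is the step discussed in the review thread above (the real flow lives in the inference-benchmark/LPG repo).

```bash
# Create the Hugging Face token secret the Deployment mounts
# (name "hf-token", key "token", matching the secretKeyRef above).
kubectl create secret generic hf-token --from-literal=token="$HF_TOKEN"

# Hypothetical: build and push an LPG image with the gated datasets baked in,
# following the instructions in the inference-benchmark repo.
docker build -t example.com/inference-benchmark:with-datasets .
docker push example.com/inference-benchmark:with-datasets

# Substitute the <DOCKER_IMAGE> and <target-ip> placeholders
# (10.0.0.1 stands in for your gateway IP), then apply.
sed -e 's|<DOCKER_IMAGE>|example.com/inference-benchmark:with-datasets|' \
    -e 's|<target-ip>|10.0.0.1|' \
    config/manifests/regression-testing/multi-lora-regression.yaml | kubectl apply -f -
```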
@@ -0,0 +1,60 @@
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: benchmark-tool
  name: benchmark-tool
spec:
  replicas: 1
  selector:
    matchLabels:
      app: benchmark-tool
  template:
    metadata:
      labels:
        app: benchmark-tool
    spec:
      containers:
      # Build image from this source https://github.com/AI-Hypercomputer/inference-benchmark/tree/46d638262650a1928e47699d78ab2da062d4422d
      - image: '<DOCKER_IMAGE>'
        imagePullPolicy: Always
        name: benchmark-tool
        command:
        - bash
        - -c
        - ./latency_throughput_curve.sh
        env:
        - name: IP
          value: '<target-ip>'
        - name: REQUEST_RATES
          value: '300,310,320,330,340,350'
        - name: BENCHMARK_TIME_SECONDS
          value: '300'
        - name: TOKENIZER
          value: 'meta-llama/Llama-3.1-8B-Instruct'
        - name: MODELS
          value: 'meta-llama/Llama-3.1-8B-Instruct'
        - name: BACKEND
          value: vllm
        - name: PORT
          value: "80"
        - name: INPUT_LENGTH
          value: "1024"
        - name: OUTPUT_LENGTH
          value: '1024'
        - name: FILE_PREFIX
          value: benchmark
        - name: PROMPT_DATASET_FILE
          value: billsum_conversations.json
        - name: HF_TOKEN
          valueFrom:
            secretKeyRef:
              key: token
              name: hf-token
        resources:
          limits:
            cpu: "2"
            memory: 20Gi
          requests:
            cpu: "2"
            memory: 20Gi
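Once either benchmark Deployment is running, progress can be followed from its logs; a small sketch, assuming the default output of latency_throughput_curve.sh:

```bash
# Stream the benchmark run; result files inside the container use the
# FILE_PREFIX ("benchmark") configured above.
kubectl logs -f deployment/benchmark-tool
```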