Commit 7e51e02

add regression testing docs
s traffic split setup
s traffic split setup
s requirement
s regression testing doc
s performance docs
add newline
add example yamls for multi lora deployment and regression LPG testing
fix qps range
fix typo
fix typo
fix typo
fix typo
fix typo
fix typo
fix typo
fix broken link
add instructions to build LPG image
update benchmark.yaml
update LPG yamls
update readme
update regression testing markdown to refine docker image creation for LPG
update regression yamls
refine regression doc
1 parent a78d80f commit 7e51e02

2 files changed: 64 additions, 0 deletions


config/manifests/regression-testing/multi-lora-regression.yaml

Lines changed: 63 additions & 0 deletions
@@ -60,3 +60,66 @@ spec:
             requests:
               cpu: "2"
               memory: 20Gi
+
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  labels:
+    app: benchmark-tool
+  name: benchmark-tool
+spec:
+  replicas: 1
+  selector:
+    matchLabels:
+      app: benchmark-tool
+  template:
+    metadata:
+      labels:
+        app: benchmark-tool
+    spec:
+      containers:
+        # Build image from this source https://github.com/AI-Hypercomputer/inference-benchmark/blob/1c92df607751a7ddb04e2152ed7f6aaf85bd9ca7
+        - image: '<DOCKER_IMAGE>'
+          imagePullPolicy: Always
+          name: benchmark-tool
+          command:
+            - bash
+            - -c
+            - ./latency_throughput_curve.sh
+          env:
+            - name: IP
+              value: '<target-ip>'
+            - name: REQUEST_RATES
+              value: '20,40,60,80,100,120,140,160,180,200'
+            - name: BENCHMARK_TIME_SECONDS
+              value: '300'
+            - name: TOKENIZER
+              value: 'meta-llama/Llama-3.1-8B-Instruct'
+            - name: MODELS
+              value: 'adapter-0,adapter-1,adapter-2,adapter-3,adapter-4,adapter-5,adapter-6,adapter-7,adapter-8,adapter-9,adapter-10,adapter-11,adapter-12,adapter-13,adapter-14'
+            - name: TRAFFIC_SPLIT
+              value: '0.12,0.12,0.12,0.12,0.12,0.06,0.06,0.06,0.06,0.06,0.02,0.02,0.02,0.02,0.02'
+            - name: BACKEND
+              value: vllm
+            - name: PORT
+              value: "80"
+            - name: INPUT_LENGTH
+              value: "1024"
+            - name: OUTPUT_LENGTH
+              value: '1024'
+            - name: FILE_PREFIX
+              value: benchmark
+            - name: PROMPT_DATASET_FILE
+              value: Infinity-Instruct_conversations.json
+            - name: HF_TOKEN
+              valueFrom:
+                secretKeyRef:
+                  key: token
+                  name: hf-token
+          resources:
+            limits:
+              cpu: "2"
+              memory: 20Gi
+            requests:
+              cpu: "2"
+              memory: 20Gi
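
Review note: the `MODELS` and `TRAFFIC_SPLIT` values in this manifest are long comma-separated lists that are easy to knock out of sync when an adapter is added or removed. The sketch below is one way to sanity-check them before deploying. It assumes the two lists are paired by position and that the weights are meant to sum to 1.0; neither is confirmed by this diff, and `parse_split` and `per_model_rates` are hypothetical helpers, not part of the benchmark tool.

```python
import math

# Values taken from the env section of the benchmark-tool Deployment.
# MODELS is generated here; it expands to adapter-0 through adapter-14.
MODELS = ",".join(f"adapter-{i}" for i in range(15))
TRAFFIC_SPLIT = (
    "0.12,0.12,0.12,0.12,0.12,"
    "0.06,0.06,0.06,0.06,0.06,"
    "0.02,0.02,0.02,0.02,0.02"
)
REQUEST_RATES = "20,40,60,80,100,120,140,160,180,200"


def parse_split(models: str, split: str) -> dict[str, float]:
    """Pair each model name with its weight (assumes positional pairing)."""
    names = [m.strip() for m in models.split(",")]
    weights = [float(w) for w in split.split(",")]
    if len(names) != len(weights):
        raise ValueError(f"{len(names)} models but {len(weights)} weights")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    # Tolerate floating-point rounding when summing decimal fractions.
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        raise ValueError(f"weights sum to {total}, expected 1.0")
    return dict(zip(names, weights))


def per_model_rates(split: dict[str, float], total_qps: float) -> dict[str, float]:
    """Expected request rate per model if weights act as traffic fractions."""
    return {name: weight * total_qps for name, weight in split.items()}


split = parse_split(MODELS, TRAFFIC_SPLIT)
rates = [float(r) for r in REQUEST_RATES.split(",")]
lowest = per_model_rates(split, rates[0])
```

Under those assumptions, at the lowest request rate of 20 the first five adapters would each receive about 2.4 requests per second and the last five about 0.4, which is worth keeping in mind when reading per-adapter latency numbers at the low end of the range.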

config/manifests/regression-testing/single-workload-regression.yaml

Lines changed: 1 addition & 0 deletions
@@ -58,3 +58,4 @@ spec:
             requests:
               cpu: "2"
               memory: 20Gi
+
