Commit 5c3a778

ahg-g authored and kfswain committed
Added provider support to InferencePool helm chart (kubernetes-sigs#595)
* Added provider support to InferencePool helm chart
* Removed the redundant pool name flag
1 parent 5826cec commit 5c3a778

8 files changed: +81 -28 lines changed

config/charts/inferencepool/README.md

+11 -17
@@ -9,20 +9,14 @@ To install an InferencePool named `vllm-llama3-8b-instruct` that selects from e

 ```txt
 $ helm install vllm-llama3-8b-instruct ./config/charts/inferencepool \
-  --set inferencePool.name=vllm-llama3-8b-instruct \
   --set inferencePool.modelServers.matchLabels.app=vllm-llama3-8b-instruct \
-  --set inferencePool.targetPortNumber=8000
 ```

-where `inferencePool.targetPortNumber` is the pod that vllm backends served on and `inferencePool.modelServers.matchLabels` is the selector to match the vllm backends.
-
 To install via the latest published chart in staging (--version v0 indicates latest dev version), you can run the following command:

 ```txt
 $ helm install vllm-llama3-8b-instruct \
-  --set inferencePool.name=vllm-llama3-8b-instruct \
   --set inferencePool.modelServers.matchLabels.app=vllm-llama3-8b-instruct \
-  --set inferencePool.targetPortNumber=8000 \
   oci://us-central1-docker.pkg.dev/k8s-staging-images/gateway-api-inference-extension/charts/inferencepool --version v0
 ```

@@ -38,17 +32,17 @@ $ helm uninstall pool-1

 The following table list the configurable parameters of the chart.

-| **Parameter Name** | **Description** |
-|---------------------------------------------|-------------------------------------------------------------------------------------------------------------------|
-| `inferencePool.name` | Name for the InferencePool, and inference extension will be named as `${inferencePool.name}-epp`. |
-| `inferencePool.targetPortNumber` | Target port number for the vllm backends, will be used to scrape metrics by the inference extension. |
-| `inferencePool.modelServers.matchLabels` | Label selector to match vllm backends managed by the inference pool. |
-| `inferenceExtension.replicas` | Number of replicas for the inference extension service. Defaults to `1`. |
-| `inferenceExtension.image.name` | Name of the container image used for the inference extension. |
-| `inferenceExtension.image.hub` | Registry URL where the inference extension image is hosted. |
-| `inferenceExtension.image.tag` | Image tag of the inference extension. |
-| `inferenceExtension.image.pullPolicy` | Image pull policy for the container. Possible values: `Always`, `IfNotPresent`, or `Never`. Defaults to `Always`. |
-| `inferenceExtension.extProcPort` | Port where the inference extension service is served for external processing. Defaults to `9002`. |
+| **Parameter Name** | **Description** |
+|---------------------------------------------|------------------------------------------------------------------------------------------------------------------------|
+| `inferencePool.name` | Name for the InferencePool, and endpoint picker deployment and service will be named as `{.Release.name}-epp`. |
+| `inferencePool.targetPortNumber` | Target port number for the vllm backends, will be used to scrape metrics by the inference extension. Defaults to 8000. |
+| `inferencePool.modelServers.matchLabels` | Label selector to match vllm backends managed by the inference pool. |
+| `inferenceExtension.replicas` | Number of replicas for the endpoint picker extension service. Defaults to `1`. |
+| `inferenceExtension.image.name` | Name of the container image used for the endpoint picker. |
+| `inferenceExtension.image.hub` | Registry URL where the endpoint picker image is hosted. |
+| `inferenceExtension.image.tag` | Image tag of the endpoint picker. |
+| `inferenceExtension.image.pullPolicy` | Image pull policy for the container. Possible values: `Always`, `IfNotPresent`, or `Never`. Defaults to `Always`. |
+| `inferenceExtension.extProcPort` | Port where the endpoint picker service is served for external processing. Defaults to `9002`. |

 ## Notes

@@ -1 +1 @@
-InferencePool {{ .Values.inferencePool.name }} deployed.
+InferencePool {{ .Release.Name }} deployed.

config/charts/inferencepool/templates/_helpers.tpl

+2 -2
@@ -12,13 +12,13 @@ app.kubernetes.io/version: {{ .Chart.AppVersion | quote }}
 Inference extension name
 */}}
 {{- define "gateway-api-inference-extension.name" -}}
-{{- $base := .Values.inferencePool.name | default "default-pool" | lower | trim | trunc 40 -}}
+{{- $base := .Release.Name | default "default-pool" | lower | trim | trunc 40 -}}
 {{ $base }}-epp
 {{- end -}}

 {{/*
 Selector labels
 */}}
 {{- define "gateway-api-inference-extension.selectorLabels" -}}
-app: {{ include "gateway-api-inference-extension.name" . }}
+inferencepool: {{ include "gateway-api-inference-extension.name" . }}
 {{- end -}}
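
As a rough illustration of the two helpers after this change, assume a hypothetical release named `vllm-llama3-8b-instruct`. The name is already lowercase and shorter than 40 characters, so `lower`, `trim`, and `trunc 40` leave it unchanged, and the helpers would be expected to render as sketched here:

```yaml
# Assumed release name (hypothetical): vllm-llama3-8b-instruct
#
# "gateway-api-inference-extension.name" renders to the string:
#   vllm-llama3-8b-instruct-epp
#
# "gateway-api-inference-extension.selectorLabels" renders to:
inferencepool: vllm-llama3-8b-instruct-epp
```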

config/charts/inferencepool/templates/_validations.tpl

-5
@@ -2,11 +2,6 @@
 common validations
 */}}
 {{- define "gateway-api-inference-extension.validations.inferencepool.common" -}}
-{{- if not $.Values.inferencePool.name }}
-{{- fail "missing .Values.inferencePool.name" }}
-{{- end }}
-
-
 {{- if or (empty $.Values.inferencePool.modelServers) (not $.Values.inferencePool.modelServers.matchLabels) }}
 {{- fail ".Values.inferencePool.modelServers.matchLabels is required" }}
 {{- end }}

config/charts/inferencepool/templates/epp-deployment.yaml

+1 -1
@@ -22,7 +22,7 @@ spec:
         imagePullPolicy: {{ .Values.inferenceExtension.image.pullPolicy | default "Always" }}
         args:
         - -poolName
-        - {{ .Values.inferencePool.name }}
+        - {{ .Release.Name }}
         - -poolNamespace
         - {{ .Release.Namespace }}
         - -v
@@ -0,0 +1,59 @@
+{{- if eq .Values.provider.name "gke" }}
+---
+kind: HealthCheckPolicy
+apiVersion: networking.gke.io/v1
+metadata:
+  name: {{ .Release.Name }}
+  namespace: {{ .Release.Namespace }}
+  labels:
+    {{- include "gateway-api-inference-extension.labels" . | nindent 4 }}
+spec:
+  targetRef:
+    group: "inference.networking.x-k8s.io"
+    kind: InferencePool
+    name: {{ .Release.Name }}
+  default:
+    config:
+      type: HTTP
+      httpHealthCheck:
+        requestPath: /health
+        port: {{ .Values.inferencePool.targetPortNumber }}
+---
+apiVersion: networking.gke.io/v1
+kind: GCPBackendPolicy
+metadata:
+  name: {{ .Release.Name }}
+  namespace: {{ .Release.Namespace }}
+  labels:
+    {{- include "gateway-api-inference-extension.labels" . | nindent 4 }}
+spec:
+  targetRef:
+    group: "inference.networking.x-k8s.io"
+    kind: InferencePool
+    name: {{ .Release.Name }}
+  default:
+    timeoutSec: 300 # 5-minute timeout (adjust as needed)
+---
+apiVersion: monitoring.googleapis.com/v1
+kind: ClusterPodMonitoring
+metadata:
+  name: {{ .Release.Namespace }}-{{ .Release.Name }}
+  labels:
+    {{- include "gateway-api-inference-extension.labels" . | nindent 4 }}
+spec:
+  endpoints:
+  - port: metrics
+    scheme: http
+    interval: 5s
+    path: /metrics
+    authorization:
+      type: Bearer
+      credentials:
+        secret:
+          name: {{ .Values.gke.monitoringSecret }}
+          key: token
+          namespace: {{ .Release.Namespace }}
+  selector:
+    matchLabels:
+      {{- include "gateway-api-inference-extension.labels" . | nindent 8 }}
+{{- end }}
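
To make the new template easier to review, here is a sketch of what its first document would be expected to render to. It assumes `provider.name=gke`, a hypothetical release named `vllm-llama3-8b-instruct` installed into the `default` namespace, and the chart's default `targetPortNumber` of 8000. The `labels` block is left out because the output of the labels helper is not fully visible in this commit.

```yaml
# Sketch of the rendered HealthCheckPolicy under the assumptions above.
# Release name and namespace are illustrative, not taken from the commit.
kind: HealthCheckPolicy
apiVersion: networking.gke.io/v1
metadata:
  name: vllm-llama3-8b-instruct
  namespace: default
  # labels omitted: helper output not shown in full in this diff
spec:
  targetRef:
    group: "inference.networking.x-k8s.io"
    kind: InferencePool
    name: vllm-llama3-8b-instruct
  default:
    config:
      type: HTTP
      httpHealthCheck:
        requestPath: /health
        port: 8000
```

With the default `provider.name: none`, the surrounding `if` is false and none of the three GKE resources are rendered.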

config/charts/inferencepool/templates/inferencepool.yaml

+1 -1
@@ -2,7 +2,7 @@
 apiVersion: inference.networking.x-k8s.io/v1alpha2
 kind: InferencePool
 metadata:
-  name: {{ .Values.inferencePool.name }}
+  name: {{ .Release.Name }}
   namespace: {{ .Release.Namespace }}
   labels:
     {{- include "gateway-api-inference-extension.labels" . | nindent 4 }}

config/charts/inferencepool/values.yaml

+6 -1
@@ -8,8 +8,13 @@ inferenceExtension:
   extProcPort: 9002

 inferencePool:
-  # name: pool-1 # REQUIRED
   targetPortNumber: 8000
   # modelServers: # REQUIRED
   #   matchLabels:
   #     app: vllm-llama3-8b-instruct
+
+provider:
+  name: none
+
+gke:
+  monitoringSecret: inference-gateway-sa-metrics-reader-secret
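
A values override that turns on the GKE resources might look like the sketch below. The file name `gke-values.yaml` and the `app` label value are assumptions for illustration; the keys themselves come from the `values.yaml` in this commit.

```yaml
# gke-values.yaml (hypothetical file name)
inferencePool:
  targetPortNumber: 8000
  modelServers:            # still required by the chart's validation
    matchLabels:
      app: vllm-llama3-8b-instruct
provider:
  name: gke                # default is "none", which skips the GKE-only template
gke:
  monitoringSecret: inference-gateway-sa-metrics-reader-secret
```

It could then be passed at install time, for example `helm install vllm-llama3-8b-instruct ./config/charts/inferencepool -f gke-values.yaml`. Since `inferencePool.name` was removed, the release name given to `helm install` becomes the InferencePool name.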
