Commit 742e66e

danehans authored and kfswain committed
Fixes Kgateway in Quickstart Guide (kubernetes-sigs#616)
* Fixes Kgateway in Quickstart Guide
  Signed-off-by: Daneyon Hansen <[email protected]>
* Moves HTTPRoutes to implementations
  Signed-off-by: Daneyon Hansen <[email protected]>
* Bumps kgtw to rc.2
  Signed-off-by: Daneyon Hansen <[email protected]>
---------
Signed-off-by: Daneyon Hansen <[email protected]>
1 parent 62fc254 commit 742e66e

4 files changed, +79 -41 lines changed
@@ -0,0 +1,21 @@
+apiVersion: gateway.networking.k8s.io/v1
+kind: HTTPRoute
+metadata:
+  name: llm-route
+spec:
+  parentRefs:
+  - group: gateway.networking.k8s.io
+    kind: Gateway
+    name: inference-gateway
+  rules:
+  - backendRefs:
+    - group: inference.networking.x-k8s.io
+      kind: InferencePool
+      name: vllm-llama3-8b-instruct
+      port: 8000 # Remove when https://github.com/kgateway-dev/kgateway/issues/10987 is fixed.
+    matches:
+    - path:
+        type: PathPrefix
+        value: /
+    timeouts:
+      request: 300s
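
The manifest above sets a request timeout on the route's only rule. As a minimal sketch that is not part of the commit, the value could be read back after the route is applied. The helper name `route_request_timeout` is hypothetical, and the sketch assumes `kubectl` is configured for the target cluster and that the route keeps the name `llm-route`.

```shell
# Hypothetical helper (not from the commit): print the request timeout set on
# the first rule of an HTTPRoute. Defaults to the llm-route name used above.
route_request_timeout() {
  kubectl get httproute "${1:-llm-route}" \
    -o jsonpath='{.spec.rules[0].timeouts.request}'
}

# Usage (requires cluster access):
#   route_request_timeout llm-route
```

An empty result would suggest that the route was applied without the `timeouts` block.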

site-src/guides/index.md

+58-41
@@ -7,11 +7,12 @@
 This quickstart guide is intended for engineers familiar with k8s and model servers (vLLM in this instance). The goal of this guide is to get an Inference Gateway up and running!

 ## **Prerequisites**
-- A cluster with:
-    - Support for services of type `LoadBalancer`. (This can be validated by ensuring your Envoy Gateway is up and running).
-    For example, with Kind, you can follow [these steps](https://kind.sigs.k8s.io/docs/user/loadbalancer).
-    - Support for [sidecar containers](https://kubernetes.io/docs/concepts/workloads/pods/sidecar-containers/) (enabled by default since Kubernetes v1.29)
-    to run the model server deployment.
+
+- A cluster with:
+  - Support for services of type `LoadBalancer`. For kind clusters, follow [this guide](https://kind.sigs.k8s.io/docs/user/loadbalancer)
+    to get services of type LoadBalancer working.
+  - Support for [sidecar containers](https://kubernetes.io/docs/concepts/workloads/pods/sidecar-containers/) (enabled by default since Kubernetes v1.29)
+    to run the model server deployment.

 ## **Steps**

@@ -105,6 +106,24 @@ This quickstart guide is intended for engineers familiar with k8s and model serv
          inference-gateway inference-gateway <MY_ADDRESS> True 22s
          ```

+      3. Deploy the HTTPRoute
+
+         ```bash
+         kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/gke/httproute.yaml
+         ```
+
+      4. Confirm that the HTTPRoute status conditions include `Accepted=True` and `ResolvedRefs=True`:
+
+         ```bash
+         kubectl get httproute llm-route -o yaml
+         ```
+
+      5. Given that the default connection timeout may be insufficient for most inference workloads, it is recommended to configure a timeout appropriate for your intended use case.
+
+         ```bash
+         kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/gke/gcp-backend-policy.yaml
+         ```
+
 === "Istio"

       Please note that this feature is currently in an experimental phase and is not intended for production use.
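
The confirmation step added in this hunk asks the reader to inspect the route's YAML by eye. A minimal sketch of scripting that check follows. The function name `route_condition_true` is hypothetical, and the sketch assumes the conditions are reported under `status.parents[].conditions`, as in the Gateway API HTTPRoute status, and that `kubectl` can reach the cluster.

```shell
# Hypothetical helper (not from the commit): succeed only when every parent of
# the HTTPRoute reports the given condition type with status "True".
route_condition_true() {
  rc_route="$1"
  rc_type="$2"
  rc_statuses="$(kubectl get httproute "$rc_route" \
    -o jsonpath="{.status.parents[*].conditions[?(@.type==\"$rc_type\")].status}")" || return 1
  # An empty result means no controller has reported this condition yet.
  [ -n "$rc_statuses" ] || return 1
  # jsonpath prints one status per parent, separated by spaces.
  for rc_status in $rc_statuses; do
    [ "$rc_status" = "True" ] || return 1
  done
  return 0
}

# Usage (requires cluster access):
#   route_condition_true llm-route Accepted &&
#     route_condition_true llm-route ResolvedRefs &&
#     echo "llm-route is accepted and its references resolve"
```

Treating an empty result as a failure is a deliberate choice in this sketch: a route that no Gateway controller has picked up has no parent status at all, which is the case the confirmation step is meant to catch.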
@@ -114,7 +133,7 @@ This quickstart guide is intended for engineers familiar with k8s and model serv

       - Gateway API [CRDs](https://gateway-api.sigs.k8s.io/guides/#installing-gateway-api) installed.

-      1. Install Istio
+      2. Install Istio

          ```
          TAG=1.26-alpha.80c74f7f43482c226f4f4b10b4dda6261b67a71f
@@ -131,19 +150,19 @@ This quickstart guide is intended for engineers familiar with k8s and model serv
          ./istioctl install --set tag=$TAG --set hub=gcr.io/istio-testing
          ```

-      1. If you run the Endpoint Picker (EPP) with the `--secureServing` flag set to `true` (the default mode), it is currently using a self-signed certificate. As a security measure, Istio does not trust self-signed certificates by default. As a temporary workaround, you can apply the destination rule to bypass TLS verification for EPP. A more secure TLS implementation in EPP is being discussed in [Issue 582](https://github.com/kubernetes-sigs/gateway-api-inference-extension/issues/582).
+      3. If you run the Endpoint Picker (EPP) with the `--secureServing` flag set to `true` (the default mode), it is currently using a self-signed certificate. As a security measure, Istio does not trust self-signed certificates by default. As a temporary workaround, you can apply the destination rule to bypass TLS verification for EPP. A more secure TLS implementation in EPP is being discussed in [Issue 582](https://github.com/kubernetes-sigs/gateway-api-inference-extension/issues/582).

          ```bash
          kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/istio/destination-rule.yaml
          ```

-      1. Deploy Gateway
+      4. Deploy Gateway

          ```bash
          kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/istio/gateway.yaml
          ```

-      1. Label the gateway
+      5. Label the gateway

          ```bash
          kubectl label gateway llm-gateway istio.io/enable-inference-extproc=true
@@ -156,9 +175,21 @@ This quickstart guide is intended for engineers familiar with k8s and model serv
          inference-gateway inference-gateway <MY_ADDRESS> True 22s
          ```

+      6. Deploy the HTTPRoute
+
+         ```bash
+         kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/istio/httproute.yaml
+         ```
+
+      7. Confirm that the HTTPRoute status conditions include `Accepted=True` and `ResolvedRefs=True`:
+
+         ```bash
+         kubectl get httproute llm-route -o yaml
+         ```
+
 === "Kgateway"

-      [Kgateway](https://kgateway.dev/) v2.0.0 adds support for inference extension as a **technical preview**. This means do not
+      [Kgateway](https://kgateway.dev/) recently added support for inference extension as a **technical preview**. This means do not
       run Kgateway with inference extension in production environments. Refer to [Issue 10411](https://github.com/kgateway-dev/kgateway/issues/10411)
       for the list of caveats, supported features, etc.

@@ -167,20 +198,20 @@ This quickstart guide is intended for engineers familiar with k8s and model serv
       - [Helm](https://helm.sh/docs/intro/install/) installed.
       - Gateway API [CRDs](https://gateway-api.sigs.k8s.io/guides/#installing-gateway-api) installed.

-      1. Install Kgateway CRDs
+      2. Set the Kgateway version and install the Kgateway CRDs.

          ```bash
-         helm upgrade -i --create-namespace --namespace kgateway-system --version $VERSION kgateway-crds oci://cr.kgateway.dev/kgateway-dev/charts/kgateway-crds
+         KGTW_VERSION=v2.0.0-rc.2
+         helm upgrade -i --create-namespace --namespace kgateway-system --version $KGTW_VERSION kgateway-crds oci://cr.kgateway.dev/kgateway-dev/charts/kgateway-crds
          ```

-      1. Install Kgateway
+      3. Install Kgateway

          ```bash
-         helm upgrade -i --namespace kgateway-system --version $VERSION kgateway oci://cr.kgateway.dev/kgateway-dev/charts/kgateway
-         --set inferenceExtension.enabled=true
+         helm upgrade -i --namespace kgateway-system --version $KGTW_VERSION kgateway oci://cr.kgateway.dev/kgateway-dev/charts/kgateway --set inferenceExtension.enabled=true
          ```

-      1. Deploy Gateway
+      4. Deploy the Gateway

          ```bash
          kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/kgateway/gateway.yaml
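
This hunk replaces the undefined `$VERSION` with a single `KGTW_VERSION` shared by both Helm commands. One way to keep the two charts on the same version is to wrap them in a function, sketched below. The name `install_kgateway` is hypothetical and not part of the commit; the Helm arguments are copied from the commands in the diff.

```shell
# Hypothetical wrapper (not from the commit): install the Kgateway CRDs and
# then Kgateway itself, both pinned to the same chart version.
install_kgateway() {
  kgtw_version="${1:?usage: install_kgateway <version>}"
  # The second chart is installed only if the CRD chart succeeds.
  helm upgrade -i --create-namespace --namespace kgateway-system \
    --version "$kgtw_version" kgateway-crds \
    oci://cr.kgateway.dev/kgateway-dev/charts/kgateway-crds &&
  helm upgrade -i --namespace kgateway-system \
    --version "$kgtw_version" kgateway \
    oci://cr.kgateway.dev/kgateway-dev/charts/kgateway \
    --set inferenceExtension.enabled=true
}

# Usage (requires helm and cluster access):
#   install_kgateway v2.0.0-rc.2
```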
@@ -193,33 +224,17 @@ This quickstart guide is intended for engineers familiar with k8s and model serv
          inference-gateway kgateway <MY_ADDRESS> True 22s
          ```

-### Deploy the HTTPRoute
+      5. Deploy the HTTPRoute

-   ```bash
-   kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/httproute.yaml
-   ```
-
-### Configure Timeouts
-
-   Given that default timeouts for above implementations may be insufficient for most inference workloads, it is recommended to configure a timeout appropriate for your intended use case.
-
-=== "GKE"
-
-      ```bash
-      kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/gke/gcp-backend-policy.yaml
-      ```
-
-=== "Istio"
+         ```bash
+         kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/kgateway/httproute.yaml
+         ```

-      ```bash
-      kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/httproute-with-timeout.yaml
-      ```
+      6. Confirm that the HTTPRoute status conditions include `Accepted=True` and `ResolvedRefs=True`:

-=== "Kgateway"
-
-      ```bash
-      kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/httproute-with-timeout.yaml
-      ```
+         ```bash
+         kubectl get httproute llm-route -o yaml
+         ```

 ### Try it out

@@ -258,10 +273,12 @@ This quickstart guide is intended for engineers familiar with k8s and model serv
    kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/gke/gateway.yaml --ignore-not-found
    kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/gke/healthcheck.yaml --ignore-not-found
    kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/gke/gcp-backend-policy.yaml --ignore-not-found
+   kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/gke/httproute.yaml --ignore-not-found
    kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/istio/gateway.yaml --ignore-not-found
    kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/istio/destination-rule.yaml --ignore-not-found
+   kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/istio/httproute.yaml --ignore-not-found
    kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/kgateway/gateway.yaml --ignore-not-found
-   kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/httproute.yaml --ignore-not-found
+   kubectl delete -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/kgateway/httproute.yaml --ignore-not-found
    ```

 1. Uninstall the CRDs
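
With one HTTPRoute per implementation, the cleanup list now repeats the same `gateway.yaml` and `httproute.yaml` pair for each of the three directories. A minimal sketch of expressing that pattern as a loop follows. The names `MANIFESTS_URL` and `delete_gateway_manifests` are hypothetical and not part of the commit, and the sketch deliberately leaves out the implementation-specific manifests (`healthcheck.yaml`, `gcp-backend-policy.yaml`, `destination-rule.yaml`), which would still need their own delete commands.

```shell
# Base URL copied from the cleanup commands in the diff above.
MANIFESTS_URL="https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway"

# Hypothetical helper (not from the commit): delete the Gateway and HTTPRoute
# of every implementation, ignoring the ones that were never applied.
delete_gateway_manifests() {
  for impl in gke istio kgateway; do
    for manifest in gateway.yaml httproute.yaml; do
      kubectl delete -f "$MANIFESTS_URL/$impl/$manifest" --ignore-not-found
    done
  done
}

# Usage (requires cluster access):
#   delete_gateway_manifests
```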
