Commit 18be78e

Docs: Updates getting started guide for kgateway
Signed-off-by: Daneyon Hansen <[email protected]>
1 parent b7d35b6 commit 18be78e

2 files changed: +136 -37 lines changed

Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
+# Requires Kgateway 2.0.0 or greater.
+---
+apiVersion: gateway.networking.k8s.io/v1
+kind: Gateway
+metadata:
+  name: inference-gateway
+spec:
+  gatewayClassName: kgateway
+  listeners:
+    - name: http
+      protocol: HTTP
+      port: 8080
+    - name: llm-gw
+      protocol: HTTP
+      port: 8081
+---
+apiVersion: gateway.networking.k8s.io/v1
+kind: HTTPRoute
+metadata:
+  name: llm-route
+spec:
+  parentRefs:
+    - group: gateway.networking.k8s.io
+      kind: Gateway
+      name: inference-gateway
+      sectionName: llm-gw
+  rules:
+    - backendRefs:
+        - group: inference.networking.x-k8s.io
+          kind: InferencePool
+          name: vllm-llama2-7b
+          port: 8000
+          weight: 1
+      matches:
+        - path:
+            type: PathPrefix
+            value: /
+      timeouts:
+        backendRequest: 24h
+        request: 24h
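
Not part of this commit: once a manifest like the one above is applied, Gateway readiness can be awaited instead of polled by hand. The sketch below wraps `kubectl wait` on the Gateway's `Programmed` condition. The function name `wait_for_gateway`, the overridable `KUBECTL` variable, and the 120s default timeout are illustrative assumptions, not something the commit defines.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the commit.
# KUBECTL may be overridden (for example with `echo`) to preview the command.
KUBECTL="${KUBECTL:-kubectl}"

# wait_for_gateway NAME [TIMEOUT]
# Blocks until the named Gateway reports the Programmed condition as True,
# or fails once the timeout (default 120s, an arbitrary choice) expires.
wait_for_gateway() {
  local name="$1"
  local timeout="${2:-120s}"
  "$KUBECTL" wait --for=condition=Programmed "gateway/${name}" --timeout="${timeout}"
}
```

For the manifest above this would be called as `wait_for_gateway inference-gateway`. If another installed CRD also answers to the short name `gateway`, the fully qualified resource `gateway.gateway.networking.k8s.io/<name>` may be needed instead.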

site-src/guides/index.md

Lines changed: 96 additions & 37 deletions
@@ -3,12 +3,12 @@
 This quickstart guide is intended for engineers familiar with k8s and model servers (vLLM in this instance). The goal of this guide is to get a first, single InferencePool up and running!
 
 ## **Prerequisites**
-- Envoy Gateway [v1.3.0](https://gateway.envoyproxy.io/docs/install/install-yaml/#install-with-yaml) or higher
+
 - A cluster with:
-    - Support for services of type `LoadBalancer`. (This can be validated by ensuring your Envoy Gateway is up and running).
-      For example, with Kind, you can follow [these steps](https://kind.sigs.k8s.io/docs/user/loadbalancer).
-    - Support for [sidecar containers](https://kubernetes.io/docs/concepts/workloads/pods/sidecar-containers/) (enabled by default since Kubernetes v1.29)
-      to run the model server deployment.
+    - Support for services of type `LoadBalancer`. For example, with Kind, you can follow
+      [these steps](https://kind.sigs.k8s.io/docs/user/loadbalancer).
+    - Support for [sidecar containers](https://kubernetes.io/docs/concepts/workloads/pods/sidecar-containers/)
+      (enabled by default since Kubernetes v1.29) to run the model server deployment.
 
 ## **Steps**
 
@@ -56,55 +56,114 @@ This quickstart guide is intended for engineers familiar with k8s and model serv
 kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/crd/bases/inference.networking.x-k8s.io_inferencepools.yaml
 kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/crd/bases/inference.networking.x-k8s.io_inferencemodels.yaml
 ```
-
+
 ### Deploy InferenceModel
 
 Deploy the sample InferenceModel which is configured to load balance traffic between the `tweet-summary-0` and `tweet-summary-1`
 [LoRA adapters](https://docs.vllm.ai/en/latest/features/lora.html) of the sample model server.
+
 ```bash
 kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/inferencemodel.yaml
 ```
 
-### Update Envoy Gateway Config to enable Patch Policy**
+### Deploy Inference Gateway
 
-Our custom LLM Gateway ext-proc is patched into the existing envoy gateway via `EnvoyPatchPolicy`. To enable this feature, we must extend the Envoy Gateway config map. To do this, simply run:
-```bash
-kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/enable_patch_policy.yaml
-kubectl rollout restart deployment envoy-gateway -n envoy-gateway-system
-```
-Additionally, if you would like to enable the admin interface, you can uncomment the admin lines and run this again.
+Select one of the following tabs to deploy an Inference Gateway.
 
-### Deploy Gateway
+=== "Envoy Gateway"
 
-```bash
-kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/gateway.yaml
-```
-> **_NOTE:_** This file couples together the gateway infra and the HTTPRoute infra for a convenient, quick startup. Creating additional/different InferencePools on the same gateway will require an additional set of: `Backend`, `HTTPRoute`, the resources included in the `./config/manifests/gateway/ext-proc.yaml` file, and an additional `./config/manifests/gateway/patch_policy.yaml` file. ***Should you choose to experiment, familiarity with xDS and Envoy are very useful.***
+    1. Requirements
 
-Confirm that the Gateway was assigned an IP address and reports a `Programmed=True` status:
-```bash
-$ kubectl get gateway inference-gateway
-NAME                CLASS               ADDRESS         PROGRAMMED   AGE
-inference-gateway   inference-gateway   <MY_ADDRESS>    True         22s
-```
-### Deploy the InferencePool and Extension
+        - Envoy Gateway [v1.3.0](https://gateway.envoyproxy.io/docs/install/install-yaml/#install-with-yaml) or higher.
 
-```bash
-kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/inferencepool.yaml
-```
-### Deploy Envoy Gateway Custom Policies
+    1. Update Envoy Gateway Config to enable Patch Policy
 
-```bash
-kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/patch_policy.yaml
-```
-> **_NOTE:_** This is also per InferencePool, and will need to be configured to support the new pool should you wish to experiment further.
-
-### **OPTIONALLY**: Apply Traffic Policy
+        Our custom LLM Gateway ext-proc is patched into the existing Envoy Gateway via `EnvoyPatchPolicy`. To enable this feature, we must extend the
+        Envoy Gateway config map. To do this, apply the following manifest and restart Envoy Gateway:
+
+        ```bash
+        kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/enable_patch_policy.yaml
+        kubectl rollout restart deployment envoy-gateway -n envoy-gateway-system
+        ```
+
+        Additionally, if you would like to enable the admin interface, you can uncomment the admin lines and run this again.
+
+    1. Deploy GatewayClass, Gateway, Backend, and HTTPRoute resources
+
+        ```bash
+        kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/gateway.yaml
+        ```
+
+        > **_NOTE:_** This file couples together the gateway infra and the HTTPRoute infra for a convenient, quick startup. Creating additional/different InferencePools on the same gateway will require an additional set of: `Backend`, `HTTPRoute`, the resources included in the `./config/manifests/gateway/ext-proc.yaml` file, and an additional `./config/manifests/gateway/patch_policy.yaml` file. ***Should you choose to experiment, familiarity with xDS and Envoy are very useful.***
 
-For high-traffic benchmarking you can apply this manifest to avoid any defaults that can cause timeouts/errors.
+        Confirm that the Gateway was assigned an IP address and reports a `Programmed=True` status:
+        ```bash
+        $ kubectl get gateway inference-gateway
+        NAME                CLASS               ADDRESS         PROGRAMMED   AGE
+        inference-gateway   inference-gateway   <MY_ADDRESS>    True         22s
+        ```
+
+    1. Deploy Envoy Gateway Custom Policies
+
+        ```bash
+        kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/patch_policy.yaml
+        ```
+
+        > **_NOTE:_** This is also per InferencePool, and will need to be configured to support the new pool should you wish to experiment further.
+
+    1. Apply Traffic Policy (Optional)
+
+        For high-traffic benchmarking you can apply this manifest to avoid any defaults that can cause timeouts/errors.
+
+        ```bash
+        kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/traffic_policy.yaml
+        ```
+
+=== "Kgateway"
+
+    [Kgateway](https://kgateway.dev/) v2.0.0 adds support for inference extension as a **technical preview**. This means do not
+    run Kgateway with inference extension in production environments. Refer to [Issue 10411](https://github.com/kgateway-dev/kgateway/issues/10411)
+    for the list of caveats, supported features, etc.
+
+    1. Requirements
+
+        - [Helm](https://helm.sh/docs/intro/install/) installed.
+        - Gateway API [CRDs](https://gateway-api.sigs.k8s.io/guides/#installing-gateway-api) installed.
+
+    2. Install Kgateway CRDs
+
+        ```bash
+        helm upgrade -i --create-namespace --namespace kgateway-system --version 1.0.1-dev kgateway-crds https://github.com/danehans/toolbox/raw/refs/heads/main/charts/ddc488f033-kgateway-crds-1.0.1-dev.tgz
+        ```
+
+    3. Install Kgateway
+
+        ```bash
+        helm upgrade --install kgateway "https://github.com/danehans/toolbox/raw/refs/heads/main/charts/ddc488f033-kgateway-1.0.1-dev.tgz" \
+        -n kgateway-system \
+        --set image.registry=danehans \
+        --set controller.image.pullPolicy=Always \
+        --set inferenceExtension.enabled="true" \
+        --version 1.0.1-dev
+        ```
+
+    4. Deploy Gateway and HTTPRoute resources
+
+        ```bash
+        kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/kgateway/resources.yaml
+        ```
+
+        Confirm that the Gateway was assigned an IP address and reports a `Programmed=True` status:
+        ```bash
+        $ kubectl get gateway inference-gateway
+        NAME                CLASS      ADDRESS         PROGRAMMED   AGE
+        inference-gateway   kgateway   <MY_ADDRESS>    True         22s
+        ```
+
+### Deploy the InferencePool and Extension
 
 ```bash
-kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/traffic_policy.yaml
+kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/inferencepool.yaml
 ```
 
 ### Try it out
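
Not part of this commit: the `kubectl get gateway` check that the guide asks readers to eyeball could also be scripted. This is a minimal sketch that assumes the default table output shown in the guide (columns NAME, CLASS, ADDRESS, PROGRAMMED, AGE). The helper name `gateway_address` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the commit.
# gateway_address NAME
# Reads `kubectl get gateway` table output on stdin. Prints the ADDRESS of the
# named Gateway only when its PROGRAMMED column is "True"; otherwise prints
# nothing and returns a non-zero status.
gateway_address() {
  awk -v name="$1" '
    NR > 1 && $1 == name && $4 == "True" { print $3; found = 1 }
    END { exit found ? 0 : 1 }
  '
}
```

A possible invocation is `kubectl get gateway inference-gateway | gateway_address inference-gateway`. Because the parsing splits on whitespace, a Gateway whose ADDRESS column is still empty shifts the remaining columns left and is treated as not ready, which is the conservative outcome.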
