@@ -20,21 +20,21 @@ This quickstart guide is intended for engineers familiar with k8s and model serv
 kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/vllm/deployment.yaml
 ```

-1. **Install the Inference Extension CRDs:**
+2. **Install the Inference Extension CRDs:**

 ```sh
 kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases/download/v0.1.0/manifests.yaml
 ```

-1. **Deploy InferenceModel**
+3. **Deploy InferenceModel**

 Deploy the sample InferenceModel which is configured to load balance traffic between the `tweet-summary-0` and `tweet-summary-1`
 [LoRA adapters](https://docs.vllm.ai/en/latest/features/lora.html) of the sample model server.

 ```bash
 kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/inferencemodel.yaml
 ```

-1. **Update Envoy Gateway Config to enable Patch Policy**
+4. **Update Envoy Gateway Config to enable Patch Policy**

 Our custom LLM Gateway ext-proc is patched into the existing Envoy Gateway via `EnvoyPatchPolicy`. To enable this feature, we must extend the Envoy Gateway config map. To do this, simply run:

 ```bash
@@ -43,7 +43,7 @@ This quickstart guide is intended for engineers familiar with k8s and model serv
 ```
 Additionally, if you would like to enable the admin interface, you can uncomment the admin lines and run this again.

-1. **Deploy Gateway**
+5. **Deploy Gateway**

 ```bash
 kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/gateway/gateway.yaml
@@ -57,29 +57,29 @@ This quickstart guide is intended for engineers familiar with k8s and model serv
 inference-gateway   inference-gateway   <MY_ADDRESS>   True   22s
 ```

-1. **Deploy the Inference Extension and InferencePool**
+6. **Deploy the Inference Extension and InferencePool**

 ```bash
 kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/ext_proc.yaml
 ```

-1. **Deploy Envoy Gateway Custom Policies**
+7. **Deploy Envoy Gateway Custom Policies**

 ```bash
 kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/gateway/extension_policy.yaml
 kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/gateway/patch_policy.yaml
 ```
 > **_NOTE:_** This is also per InferencePool, and will need to be configured to support the new pool should you wish to experiment further.

-1. **OPTIONALLY**: Apply Traffic Policy
+8. **OPTIONALLY**: Apply Traffic Policy

 For high-traffic benchmarking you can apply this manifest to avoid any defaults that can cause timeouts/errors.

 ```bash
 kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/gateway/traffic_policy.yaml
 ```

-1. **Try it out**
+9. **Try it out**

 Wait until the gateway is ready.
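The final "Try it out" step above ends at "Wait until the gateway is ready." A minimal sketch of what that step might look like in practice follows. It assumes the Gateway object is named `inference-gateway` (as shown in the `kubectl get` output in the diff); the listener port (`8080`) and the request model name (`tweet-summary`) are illustrative assumptions not stated in this excerpt, so adjust them to match the deployed manifests.

```shell
# Sketch only: wait for the Gateway to be programmed, then send a test
# completion request through it. The port (8080) and model name
# ("tweet-summary") are assumptions for illustration.
kubectl wait --for=condition=Programmed gateway/inference-gateway --timeout=120s

# Read the address the gateway reported (shown as <MY_ADDRESS> above).
IP=$(kubectl get gateway inference-gateway \
  -o jsonpath='{.status.addresses[0].value}')

# Send a completion request through the gateway to one of the LoRA adapters.
curl -s "http://${IP}:8080/v1/completions" \
  -H 'Content-Type: application/json' \
  -d '{"model": "tweet-summary", "prompt": "Summarize: Kubernetes 1.30 released", "max_tokens": 50}'
```

If the request returns a completion, the ext-proc extension is routing traffic through the InferencePool as expected.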