This quickstart guide is intended for engineers familiar with k8s and model servers (vLLM in this instance). The goal of this guide is to get a first, single InferencePool up and running!

## **Prerequisites**

- A cluster with:
    - Support for services of type `LoadBalancer`. For example, with Kind, you can follow
      [these steps](https://kind.sigs.k8s.io/docs/user/loadbalancer).
    - Support for [sidecar containers](https://kubernetes.io/docs/concepts/workloads/pods/sidecar-containers/)
      (enabled by default since Kubernetes v1.29) to run the model server deployment.
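
If you are unsure whether the cluster meets these requirements, a quick throwaway check along the following lines can confirm them before you continue (the `lb-check` names below are arbitrary and not part of this guide's manifests):

```bash
# Sidecar containers are enabled by default on Kubernetes v1.29+, so check the server version.
kubectl version

# Create a throwaway LoadBalancer Service and confirm it is assigned an external address.
kubectl create deployment lb-check --image=nginx
kubectl expose deployment lb-check --port=80 --type=LoadBalancer
kubectl get service lb-check --watch   # EXTERNAL-IP should move from <pending> to a real address

# Clean up the throwaway resources.
kubectl delete service,deployment lb-check
```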

## **Steps**

### Install the Inference Extension CRDs

```bash
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/crd/bases/inference.networking.x-k8s.io_inferencepools.yaml
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/crd/bases/inference.networking.x-k8s.io_inferencemodels.yaml
```
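
Optionally, you can wait for the new CRDs to be established before moving on; this is a quick sanity check, not a required step:

```bash
kubectl wait --for=condition=Established --timeout=60s \
  crd/inferencepools.inference.networking.x-k8s.io \
  crd/inferencemodels.inference.networking.x-k8s.io
```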

### Deploy InferenceModel

Deploy the sample InferenceModel which is configured to load balance traffic between the `tweet-summary-0` and `tweet-summary-1`
[LoRA adapters](https://docs.vllm.ai/en/latest/features/lora.html) of the sample model server.

```bash
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/inferencemodel.yaml
```
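
If you would like to confirm that the resource was created, listing InferenceModels is a quick check (the object name comes from the sample manifest, so it may differ if you customized it):

```bash
kubectl get inferencemodels
```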

### Deploy Inference Gateway

Select one of the following tabs to deploy an Inference Gateway.

=== "Envoy Gateway"

    1. Requirements

        - Envoy Gateway [v1.3.0](https://gateway.envoyproxy.io/docs/install/install-yaml/#install-with-yaml) or higher.

    1. Update Envoy Gateway Config to enable Patch Policy

        Our custom LLM Gateway ext-proc is patched into the existing Envoy Gateway via `EnvoyPatchPolicy`. To enable this feature, we must extend the
        Envoy Gateway config map. To do this, apply the following manifest and restart Envoy Gateway:

        ```bash
        kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/enable_patch_policy.yaml
        kubectl rollout restart deployment envoy-gateway -n envoy-gateway-system
        ```

        Additionally, if you would like to enable the admin interface, uncomment the admin lines in that manifest, then apply it and restart Envoy Gateway again.
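
        If you want to be sure the restart has finished before continuing, you can check the rollout status (optional):

        ```bash
        kubectl rollout status deployment/envoy-gateway -n envoy-gateway-system
        ```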

    1. Deploy GatewayClass, Gateway, Backend, and HTTPRoute resources

        ```bash
        kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/gateway.yaml
        ```

        > **_NOTE:_** This file couples together the gateway infra and the HTTPRoute infra for a convenient, quick startup. Creating additional/different InferencePools on the same gateway will require an additional set of: `Backend`, `HTTPRoute`, the resources included in the `./config/manifests/gateway/ext-proc.yaml` file, and an additional `./config/manifests/gateway/patch_policy.yaml` file. ***Should you choose to experiment, familiarity with xDS and Envoy is very useful.***

        Confirm that the Gateway was assigned an IP address and reports a `Programmed=True` status:

        ```bash
        $ kubectl get gateway inference-gateway
        NAME                CLASS               ADDRESS        PROGRAMMED   AGE
        inference-gateway   inference-gateway   <MY_ADDRESS>   True         22s
        ```

    1. Deploy Envoy Gateway Custom Policies

        ```bash
        kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/patch_policy.yaml
        ```

        > **_NOTE:_** This is also per InferencePool, and will need to be configured to support the new pool should you wish to experiment further.
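
        To verify that the patch policy was accepted by Envoy Gateway, you can inspect the resource status; the exact fields reported vary by Envoy Gateway version:

        ```bash
        kubectl get envoypatchpolicies.gateway.envoyproxy.io -o yaml
        ```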

    1. Apply Traffic Policy (Optional)

        For high-traffic benchmarking you can apply this manifest to avoid any defaults that can cause timeouts/errors.

        ```bash
        kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/traffic_policy.yaml
        ```
| 122 | +=== "Kgateway" |
| 123 | + |
| 124 | + [Kgateway](https://kgateway.dev/) v2.0.0 adds support for inference extension as a **technical preview**. This means do not |
| 125 | + run Kgateway with inference extension in production environments. Refer to [Issue 10411](https://github.com/kgateway-dev/kgateway/issues/10411) |
| 126 | + for the list of caveats, supported features, etc. |
| 127 | + |
| 128 | + 1. Requirements |
| 129 | + |
| 130 | + - [Helm](https://helm.sh/docs/intro/install/) installed. |
| 131 | + - Gateway API [CRDs](https://gateway-api.sigs.k8s.io/guides/#installing-gateway-api) installed. |
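
        Both can be confirmed quickly if you are unsure (the versions reported will vary):

        ```bash
        helm version --short
        kubectl get crd gateways.gateway.networking.k8s.io
        ```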

    2. Install Kgateway CRDs

        ```bash
        helm upgrade -i --create-namespace --namespace kgateway-system --version 1.0.1-dev kgateway-crds https://github.com/danehans/toolbox/raw/refs/heads/main/charts/ddc488f033-kgateway-crds-1.0.1-dev.tgz
        ```

    3. Install Kgateway

        ```bash
        helm upgrade --install kgateway "https://github.com/danehans/toolbox/raw/refs/heads/main/charts/ddc488f033-kgateway-1.0.1-dev.tgz" \
          -n kgateway-system \
          --set image.registry=danehans \
          --set controller.image.pullPolicy=Always \
          --set inferenceExtension.enabled="true" \
          --version 1.0.1-dev
        ```
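
        To confirm the Kgateway control plane is up before creating a Gateway, check the pods in the install namespace (pod names will differ between releases):

        ```bash
        kubectl get pods -n kgateway-system
        ```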

    4. Deploy Gateway and HTTPRoute resources

        ```bash
        kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/gateway/kgateway/resources.yaml
        ```

        Confirm that the Gateway was assigned an IP address and reports a `Programmed=True` status:

        ```bash
        $ kubectl get gateway inference-gateway
        NAME                CLASS      ADDRESS        PROGRAMMED   AGE
        inference-gateway   kgateway   <MY_ADDRESS>   True         22s
        ```

### Deploy the InferencePool and Extension

```bash
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/inferencepool.yaml
```
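
Before trying it out, it can help to confirm that the pool and its endpoint-picker extension came up; a rough check such as the following is usually enough (resource and pod names depend on the sample manifest):

```bash
kubectl get inferencepools
kubectl get pods
```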

### Try it out