Commit 9a2775d

Moving getting started guide to the site

1 parent 23a3171 commit 9a2775d

File tree

1 file changed: +94 -1 lines


site-src/guides/index.md

Lines changed: 94 additions & 1 deletion
# Getting started with Gateway API Inference Extension

(removed) To get started using our project follow this guide [here](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/release-v0.1.0/pkg/README.md)!

This quickstart guide is intended for engineers familiar with k8s and model servers (vLLM in this instance). The goal of this guide is to get a first, single InferencePool up and running!
### Requirements

- Envoy Gateway [v1.2.1](https://gateway.envoyproxy.io/docs/install/install-yaml/#install-with-yaml) or higher
- A cluster with:
  - Support for Services of type `LoadBalancer`. (This can be validated by ensuring your Envoy Gateway is up and running; a quick check is sketched below this list.) For example, with Kind, you can follow [these steps](https://kind.sigs.k8s.io/docs/user/loadbalancer).
  - 3 GPUs to run the sample model server. Adjust the number of replicas in `./manifests/vllm/deployment.yaml` as needed.
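A quick way to validate the `LoadBalancer` requirement is to check that the Envoy Gateway control plane is running and that a throwaway Service of type `LoadBalancer` receives an external address. This is only a sketch: the `lb-probe` Service name is made up for this check, and `envoy-gateway-system` is the namespace used later in this guide.

```bash
# Confirm the Envoy Gateway control plane is up.
kubectl get pods -n envoy-gateway-system

# Optionally, verify that LoadBalancer Services get an external address.
kubectl create service loadbalancer lb-probe --tcp=80:80
kubectl get service lb-probe -w   # EXTERNAL-IP should move from <pending> to an address
kubectl delete service lb-probe
```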
### Steps
1. **Deploy Sample Model Server**

    Create a Hugging Face secret to download the model [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf). Ensure that the token grants access to this model.

    Deploy a sample vLLM deployment with the proper protocol to work with the LLM Instance Gateway.

    ```bash
    kubectl create secret generic hf-token --from-literal=token=$HF_TOKEN # Your Hugging Face Token with access to Llama2
    kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/vllm/deployment.yaml
    ```
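    Before moving on, it helps to wait for the model server pods to become ready; downloading Llama 2 weights for 3 replicas can take several minutes. This is a sketch: substitute the actual deployment name reported by `kubectl get deployments` for the placeholder below.

    ```bash
    # List the deployment created by the manifest above, then wait for it to become Available.
    kubectl get deployments
    kubectl wait --for=condition=Available --timeout=15m deployment/<vllm-deployment-name>
    kubectl get pods   # all replicas should end up Running and Ready
    ```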
1. **Install the Inference Extension CRDs**

    ```sh
    kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases/download/v0.1.0/manifests.yaml
    ```
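    To confirm the CRDs registered successfully, you can list them; the InferencePool and InferenceModel kinds used in the next steps should appear.

    ```bash
    # The release manifest above should have registered the Inference Extension CRDs.
    kubectl get crd | grep -i inference
    ```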
1. **Deploy InferenceModel**

    Deploy the sample InferenceModel, which is configured to load balance traffic between the `tweet-summary-0` and `tweet-summary-1` [LoRA adapters](https://docs.vllm.ai/en/latest/features/lora.html) of the sample model server.

    ```bash
    kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/inferencemodel.yaml
    ```
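    If you want to inspect what was created, the resource can be listed like any other custom resource. This assumes the CRD's plural name is `inferencemodels`; check `kubectl api-resources | grep -i inference` if it differs.

    ```bash
    # Show the sample InferenceModel and its configured model / LoRA adapter targets.
    kubectl get inferencemodels
    kubectl describe inferencemodels
    ```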
1. **Update Envoy Gateway Config to enable Patch Policy**

    Our custom LLM Gateway ext-proc is patched into the existing Envoy Gateway via `EnvoyPatchPolicy`. To enable this feature, we must extend the Envoy Gateway ConfigMap:

    ```bash
    kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/gateway/enable_patch_policy.yaml
    kubectl rollout restart deployment envoy-gateway -n envoy-gateway-system
    ```

    Additionally, if you would like to enable the admin interface, you can uncomment the admin lines in that file and run these commands again.
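    The restart only takes a few seconds; you can block until the Envoy Gateway deployment has finished rolling out before continuing.

    ```bash
    # Wait for the restarted Envoy Gateway deployment to become ready again.
    kubectl rollout status deployment envoy-gateway -n envoy-gateway-system
    ```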
1. **Deploy Gateway**

    ```bash
    kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/gateway/gateway.yaml
    ```

    > **_NOTE:_** This file couples together the gateway infra and the HTTPRoute infra for a convenient, quick startup. Creating additional/different InferencePools on the same gateway will require an additional set of: `Backend`, `HTTPRoute`, the resources included in the `./manifests/gateway/ext-proc.yaml` file, and an additional `./manifests/gateway/patch_policy.yaml` file. ***Should you choose to experiment, familiarity with xDS and Envoy is very useful.***

    Confirm that the Gateway was assigned an IP address and reports a `Programmed=True` status:

    ```bash
    $ kubectl get gateway inference-gateway
    NAME                CLASS               ADDRESS         PROGRAMMED   AGE
    inference-gateway   inference-gateway   <MY_ADDRESS>    True         22s
    ```
1. **Deploy the Inference Extension and InferencePool**

    ```bash
    kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/ext_proc.yaml
    ```
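    As a quick sanity check, confirm that the InferencePool resource and the ext-proc pods from this manifest exist. This is a sketch; it assumes the CRD's plural name is `inferencepools` and that the resources land in the current namespace.

    ```bash
    # Confirm the InferencePool and the inference extension (ext-proc) pods were created.
    kubectl get inferencepools
    kubectl get pods   # look for the ext-proc / inference extension pod in Running state
    ```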
1. **Deploy Envoy Gateway Custom Policies**

    ```bash
    kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/gateway/extension_policy.yaml
    kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/gateway/patch_policy.yaml
    ```

    > **_NOTE:_** These policies are also per InferencePool, and will need to be configured to support the new pool should you wish to experiment further.
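    Envoy Gateway records whether it accepted a policy in the resource's status, which is a good first place to look if requests fail later. This sketch assumes the two manifests above create `EnvoyExtensionPolicy` and `EnvoyPatchPolicy` resources, as their file names suggest.

    ```bash
    # Check that Envoy Gateway accepted the applied policies (look for Accepted in the status conditions).
    kubectl get envoyextensionpolicies -A
    kubectl get envoypatchpolicies -A
    ```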
1. **OPTIONALLY**: Apply Traffic Policy

    For high-traffic benchmarking you can apply this manifest to avoid default settings that can cause timeouts or errors.

    ```bash
    kubectl apply -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/pkg/manifests/gateway/traffic_policy.yaml
    ```
1. **Try it out**

    Wait until the gateway is ready.

    ```bash
    IP=$(kubectl get gateway/inference-gateway -o jsonpath='{.status.addresses[0].value}')
    PORT=8081

    curl -i ${IP}:${PORT}/v1/completions -H 'Content-Type: application/json' -d '{
    "model": "tweet-summary",
    "prompt": "Write as if you were a critic: San Francisco",
    "max_tokens": 100,
    "temperature": 0
    }'
    ```
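    Rather than polling `kubectl get gateway` by hand, you can block on the `Programmed` condition shown earlier and then run the `curl` above once it returns.

    ```bash
    # Block until the Gateway reports Programmed=True (adjust the timeout as needed).
    kubectl wait --for=condition=Programmed gateway/inference-gateway --timeout=5m
    ```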
