Commit c82a114

Slight cleanup of some of our readmes (kubernetes-sigs#221)
* Slight cleanup of some of our readmes
* testing site build issue
* Adding a note that you need envoy gateway to work to use something that depends on envoy gateway
* Feedback fixes
* restructuring and feedback comments
* removing make install
1 parent 02f42ee commit c82a114

11 files changed, +40 -29 lines changed

README.md

+3 -15
@@ -8,25 +8,13 @@ This extension is intented to provide value to multiplexed LLM services on a sha

 This project is currently in development.

-For more rapid testing, our PoC is in the `./examples/` dir.
-
-
 ## Getting Started

-**Install the CRDs into the cluster:**
-
-```sh
-make install
-```
-
-**Delete the APIs(CRDs) from the cluster:**
+Follow this [README](./pkg/README.md) to get the inference-extension up and running on your cluster!

-```sh
-make uninstall
-```
+## Website

-**Deploying the ext-proc image**
-Refer to this [README](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/pkg/README.md) on how to deploy the Ext-Proc image.
+Detailed documentation is available on our website: https://gateway-api-inference-extension.sigs.k8s.io/

 ## Contributing

examples/placeholder.md

Whitespace-only changes.

pkg/README.md

+31 -14
@@ -1,7 +1,11 @@
 ## Quickstart

+This quickstart guide is intended for engineers familiar with k8s and model servers (vLLM in this instance). The goal of this guide is to get a first, single InferencePool up and running!
+
 ### Requirements
-The current manifests rely on Envoy Gateway [v1.2.1](https://gateway.envoyproxy.io/docs/install/install-yaml/#install-with-yaml) or higher.
+- Envoy Gateway [v1.2.1](https://gateway.envoyproxy.io/docs/install/install-yaml/#install-with-yaml) or higher
+- A cluster that has built-in support for `ServiceType=LoadBalancer`. (This can be validated by ensuring your Envoy Gateway is up and running)
+- For example, with Kind, you can follow these steps: https://kind.sigs.k8s.io/docs/user/loadbalancer

 ### Steps

@@ -11,30 +15,40 @@ The current manifests rely on Envoy Gateway [v1.2.1](https://gateway.envoyproxy.
 Deploy a sample vLLM deployment with the proper protocol to work with the LLM Instance Gateway.
 ```bash
 kubectl create secret generic hf-token --from-literal=token=$HF_TOKEN # Your Hugging Face Token with access to Llama2
-kubectl apply -f ../examples/poc/manifests/vllm/vllm-lora-deployment.yaml
+kubectl apply -f ./manifests/vllm/vllm-lora-deployment.yaml
+```
+
+1. **Install the CRDs into the cluster:**
+
+```sh
+kubectl apply -f config/crd/bases
 ```

 1. **Deploy InferenceModel and InferencePool**

 Deploy a sample InferenceModel and InferencePool configuration based on the vLLM deployments mentioned above.
 ```bash
-kubectl apply -f ../examples/poc/manifests/inferencepool-with-model.yaml
+kubectl apply -f ./manifests/inferencepool-with-model.yaml
 ```

 1. **Update Envoy Gateway Config to enable Patch Policy**

 Our custom LLM Gateway ext-proc is patched into the existing envoy gateway via `EnvoyPatchPolicy`. To enable this feature, we must extend the Envoy Gateway config map. To do this, simply run:
 ```bash
-kubectl apply -f ./manifests/enable_patch_policy.yaml
+kubectl apply -f ./manifests/gateway/enable_patch_policy.yaml
 kubectl rollout restart deployment envoy-gateway -n envoy-gateway-system
 ```
 Additionally, if you would like to enable the admin interface, you can uncomment the admin lines and run this again.

 1. **Deploy Gateway**

 ```bash
-kubectl apply -f ./manifests/gateway.yaml
+kubectl apply -f ./manifests/gateway/gateway.yaml
 ```
+> **_NOTE:_** This file couples together the gateway infra and the HTTPRoute infra for a convenient, quick startup. Creating additional/different InferencePools on the same gateway will require an additional set of: `Backend`, `HTTPRoute`, the resources included in the `./manifests/gateway/ext-proc.yaml` file, and an additional `./manifests/gateway/patch_policy.yaml` file. ***Should you choose to experiment, familiarity with xDS and Envoy are very useful.***
+
+
+

 1. **Deploy Ext-Proc**

@@ -45,8 +59,17 @@ The current manifests rely on Envoy Gateway [v1.2.1](https://gateway.envoyproxy.
 1. **Deploy Envoy Gateway Custom Policies**

 ```bash
-kubectl apply -f ./manifests/extension_policy.yaml
-kubectl apply -f ./manifests/patch_policy.yaml
+kubectl apply -f ./manifests/gateway/extension_policy.yaml
+kubectl apply -f ./manifests/gateway/patch_policy.yaml
+```
+> **_NOTE:_** This is also per InferencePool, and will need to be configured to support the new pool should you wish to experiment further.
+
+1. **OPTIONALLY**: Apply Traffic Policy
+
+For high-traffic benchmarking you can apply this manifest to avoid any defaults that can cause timeouts/errors.
+
+```bash
+kubectl apply -f ./manifests/gateway/traffic_policy.yaml
 ```

 1. **Try it out**
@@ -63,10 +86,4 @@ The current manifests rely on Envoy Gateway [v1.2.1](https://gateway.envoyproxy.
 "max_tokens": 100,
 "temperature": 0
 }'
-```
-
-## Scheduling Package in Ext Proc
-The scheduling package implements request scheduling algorithms for load balancing requests across backend pods in an inference gateway. The scheduler ensures efficient resource utilization while maintaining low latency and prioritizing critical requests. It applies a series of filters based on metrics and heuristics to select the best pod for a given request.
-
-# Flowchart
-<img src="../docs/schedular-flowchart.png" alt="Scheduling Algorithm" width="400" />
+```

pkg/manifests/enable_patch_policy.yaml renamed to pkg/manifests/gateway/enable_patch_policy.yaml

+1
@@ -5,6 +5,7 @@ metadata:
 namespace: envoy-gateway-system
 data:
 # This manifest's main purpose is to set `enabledEnvoyPatchPolicy` to `true`.
+# This only needs to be ran once on your cluster (unless you'd like to change anything. i.e. enabling the admin dash)
 # Any field under `admin` is optional, and only for enabling the admin endpoints, for debugging.
 # Admin Interface: https://www.envoyproxy.io/docs/envoy/latest/operations/admin
 # PatchPolicy docs: https://gateway.envoyproxy.io/docs/tasks/extensibility/envoy-patch-policy/#enable-envoypatchpolicy
File renamed without changes.

pkg/scheduling.md

+5
@@ -0,0 +1,5 @@
+## Scheduling Package in Ext Proc
+The scheduling package implements request scheduling algorithms for load balancing requests across backend pods in an inference gateway. The scheduler ensures efficient resource utilization while maintaining low latency and prioritizing critical requests. It applies a series of filters based on metrics and heuristics to select the best pod for a given request.
+
+# Flowchart
+<img src="../docs/schedular-flowchart.png" alt="Scheduling Algorithm" width="400" />
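
Note on the new `pkg/scheduling.md`: the description says the scheduler "applies a series of filters based on metrics and heuristics to select the best pod". The Go sketch below illustrates that general idea only. It is not the package's actual code: the `PodMetrics` type, the `Filter` signature, the `maxQueue` and `leastKVCache` filters, the chosen metrics, and the rule of skipping a filter that would eliminate every candidate are all assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// PodMetrics is a hypothetical snapshot of per-pod load signals.
type PodMetrics struct {
	Name        string
	QueueLen    int     // requests waiting on the model server
	KVCacheUtil float64 // fraction of KV cache in use, 0.0 to 1.0
}

// Filter narrows a candidate list. An empty result means no pod qualified.
type Filter func(pods []PodMetrics) []PodMetrics

// maxQueue keeps pods whose queue length is at most limit.
func maxQueue(limit int) Filter {
	return func(pods []PodMetrics) []PodMetrics {
		var kept []PodMetrics
		for _, p := range pods {
			if p.QueueLen <= limit {
				kept = append(kept, p)
			}
		}
		return kept
	}
}

// leastKVCache keeps only the pods with the lowest KV cache utilization.
func leastKVCache() Filter {
	return func(pods []PodMetrics) []PodMetrics {
		if len(pods) == 0 {
			return nil
		}
		lowest := pods[0].KVCacheUtil
		for _, p := range pods[1:] {
			if p.KVCacheUtil < lowest {
				lowest = p.KVCacheUtil
			}
		}
		var kept []PodMetrics
		for _, p := range pods {
			if p.KVCacheUtil == lowest {
				kept = append(kept, p)
			}
		}
		return kept
	}
}

// Schedule applies the filters in order and returns the first surviving pod.
// A filter that would remove every candidate is skipped (an assumed fallback).
func Schedule(pods []PodMetrics, filters ...Filter) (PodMetrics, error) {
	if len(pods) == 0 {
		return PodMetrics{}, errors.New("no pods available")
	}
	candidates := pods
	for _, f := range filters {
		if next := f(candidates); len(next) > 0 {
			candidates = next
		}
	}
	return candidates[0], nil
}

func main() {
	pods := []PodMetrics{
		{Name: "vllm-0", QueueLen: 12, KVCacheUtil: 0.40},
		{Name: "vllm-1", QueueLen: 3, KVCacheUtil: 0.75},
		{Name: "vllm-2", QueueLen: 2, KVCacheUtil: 0.55},
	}
	best, err := Schedule(pods, maxQueue(5), leastKVCache())
	if err != nil {
		fmt.Println("scheduling failed:", err)
		return
	}
	fmt.Println(best.Name) // prints "vllm-2"
}
```

Ordering matters in a chain like this: the queue-length filter runs first, so `vllm-0` is dropped despite having the lowest KV cache utilization.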
