Commit 3f1851a

Draft a revised README.md
Clarify the point of the project, and use the vernacular of "inference gateway" vs. "AI gateway" to explain the distinction more succinctly. Move the website link up more prominently, describe the immediate requirements in more detail, and create a stub roadmap section.
1 parent 6130ee0 commit 3f1851a

File tree

1 file changed (+9, −7 lines)


README.md

Lines changed: 9 additions & 7 deletions
@@ -1,24 +1,26 @@
 # Gateway API Inference Extension
 
-The Gateway API Inference Extension came out of [wg-serving](https://github.com/kubernetes/community/tree/master/wg-serving) and is sponsored by [SIG Network](https://github.com/kubernetes/community/blob/master/sig-network/README.md#gateway-api-inference-extension). This repo contains: the load balancing algorithm, [ext-proc](https://www.envoyproxy.io/docs/envoy/latest/configuration/http/http_filters/ext_proc_filter) code, CRDs, and controllers of the extension.
+The Gateway API Inference Extension - also known as an inference gateway - improves the tail latency and throughput of OpenAI completion requests when load balancing a group of LLM servers on Kubernetes with kv-cache awareness. It provides Kubernetes-native declarative APIs to route client model names to use-case-specific LoRA adapters and to control incremental rollout of new adapter versions, A/B traffic splitting, and safe blue-green base model and model server upgrades. By adding operational guardrails like priority and fairness for different client model names, the inference gateway allows a platform team to safely serve many different GenAI workloads on the same pool of shared foundation model servers for higher utilization and fewer required accelerators.
 
-This extension is intented to provide value to multiplexed LLM services on a shared pool of compute. See the [proposal](https://github.com/kubernetes-sigs/wg-serving/tree/main/proposals/012-llm-instance-gateway) for more info.
+The inference gateway is intended for inference platform teams serving self-hosted large language models on Kubernetes. It requires a version of vLLM that exposes the metrics needed to predict traffic. It extends a cluster-local gateway supporting [ext-proc](https://www.envoyproxy.io/docs/envoy/latest/configuration/http/http_filters/ext_proc_filter), such as Envoy Gateway, kGateway, or the GKE Gateway. The HTTPRoute that accepts OpenAI-compatible requests and serves model responses can then be configured as a model provider underneath a higher-level AI gateway like LiteLLM, Solo AI Gateway, or Apigee, allowing you to integrate local serving with model-as-a-service consumption.
+
+See our website at https://gateway-api-inference-extension.sigs.k8s.io/ for detailed API documentation.
 
 ## Status
 
-This project is currently in development.
+This project is currently under development, and we have released our first [alpha 0.1 release](https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases/tag/v0.1.0). It should not be used in production.
 
 ## Getting Started
 
 Follow this [README](./pkg/README.md) to get the inference-extension up and running on your cluster!
 
-## End-to-End Tests
+## Roadmap
 
-Follow this [README](./test/e2e/README.md) to learn more about running the inference-extension end-to-end test suite on your cluster.
+Coming soon!
 
-## Website
+## End-to-End Tests
 
-Detailed documentation is available on our website: https://gateway-api-inference-extension.sigs.k8s.io/
+Follow this [README](./test/e2e/README.md) to learn more about running the inference-extension end-to-end test suite on your cluster.
 
 ## Contributing