Commit 3f1851a

Draft a revised README.md
Clarify the point of the project, and use the vernacular of "inference gateway" vs. "AI gateway" to explain the distinction more succinctly. Move the website link up more prominently, describe the immediate requirements in more detail, and create a stub roadmap section.
1 parent 6130ee0 commit 3f1851a

File tree

1 file changed (+9, −7 lines)


README.md

Lines changed: 9 additions & 7 deletions
@@ -1,24 +1,26 @@
 # Gateway API Inference Extension
 
-The Gateway API Inference Extension came out of [wg-serving](https://github.com/kubernetes/community/tree/master/wg-serving) and is sponsored by [SIG Network](https://github.com/kubernetes/community/blob/master/sig-network/README.md#gateway-api-inference-extension). This repo contains: the load balancing algorithm, [ext-proc](https://www.envoyproxy.io/docs/envoy/latest/configuration/http/http_filters/ext_proc_filter) code, CRDs, and controllers of the extension.
+The Gateway API Inference Extension - also known as an inference gateway - improves the tail latency and throughput of OpenAI completion requests when load balancing a group of LLM servers on Kubernetes with kv-cache awareness. It provides Kubernetes-native declarative APIs to route client model names to use-case-specific LoRA adapters and to control incremental rollout of new adapter versions, A/B traffic splitting, and safe blue-green base model and model server upgrades. By adding operational guardrails like priority and fairness for different client model names, the inference gateway allows a platform team to safely serve many different GenAI workloads on the same pool of shared foundation model servers for higher utilization and fewer required accelerators.
 
-This extension is intented to provide value to multiplexed LLM services on a shared pool of compute. See the [proposal](https://github.com/kubernetes-sigs/wg-serving/tree/main/proposals/012-llm-instance-gateway) for more info.
+The inference gateway is intended for inference platform teams serving self-hosted large language models on Kubernetes. It requires a version of vLLM that exposes the metrics needed to predict traffic. It extends a cluster-local gateway supporting [ext-proc](https://www.envoyproxy.io/docs/envoy/latest/configuration/http/http_filters/ext_proc_filter), such as Envoy Gateway, kGateway, or the GKE Gateway. The HTTPRoute that accepts OpenAI-compatible requests and serves model responses can then be configured as a model provider underneath a higher-level AI gateway like LiteLLM, Solo AI Gateway, or Apigee, allowing you to integrate local serving with model-as-a-service consumption.
+
+See our website at https://gateway-api-inference-extension.sigs.k8s.io/ for detailed API documentation.
 
 ## Status
 
-This project is currently in development.
+This project is currently under development, and we have released our first [alpha 0.1 release](https://github.com/kubernetes-sigs/gateway-api-inference-extension/releases/tag/v0.1.0). It should not be used in production.
 
 ## Getting Started
 
 Follow this [README](./pkg/README.md) to get the inference-extension up and running on your cluster!
 
-## End-to-End Tests
+## Roadmap
 
-Follow this [README](./test/e2e/README.md) to learn more about running the inference-extension end-to-end test suite on your cluster.
+Coming soon!
 
-## Website
+## End-to-End Tests
 
-Detailed documentation is available on our website: https://gateway-api-inference-extension.sigs.k8s.io/
+Follow this [README](./test/e2e/README.md) to learn more about running the inference-extension end-to-end test suite on your cluster.
 
 ## Contributing