Adding Design Principles #596
base: main
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
@@ -0,0 +1,52 @@ | ||||||
# Design Principles | ||||||
|
||||||
These principles guide our efforts to build flexible [Gateway API] extensions | ||||||
that empower the development of high-performance [AI Inference] routing | ||||||
technologies—balancing rapid delivery with long-term growth. | ||||||
|
||||||
!!! note "Inference Gateways" | ||||||
|
||||||
For simplicity, we'll refer to Gateway API Gateways which are | ||||||
composed together with AI Inference extensions as "Inference Gateways" | ||||||
throughout this document. | ||||||
|
||||||
[Gateway API]:https://github.com/kubernetes-sigs/gateway-api | ||||||
[AI Inference]:https://www.arm.com/glossary/ai-inference | ||||||
|
||||||
|
||||||
## Prioritize stability of the core interfaces | ||||||
|
||||||
The most critical part of this project is the interfaces between components. To encourage both controller and extension developers to integrate with this project, we need to prioritize the stability of these interfaces. | ||||||
Review comment: s/is/are/
Reply: 🤔 I think the subject of the sentence is "the most critical part", so the verb should agree with the singular noun "part". The phrases "of this project" and "the interfaces between components" modify or complete the subject but do not change its number. |
||||||
Although we can extend these interfaces in the future, it is critical that the core become stable as soon as possible. | ||||||
|
||||||
When describing "core interfaces", we are referring to both of the following: | ||||||
|
||||||
### 1. Gateway -> Endpoint Picker | ||||||
At a high level, this defines how a Gateway provides information to an Endpoint Picker, and how the Endpoint Picker selects endpoint(s) that the Gateway should route to. | ||||||
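As a rough illustration of the shape of this interface, the sketch below models the exchange in Python. The names `RequestInfo`, `Endpoint`, `EndpointPicker`, and `pick` are hypothetical and are not taken from this project; the fields are assumptions chosen only to show a Gateway handing request information to a picker and receiving endpoint(s) back.

```python
from dataclasses import dataclass, field
from typing import List, Mapping, Protocol, Sequence


@dataclass(frozen=True)
class Endpoint:
    """A routable backend (hypothetical shape)."""
    address: str
    port: int


@dataclass(frozen=True)
class RequestInfo:
    """Information a Gateway might provide to the picker (hypothetical fields)."""
    model: str
    headers: Mapping[str, str] = field(default_factory=dict)


class EndpointPicker(Protocol):
    """Anything with a matching pick() method satisfies this interface."""

    def pick(self, request: RequestInfo, candidates: Sequence[Endpoint]) -> List[Endpoint]:
        ...


class FirstAvailablePicker:
    """Trivial picker, present only to show the call shape."""

    def pick(self, request: RequestInfo, candidates: Sequence[Endpoint]) -> List[Endpoint]:
        # Return at most one endpoint: the first candidate, if any.
        return list(candidates[:1])


picker: EndpointPicker = FirstAvailablePicker()
chosen = picker.pick(
    RequestInfo(model="example-model"),
    [Endpoint("10.0.0.1", 8000), Endpoint("10.0.0.2", 8000)],
)
```

The Gateway side only depends on the `pick` call shape, so a different picker can be swapped in without changing the caller.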
|
||||||
### 2. Endpoint Picker -> Model Server Framework | ||||||
This defines what an Endpoint Picker should expect from a compatible Model Server Framework with a focus on health checks and metrics. | ||||||
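To make "health checks and metrics" more concrete, here is a minimal sketch of how a picker might consume such signals. Everything in it is an assumption made for illustration: the `ServerStatus` type, the `queue_depth` and `kv_cache_usage` fields, and the 0.9 threshold are hypothetical and are not defined by this document.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ServerStatus:
    """Signals a picker might read from a model server (hypothetical fields)."""
    address: str
    healthy: bool           # result of the most recent health check
    queue_depth: int        # pending requests reported by the server
    kv_cache_usage: float   # fraction of cache in use, 0.0 to 1.0


def eligible(servers: Iterable[ServerStatus], max_cache_usage: float = 0.9) -> List[ServerStatus]:
    """Keep healthy servers below the cache threshold, shortest queue first."""
    usable = [s for s in servers if s.healthy and s.kv_cache_usage < max_cache_usage]
    return sorted(usable, key=lambda s: s.queue_depth)


servers = [
    ServerStatus("10.0.0.1", healthy=True, queue_depth=4, kv_cache_usage=0.50),
    ServerStatus("10.0.0.2", healthy=False, queue_depth=0, kv_cache_usage=0.10),
    ServerStatus("10.0.0.3", healthy=True, queue_depth=1, kv_cache_usage=0.95),
    ServerStatus("10.0.0.4", healthy=True, queue_depth=2, kv_cache_usage=0.30),
]
ranked = eligible(servers)
```

A stable contract of this kind is what lets a picker work with any compatible model server, regardless of which specific metrics end up in the interface.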
|
||||||
|
||||||
## Our presets are finely tuned | ||||||
Review comment: I think we can define this more clearly. We want extensibility and customization, but I think it's very important that we have a turnkey solution that works for the average person. To word it another way, good defaults/presets can fall under a larger umbrella: we want a strong OOB experience for those who don't want to deeply customize, and our later points are about making this easily extensible and adaptable for those who do. Maybe that's implicit as a part of K8s.
Reply: +1... I call this "batteries included" |
||||||
|
||||||
Our defaults, shaped by extensive experience with leading model serving platforms and APIs, are designed to give the majority of Inference Gateway users a great experience without the need for extensive configuration or customization. | ||||||
Review comment: We need to standardize the language in the project. From my understanding, we should use "inference gateway" instead of "AI gateway." We need to do the same with the EPP. For example, the docs refer to the EPP as the "Endpoint Selection Extension". I also refer to the EPP as ESE in kubernetes/website#49898. |
||||||
|
||||||
|
||||||
Review comment: Delete L35 |
||||||
## Encourage innovation via extensibility | ||||||
|
||||||
This project is largely based on the idea that extensibility will enable innovation. With that in mind, we should make it as easy as possible for AI researchers to experiment with custom scheduling and routing logic. They should not need to know how to build a Kubernetes controller, or replicate a full networking stack. Instead, all the information needed to make a routing decision should be provided in an accessible format, with clear guidelines and examples of how to customize routing logic. | ||||||
Comment on lines +36 to +38
Review comment: Love it 👍
Reply: Do you and others have ideas on how to achieve this goal? Should we make the scheduler extensible or define a scheduling API that EPP and potentially other extensions call? |
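One way to picture "custom routing logic without building a controller" is a plain function that receives request attributes and per-endpoint metrics and returns a score. The sketch below is only an illustration of that idea, not a proposal for this project's API: the `Scorer` signature, the `route` helper, and the `queue_depth` metric name are all hypothetical.

```python
from typing import Callable, Mapping

# Hypothetical plug-in signature: (request attributes, endpoint metrics) -> score.
Scorer = Callable[[Mapping[str, str], Mapping[str, float]], float]


def route(
    request: Mapping[str, str],
    endpoints: Mapping[str, Mapping[str, float]],
    scorer: Scorer,
) -> str:
    """Return the name of the endpoint with the highest score."""
    if not endpoints:
        raise ValueError("no endpoints to choose from")
    return max(endpoints, key=lambda name: scorer(request, endpoints[name]))


def prefer_short_queue(request: Mapping[str, str], metrics: Mapping[str, float]) -> float:
    """Example custom logic: fewer queued requests means a higher score."""
    return -metrics.get("queue_depth", 0.0)


endpoints = {
    "pod-a": {"queue_depth": 5.0},
    "pod-b": {"queue_depth": 1.0},
}
target = route({"model": "example-model"}, endpoints, prefer_short_queue)
```

In this framing, a researcher experimenting with a new strategy writes only a function like `prefer_short_queue`; the surrounding plumbing stays unchanged.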
||||||
|
||||||
|
||||||
## Objectives over instructions | ||||||
|
||||||
The pace of innovation in this ecosystem has been rapid. Focusing too heavily on the specifics of current techniques could result in the API becoming outdated quickly. Instead of being too prescriptive about _how_ an objective should be achieved, this API should focus on the objectives that a Gateway and/or Endpoint Picker should strive to attain. Overly specific instructions or configuration can start as implementation-specific APIs and grow into standards as the concepts become more stable and widespread. | ||||||
Comment on lines +41 to +43
Review comment: This section, more than any other, is not resonating with me as a "Design Principle" just yet. To me it reads like it's trying to talk about scope control. Could you please help me better understand the intent by providing a somewhat detailed example of a situation that would run counter to this principle? I think that would help me understand what it's trying to convey 🤔
Reply: One example is configuration options for the scheduling algorithm itself; some of those configuration parameters may only be relevant to the current iteration of the algorithm implementation.
Reply: Leave this as-is for now, and consider my suggestion resolved. I'll bring it up for a community call; it doesn't need to hold up the PR.
Reply: It sounds like we should have a scheduler API with EPP consuming it.
Suggested change
|
||||||
|
||||||
|
||||||
## Composable components and reducing reinvention | ||||||
While it may be tempting to develop an entirely new AI-focused Gateway, many essential routing capabilities are already well established by Kubernetes. Our focus is on creating a layer of composable components that can be assembled with other Kubernetes components. This approach lets engineers use our solution as a building block, combining established technologies like Gateway API with our extensible model to build higher-level solutions. | ||||||
Review comment: Consider rewording. Perhaps this is personal bias, but the first sentence reads as if to ward off the reader from attempting to implement something new. The later sentences focus on the value of using what has already been built, which I think is what we are going for. Perhaps move the concept of the first sentence to the end as something like:
|
||||||
|
||||||
|
||||||
## Additions to the API should be carefully prioritized | ||||||
|
||||||
Every addition to the API should take the principles described above into account. Given that the goal of the API is to encourage a highly extensible ecosystem, each additional feature in the API raises the barrier to entry for any new controller or extension. Our top priority should be to focus on concepts that we expect to be broadly implementable and useful. The extensible nature of this API will allow each individual implementation to experiment with new features via custom flags or APIs before they become part of the core API surface. |
Review comment: s/[Gateway]/[Gateway API]/