Commit 5f489b9

kfswain, elevran, ahg-g
authored
Initial Scheduler Subsystem interface (#845)
* initial sketching of interfacing
* Update docs/proposals/0845-scheduler-architecture-proposal/README.md
  Co-authored-by: Etai Lev Ran <[email protected]>
* Apply suggestions from code review
* Update docs/proposals/0845-scheduler-architecture-proposal/README.md

---------

Co-authored-by: Etai Lev Ran <[email protected]>
Co-authored-by: Abdullah Gharaibeh <[email protected]>
1 parent a958297 commit 5f489b9

5 files changed

+216
-26
lines changed

docs/proposals/0683-epp-architecture-proposal/README.md

Lines changed: 1 addition & 26 deletions
Original file line number | Diff line number | Diff line change
@@ -1,30 +1,9 @@
1-
# Gateway API Inference Extension
1+
# EPP Architecture Proposal
22

33
Author(s): @kfswain
44
## Proposal Status
55
***Draft***
66

7-
## Table of Contents
8-
9-
<!-- toc -->
10-
11-
- [Summary](#summary)
12-
- [Goals](#goals)
13-
- [Non-Goals](#non-goals)
14-
- [Proposal](#proposal)
15-
- [Personas](#personas)
16-
- [Inference Platform Admin](#inference-platform-admin)
17-
- [Inference Workload Owner](#workload-owner)
18-
- [Axioms](#axioms)
19-
- [InferencePool](#inferencepool)
20-
- [InferenceModel](#inferencemodel)
21-
- [Spec](#spec)
22-
- [Diagrams](#diagrams)
23-
- [Alternatives](#alternatives)
24-
- [Open Questions](#open-questions)
25-
26-
<!-- /toc -->
27-
287
## Summary
298

309
This proposal seeks to standardize the implementation of an EPP (Endpoint Picker) for the Inference Gateway extension (also known as Gateway API Inference Extension). It also proposes restructuring the current implementation of the EPP to be more modular and approachable.
@@ -86,11 +65,7 @@ Due to the possibility of this becoming a bit of a dumping ground. The API will
8665

8766
The flow controller will consume resource regime data, and enforce proper resource sharing between workloads. This will primarily be done through a queuing mechanism [as described here](https://docs.google.com/document/d/1VZL7opFWuwgWquvgiOzLlXAJ633qZ9U-A0ZixGjBgaI/edit?usp=sharing).
8867

89-
#### Scheduling Layer
90-
91-
As the Scheduling Layer is the final interface to the entirety of the pool, all configuration will be at the _pool_ level. The default scheduling layer will be an experimentally-backed LB algorithm, with exposed config values.
9268

93-
The Scheduler will define a strong interface API, so that new scheduling algos may be plugged & dark-launched to test in production traffic without impacting said traffic. Extension is expected to adhere to the [Scheduler Subsystem definition](https://github.com/kubernetes-sigs/gateway-api-inference-extension/pull/603)
9469

9570
### `Non-extensible`
9671

Lines changed: 77 additions & 0 deletions
@@ -0,0 +1,77 @@
1+
# Scheduling Subsystem Architecture
2+
3+
Author(s): @kfswain, @ahg-g, @nirrozenbaum
4+
## Proposal Status
5+
***Draft***
6+
7+
## Summary
8+
The Scheduling Subsystem is a framework used to implement scheduling algorithms. High level definition [here](https://github.com/kubernetes-sigs/gateway-api-inference-extension/tree/main/docs/proposals/006-scheduler) & EPP Architecture [here](https://github.com/kubernetes-sigs/gateway-api-inference-extension/tree/main/docs/proposals/0683-epp-architecture-proposal).
9+
10+
## Design Principles
11+
- The scheduler framework should act as an independent library; there should be no dependency on EPP packages defined outside of the scheduler
12+
- The *framework* should be agnostic to web protocols (such as HTTP), endpoint types (such as model servers), and K8s concepts.
13+
- Opinions should be held by the plugins, not the framework
14+
- The entry & exit points should be defined by the framework, acting as the API surface of the system
15+
- It should be possible to run multiple scheduling 'profiles' for a single request.
16+
  - Profiles can run conditionally on the results of previous runs, or in parallel
17+
- Plugin state is managed by the plugin itself
18+
19+
## Definitions
20+
- **Scheduling Framework** - The system created to allow for a pluggable scheduling algorithm.
21+
- **Scheduling Profile** - A named, specific set of Filter(s), Scorer(s), & Picker used to select endpoints.
22+
- **Scheduler** - An extensible implementation of a scheduling algorithm, including logic to select Scheduling Profiles, the Scheduling Profiles themselves, & logic to interpret the result.
23+
- **Scheduling Cycle** - A single run of a Scheduler through the Scheduling Framework.
24+
- **Plugin** - Implementation of framework-defined interface(s) to add or extend logic across the framework.
25+
26+
## Proposal
27+
28+
The Scheduling System draws inspiration from the kube-scheduler's pluggable system, though there are distinct differences in goals/usage.
29+
30+
The Scheduling System can loosely be divided into 3 sections:
31+
- A *framework* to implement the system
32+
- The *interfaces* that a consumer can use to extend the system
33+
- A *configuration API* to define the Scheduler, Profile(s), & the plugins used within those profiles
34+
35+
A sketch of the System, with extension points, is here:
36+
<img src="./images/scheduler_subsystem.svg" alt="Scheduling Algorithm" width="1000" />
37+
38+
Describing the interface extension points & flow is the simplest way to convey the intent of what the framework should enable:
39+
40+
### PreSchedule
41+
42+
PreSchedule is the entry point into the scheduling cycle (called by the framework). PreSchedule selects profiles conditionally based on:
43+
44+
- Request data
45+
- Results
46+
- Cycle State
47+
48+
PreSchedule will be continuously called so long as profiles are returned; multiple profiles may be returned in a single call. Only a single PreSchedule function may be defined per scheduler.
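The calling convention above can be sketched as a loop. This is a minimal illustration, not the framework's implementation: `Endpoint`, `Profile`, `PreScheduleFunc`, and `runCycle` are simplified stand-ins, and the `maxRounds` bound is an assumption added here so that a misbehaving PreSchedule cannot loop forever.

```go
package main

import "fmt"

// Endpoint and Profile are simplified stand-ins for the proposal's types.
type Endpoint struct{ Name string }
type Profile struct{ Name string }

// PreScheduleFunc loosely mirrors Scheduler.PreSchedule: it sees the request
// and the results gathered so far, and returns the profiles to run next.
type PreScheduleFunc func(request map[string]any, results map[string][]Endpoint) map[string]Profile

// runCycle calls pre repeatedly until it returns no profiles.
// maxRounds bounds the number of PreSchedule calls; it is a hypothetical
// safety limit that the proposal does not define.
func runCycle(request map[string]any, pre PreScheduleFunc, run func(Profile) []Endpoint, maxRounds int) (map[string][]Endpoint, error) {
	results := map[string][]Endpoint{}
	for round := 0; round < maxRounds; round++ {
		profiles := pre(request, results)
		if len(profiles) == 0 {
			return results, nil // nothing left to run: the cycle is complete
		}
		// Several profiles may be returned by a single call.
		for name, p := range profiles {
			results[name] = run(p)
		}
	}
	return nil, fmt.Errorf("PreSchedule still returned profiles after %d rounds", maxRounds)
}

func main() {
	// Run "prefill" first, then "decode" once prefill has a result.
	pre := func(_ map[string]any, results map[string][]Endpoint) map[string]Profile {
		if _, done := results["prefill"]; !done {
			return map[string]Profile{"prefill": {Name: "prefill"}}
		}
		if _, done := results["decode"]; !done {
			return map[string]Profile{"decode": {Name: "decode"}}
		}
		return nil
	}
	run := func(p Profile) []Endpoint { return []Endpoint{{Name: p.Name + "-pod-0"}} }

	results, err := runCycle(map[string]any{"model": "example"}, pre, run, 8)
	fmt.Println(len(results), err)
}
```

Because PreSchedule receives the accumulated results, a later profile can be made conditional on an earlier one, which is the "conditionally dependent on previous runs" case from the design principles.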
49+
50+
### Profile Cycle
51+
52+
The profile cycle consists of 3 defined functions: `Filter`, `Score`, & `Pick`.
53+
54+
*Profile Constraints*
55+
- A profile can have any number of `Filter` plugins registered (including zero)
56+
- A profile can have any number of `Score` plugins registered (including zero)
57+
- A profile MUST have exactly one `Pick` plugin registered
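These constraints could be checked when a configuration is loaded. The sketch below assumes a hypothetical name-only `ProfileSpec` (the Configuration API is still TODO in this proposal), so the field names are illustrative only.

```go
package main

import "fmt"

// ProfileSpec is a hypothetical, name-only description of a profile as it
// might appear in configuration. It is not part of the proposal.
type ProfileSpec struct {
	Name    string
	Filters []string       // zero or more Filter plugin names
	Scorers map[string]int // Score plugin name -> weight, zero or more
	Pickers []string       // must contain exactly one Pick plugin name
}

// ValidateProfile enforces the profile constraints: any number of filters
// and scorers, but exactly one picker.
func ValidateProfile(p ProfileSpec) error {
	if n := len(p.Pickers); n != 1 {
		return fmt.Errorf("profile %q: want exactly one Pick plugin, got %d", p.Name, n)
	}
	return nil
}

func main() {
	ok := ProfileSpec{Name: "decode", Pickers: []string{"random-top-3"}}
	bad := ProfileSpec{Name: "broken", Filters: []string{"is-decode"}}
	fmt.Println(ValidateProfile(ok))
	fmt.Println(ValidateProfile(bad))
}
```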
58+
59+
60+
#### Filter
61+
Filter runs before any scoring, and removes endpoints that are not fit for selection. The framework will return an error to the client if the endpoints are filtered to zero.
62+
63+
#### Score
64+
Score applies a score to each remaining endpoint provided. Scorers SHOULD keep their score values in a normalized range: [0, 1]. Any weighting should be added at the SchedulingProfile configuration level.
65+
66+
#### Pick
67+
Picker selects the endpoint(s) from the provided list of scored endpoints. Picker MUST return at least one endpoint.
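One way to read the three steps together is sketched below. It is an assumption-laden illustration rather than the framework's code: plugins are reduced to plain function types, each scorer returns a single value per endpoint (the proposed `Scorer` interface operates on a slice), and weights are applied as a simple weighted sum, which the proposal does not prescribe. `runProfile`, `WeightedScorer`, and `bestScore` are hypothetical names.

```go
package main

import (
	"errors"
	"fmt"
)

type Endpoint struct {
	Name  string
	Score float64
}

// Simplified plugin shapes, assumed for this sketch only.
type Filter func([]Endpoint) []Endpoint
type Scorer func(Endpoint) float64 // expected to return a value in [0, 1]
type Picker func([]Endpoint) []Endpoint

// WeightedScorer pairs a scorer with its profile-level weight.
type WeightedScorer struct {
	Score  Scorer
	Weight int
}

// runProfile applies filters, then weighted scoring, then the picker.
func runProfile(endpoints []Endpoint, filters []Filter, scorers []WeightedScorer, pick Picker) ([]Endpoint, error) {
	for _, f := range filters {
		endpoints = f(endpoints)
	}
	if len(endpoints) == 0 {
		return nil, errors.New("no endpoints left after filtering")
	}
	// Score a copy so the caller's slice is left untouched.
	scored := make([]Endpoint, len(endpoints))
	copy(scored, endpoints)
	for i := range scored {
		scored[i].Score = 0
		for _, s := range scorers {
			scored[i].Score += float64(s.Weight) * s.Score(scored[i])
		}
	}
	picked := pick(scored)
	if len(picked) == 0 {
		return nil, errors.New("picker returned no endpoints")
	}
	return picked, nil
}

// bestScore is a hypothetical picker that returns the highest-scored endpoint.
func bestScore(eps []Endpoint) []Endpoint {
	if len(eps) == 0 {
		return nil
	}
	best := eps[0]
	for _, e := range eps[1:] {
		if e.Score > best.Score {
			best = e
		}
	}
	return []Endpoint{best}
}

func main() {
	endpoints := []Endpoint{{Name: "pod-a"}, {Name: "pod-b"}, {Name: "pod-c"}}
	notC := func(in []Endpoint) []Endpoint {
		var out []Endpoint
		for _, e := range in {
			if e.Name != "pod-c" {
				out = append(out, e)
			}
		}
		return out
	}
	prefix := func(e Endpoint) float64 {
		if e.Name == "pod-a" {
			return 1.0
		}
		return 0.2
	}
	kv := func(e Endpoint) float64 {
		if e.Name == "pod-b" {
			return 0.9
		}
		return 0.1
	}
	picked, err := runProfile(endpoints, []Filter{notC},
		[]WeightedScorer{{Score: prefix, Weight: 3}, {Score: kv, Weight: 2}}, bestScore)
	fmt.Println(picked, err)
}
```

Keeping weights outside the scorers means each scorer can stay in its normalized range while the profile decides how much each signal matters.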
68+
69+
70+
### PostSchedule
71+
PostSchedule receives the result(s) of the scheduling cycle(s) and interprets them for consumption by the calling system.
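The interface file in this commit gives the example of a shadow-boxing profile whose result is only logged. A minimal sketch of that idea follows. The `shadowbox-` name prefix, the exported `Results` field, and the injected `logf` function are assumptions made for illustration and are not defined by the proposal.

```go
package main

import (
	"fmt"
	"strings"
)

type Endpoint struct {
	Name  string
	Score float64
}

// SchedulingResult mirrors the proposal's type, with an exported field so
// the sketch can be inspected.
type SchedulingResult struct {
	Results map[string][]Endpoint
}

// postSchedule logs the result of shadow profiles and returns the rest.
// Identifying shadow profiles by a "shadowbox-" prefix is an assumption.
func postSchedule(profileResults map[string][]Endpoint, logf func(format string, args ...any)) SchedulingResult {
	out := SchedulingResult{Results: map[string][]Endpoint{}}
	for name, eps := range profileResults {
		if strings.HasPrefix(name, "shadowbox-") {
			logf("shadow profile %s picked %v", name, eps)
			continue // logged only; never returned to the caller
		}
		out.Results[name] = eps
	}
	return out
}

func main() {
	results := map[string][]Endpoint{
		"decode":           {{Name: "pod-a", Score: 3.2}},
		"shadowbox-decode": {{Name: "pod-b", Score: 2.4}},
	}
	res := postSchedule(results, func(format string, args ...any) {
		fmt.Printf(format+"\n", args...)
	})
	fmt.Println(len(res.Results))
}
```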
72+
73+
### PostResponse
74+
PostResponse is a special-case extension that can optionally be implemented by a plugin that needs to update its state based on response or request data. It should only be implemented by plugins that need to update state outside of the scheduling cycle. PostResponse is run when a response is processed.
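Since PostResponse runs outside the scheduling cycle and plugin state is managed by the plugin itself, such a plugin likely needs its own synchronization. The sketch below shows a hypothetical `responseCounter` plugin. Reading a `"model"` key from the request map is an assumption, as the proposal does not define the contents of the request or response maps.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// responseCounter is a hypothetical plugin that counts processed responses
// per model. It owns its state and guards it with a mutex, because
// PostResponse may be called concurrently for different requests.
type responseCounter struct {
	mu     sync.Mutex
	counts map[string]int
}

func newResponseCounter() *responseCounter {
	return &responseCounter{counts: map[string]int{}}
}

// Name satisfies the Plugin interface shape from the proposal.
func (c *responseCounter) Name() string { return "response-counter" }

// PostResponse matches the proposed signature. The "model" key is assumed.
func (c *responseCounter) PostResponse(_ context.Context, request map[string]any, _ map[string]any) {
	model, ok := request["model"].(string)
	if !ok {
		return // nothing to record for this request
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[model]++
}

// Count returns how many responses have been recorded for a model.
func (c *responseCounter) Count(model string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[model]
}

func main() {
	c := newResponseCounter()
	var wg sync.WaitGroup
	for i := 0; i < 10; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.PostResponse(context.Background(), map[string]any{"model": "example"}, nil)
		}()
	}
	wg.Wait()
	fmt.Println(c.Count("example"))
}
```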
75+
76+
## Configuration API
77+
TODO
Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
1+
# Names are egregiously long, but they attempt to describe custom logic within a name
2+
profileSelection: disagg-token-length
3+
schedulingResult: log-shadowbox-label-pd-result
4+
profiles:
5+
  prefill:
6+
    preschedule:
7+
      - decode-prefix-cache-check
8+
    filter:
9+
      - is-prefill
10+
      - has-required-accelerator
11+
    score:
12+
      - prefix-cache: 3
13+
      - latency-scorer: 2
14+
    selection:
15+
      - best-score
16+
    postschedule:
17+
      - log-full-scores
18+
  decode:
19+
    filter:
20+
      - is-decode
21+
    score:
22+
      - prefix-cache: 3
23+
      - kv-cache-util: 5
24+
    selection:
25+
      - random-top-3
26+
  shadowbox-decode:
27+
    filter:
28+
      - is-decode
29+
      - is-tpu
30+
    score:
31+
      - prefix-cache-v2: 4
32+
      - kv-cache-util: 1
33+
    selection:
34+
      - random-top-3

docs/proposals/0845-scheduler-architecture-proposal/images/scheduler_subsystem.svg

Lines changed: 1 addition & 0 deletions
Lines changed: 103 additions & 0 deletions
@@ -0,0 +1,103 @@
1+
/*
2+
Copyright 2025 The Kubernetes Authors.
3+
4+
Licensed under the Apache License, Version 2.0 (the "License");
5+
you may not use this file except in compliance with the License.
6+
You may obtain a copy of the License at
7+
8+
http://www.apache.org/licenses/LICENSE-2.0
9+
10+
Unless required by applicable law or agreed to in writing, software
11+
distributed under the License is distributed on an "AS IS" BASIS,
12+
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
13+
See the License for the specific language governing permissions and
14+
limitations under the License.
15+
*/
16+
17+
package framework
18+
19+
import (
20+
"context"
21+
22+
scheduling "sigs.k8s.io/gateway-api-inference-extension/pkg/epp/scheduling/types"
23+
)
24+
25+
// READER NOTE: Currently CycleState is assumed to have appropriate request data rather than making a new object.
26+
27+
// Plugin is the parent type for all the scheduling framework plugins.
28+
type Plugin interface {
29+
Name() string
30+
}
31+
32+
type Endpoint struct {
33+
State EndpointState
34+
Score float64
35+
}
36+
37+
type EndpointState struct {
38+
// storage is per Scheduling Cycle, and so has no thread-safe concerns.
39+
storage map[string]any
40+
}
41+
42+
type SchedulingResult struct {
43+
results map[string][]Endpoint
44+
}
45+
46+
// Scheduler is the implementation of a... scheduler.
47+
// The scheduler object is created at startup using the provided configuration.
48+
type Scheduler interface {
49+
// PreSchedule selects scheduling profiles through the implemented
50+
// logic, and returns:
51+
// - profiles - A subset of the registered scheduling profiles to be run
52+
PreSchedule(request map[string]any, data scheduling.CycleState, results map[string][]Endpoint) map[string]SchedulingProfile
53+
54+
// PostSchedule receives the output of the result(s) of the scheduling cycle(s)
55+
// and makes sense of the data to be consumed by the calling system.
56+
// For example: suppose you have 2 profiles, a ShadowBoxing Profile & a Production Profile.
57+
// PostSchedule would know to simply log the result of ShadowBoxing
58+
// profile, and do nothing else with it.
59+
PostSchedule(profileResults map[string][]Endpoint) SchedulingResult
60+
}
61+
62+
// SchedulingProfile is used to describe a profile that will
63+
// run for a given scheduling cycle.
64+
type SchedulingProfile struct {
65+
// Name of the profile.
66+
Name string
67+
// Filters lists all Filter plugins associated with this Profile. Filters
68+
// are optional.
69+
Filters []Filter
70+
// Scorers lists all Score plugins associated with this Profile. Scorers
71+
// are optional.
72+
Scorers map[Scorer]int
73+
// Picker is the plugin that picks the endpoint(s). Picker is required.
74+
Picker Picker
75+
}
76+
77+
// Filter runs before any scoring, and removes endpoints that are not fit for
78+
// selection. The framework will return an error to the client if the endpoints
79+
// are filtered to zero.
80+
type Filter interface {
81+
Plugin
82+
Filter(ctx context.Context, state scheduling.CycleState, endpoints []Endpoint) []Endpoint
83+
}
84+
85+
// Scorer applies a score to each remaining endpoint provided. Scorers SHOULD
86+
// keep their score values in a normalized range: [0, 1]. Any weighting should
87+
// be added at the SchedulingProfile configuration level.
88+
type Scorer interface {
89+
Plugin
90+
Score(ctx context.Context, state scheduling.CycleState, endpoints []Endpoint) []Endpoint
91+
}
92+
93+
// Picker selects the endpoint(s) from the provided list of scored endpoints.
94+
// Picker MUST return at least one endpoint.
95+
type Picker interface {
96+
Plugin
97+
Pick(ctx context.Context, state scheduling.CycleState, endpoints []Endpoint) []Endpoint
98+
}
99+
100+
type PostResponse interface {
101+
Plugin
102+
PostResponse(ctx context.Context, request map[string]any, response map[string]any)
103+
}
