-
Notifications
You must be signed in to change notification settings - Fork 92
passing headers to scheduler plugins #775
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
passing headers to scheduler plugins #775
Conversation
Signed-off-by: Nir Rozenbaum <[email protected]>
✅ Deploy Preview for gateway-api-inference-extension ready!
To edit notification comments on pull requests, go to your Netlify site configuration. |
@@ -29,16 +29,18 @@ import ( | |||
// LLMRequest is a structured representation of the fields we parse out of the LLMRequest body. | |||
type LLMRequest struct { | |||
Model string | |||
// Target models is a map of target model name to weight. | |||
TargetModels map[string]int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed TargetModels from LLMRequest, which is never set and is always empty.
LLM request holds the fields that are passed to scheduler. the scheduler doesn't care about the target models, but only about the ResolvedTargetModel (traffic splitting is done in request.go
. when LLMRequest is passed to scheduler the target model has been resolved already).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not this PR, but I was wondering if we should even keep the LLMRequest object. I worry that reqCtx is this monolithic super-object. But also, its been very useful to have, and its not really expensive to pass around since you'd just be passing a pointer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in terms of best practices, it would probably be best to expose to each layer only the things it needs.
one mono super-object may be very convenient, but it opens the door for updating fields that the layer shouldn't.
we might be able to keep the mono object but work with layer interfaces that expose to each layer only the fields it needs.
anyway, as you said, this is not in scope of this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, I was noodling on how best to deal with this since I commented, I think I have some ideas. WIP tho
cc @kfswain |
ResolvedTargetModel string | ||
Critical bool | ||
// Critical is a boolean that specifies if a request is critical or not. | ||
Critical bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should make this an int since we already have 3 criticality levels, and likely will be extended to have more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In current code scheduler changes its algorithm based on the question of whether the request is critical or not.
I agree that this field should be changed, but this requires adaptation in more places other than this field.
should be done in a separate PR dedicated for this point (not in this PR scope).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
++
LGTM for the most part, the criticality change is technically out of scope for this PR, but would love to see it changed. Up to the author, will hold to let them decide. |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: kfswain, nirrozenbaum The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Signed-off-by: Nir Rozenbaum <[email protected]>
b2b3244
to
7b8a83e
Compare
/lgtm |
/unhold |
* passing headers to scheduler plugins Signed-off-by: Nir Rozenbaum <[email protected]> * addressed code review comments Signed-off-by: Nir Rozenbaum <[email protected]> --------- Signed-off-by: Nir Rozenbaum <[email protected]>
* passing headers to scheduler plugins Signed-off-by: Nir Rozenbaum <[email protected]> * addressed code review comments Signed-off-by: Nir Rozenbaum <[email protected]> --------- Signed-off-by: Nir Rozenbaum <[email protected]>
This PR makes request headers available in SchedulingContext (for use of scheduler plugins).
additionally, it fixes some minor typos and cosmetics.