Skip to content

passing headers to scheduler plugins #775

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

nirrozenbaum
Copy link
Contributor

This PR makes request headers available in SchedulingContext (for use of scheduler plugins).
additionally, it fixes some minor typos and cosmetics.

@k8s-ci-robot k8s-ci-robot requested a review from Jeffwan May 2, 2025 13:38
@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label May 2, 2025
@k8s-ci-robot k8s-ci-robot requested a review from robscott May 2, 2025 13:38
@k8s-ci-robot k8s-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label May 2, 2025
Copy link

netlify bot commented May 2, 2025

Deploy Preview for gateway-api-inference-extension ready!

Name Link
🔨 Latest commit 7b8a83e
🔍 Latest deploy log https://app.netlify.com/sites/gateway-api-inference-extension/deploys/6815183a3a3d2d0008f9d0da
😎 Deploy Preview https://deploy-preview-775--gateway-api-inference-extension.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@@ -29,16 +29,18 @@ import (
// LLMRequest is a structured representation of the fields we parse out of the LLMRequest body.
type LLMRequest struct {
Model string
// Target models is a map of target model name to weight.
TargetModels map[string]int
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed TargetModels from LLMRequest, which is never set and is always empty.
LLM request holds the fields that are passed to scheduler. the scheduler doesn't care about the target models, but only about the ResolvedTargetModel (traffic splitting is done in request.go. when LLMRequest is passed to scheduler the target model has been resolved already).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not this PR, but I was wondering if we should even keep the LLMRequest object. I worry that reqCtx is this monolithic super-object. But also, its been very useful to have, and its not really expensive to pass around since you'd just be passing a pointer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in terms of best practices, it would probably be best to expose to each layer only the things it needs.
one mono super-object may be very convenient, but it opens the door for updating fields that the layer shouldn't.
we might be able to keep the mono object but work with layer interfaces that expose to each layer only the fields it needs.
anyway, as you said, this is not in scope of this PR

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, I was noodling on how best to deal with this since I commented, I think I have some ideas. WIP tho

@nirrozenbaum
Copy link
Contributor Author

cc @kfswain

ResolvedTargetModel string
Critical bool
// Critical is a boolean that specifies if a request is critical or not.
Critical bool
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should make this an int since we already have 3 criticality levels, and likely will be extended to have more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In current code scheduler changes its algorithm based on the question of whether the request is critical or not.
I agree that this field should be changed, but this requires adaptation in more places other than this field.
should be done in a separate PR dedicated for this point (not in this PR scope).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++

@kfswain
Copy link
Collaborator

kfswain commented May 2, 2025

LGTM for the most part, the criticality change is technically out of scope for this PR, but would love to see it changed. Up to the author, will hold to let them decide.
/lgtm
/approve
/hold

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 2, 2025
@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 2, 2025
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kfswain, nirrozenbaum

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label May 2, 2025
Signed-off-by: Nir Rozenbaum <[email protected]>
@nirrozenbaum nirrozenbaum force-pushed the headers-for-scheduler branch from b2b3244 to 7b8a83e Compare May 2, 2025 19:08
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 2, 2025
@kfswain
Copy link
Collaborator

kfswain commented May 2, 2025

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 2, 2025
@nirrozenbaum
Copy link
Contributor Author

/unhold

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 2, 2025
@k8s-ci-robot k8s-ci-robot merged commit 6a2a3ec into kubernetes-sigs:main May 2, 2025
8 checks passed
@nirrozenbaum nirrozenbaum deleted the headers-for-scheduler branch May 2, 2025 19:35
rlakhtakia pushed a commit to rlakhtakia/gateway-api-inference-extension that referenced this pull request May 13, 2025
* passing headers to scheduler plugins

Signed-off-by: Nir Rozenbaum <[email protected]>

* addressed code review comments

Signed-off-by: Nir Rozenbaum <[email protected]>

---------

Signed-off-by: Nir Rozenbaum <[email protected]>
kaushikmitr pushed a commit to kaushikmitr/llm-instance-gateway that referenced this pull request May 15, 2025
* passing headers to scheduler plugins

Signed-off-by: Nir Rozenbaum <[email protected]>

* addressed code review comments

Signed-off-by: Nir Rozenbaum <[email protected]>

---------

Signed-off-by: Nir Rozenbaum <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants