-
Notifications
You must be signed in to change notification settings - Fork 102
passing headers to scheduler plugins #775
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -28,17 +28,21 @@ import ( | |
|
||
// LLMRequest is a structured representation of the fields we parse out of the LLMRequest body. | ||
type LLMRequest struct { | ||
// Model is the name of the model that the user specified in the request body. | ||
Model string | ||
// Target models is a map of target model name to weight. | ||
TargetModels map[string]int | ||
Prompt string | ||
// Resolved target model is the final target model after traffic split. | ||
// ResolvedTargetModel is the final target model after traffic split. | ||
nirrozenbaum marked this conversation as resolved.
Show resolved
Hide resolved
|
||
ResolvedTargetModel string | ||
Critical bool | ||
// Critical is a boolean that specifies if a request is critical or not. | ||
Critical bool | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We should make this an int since we already have 3 criticality levels, and likely will be extended to have more. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In current code scheduler changes its algorithm based on the question of whether the request is critical or not. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ++ |
||
// Prompt is the prompt that was sent in the request body. | ||
Prompt string | ||
// Headers is a map of the request headers. | ||
Headers map[string]string | ||
} | ||
|
||
func (r *LLMRequest) String() string { | ||
return fmt.Sprintf("Model: %s, TargetModels: %v, ResolvedTargetModel: %s, Critical: %t, PromptLength: %v", r.Model, r.TargetModels, r.ResolvedTargetModel, r.Critical, len(r.Prompt)) | ||
return fmt.Sprintf("Model: %s, ResolvedTargetModel: %s, Critical: %t, PromptLength: %d, Headers: %v", | ||
r.Model, r.ResolvedTargetModel, r.Critical, len(r.Prompt), r.Headers) | ||
} | ||
|
||
type Pod interface { | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed TargetModels from LLMRequest, which is never set and is always empty.
LLM request holds the fields that are passed to scheduler. the scheduler doesn't care about the target models, but only about the ResolvedTargetModel (traffic splitting is done in
request.go
. when LLMRequest is passed to scheduler the target model has been resolved already).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not this PR, but I was wondering if we should even keep the LLMRequest object. I worry that reqCtx is this monolithic super-object. But also, its been very useful to have, and its not really expensive to pass around since you'd just be passing a pointer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in terms of best practices, it would probably be best to expose to each layer only the things it needs.
one mono super-object may be very convenient, but it opens the door for updating fields that the layer shouldn't.
we might be able to keep the mono object but work with layer interfaces that expose to each layer only the fields it needs.
anyway, as you said, this is not in scope of this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, I was noodling on how best to deal with this since I commented, I think I have some ideas. WIP tho