-
Notifications
You must be signed in to change notification settings - Fork 8
[UPSTREAM-SYNC] Feat: Add invocation of Post response plugins #132
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: upstream-sync
Are you sure you want to change the base?
Conversation
Signed-off-by: Shmuel Kallner <[email protected]>
Signed-off-by: Shmuel Kallner <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
small typo and a question. Other than that:
/lgtm
/approve
result, err = s.scheduler.OnResponse(ctx, llmReq, reqCtx.TargetPod) | ||
if err != nil { | ||
logger.V(logutil.DEFAULT).Error(err, "Error handling response") | ||
reqCtx.ResponseStatusCode = errutil.ModelServerError |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
q: will returning an error not send the response back to the user?
If this is an internal logic failure and we've already compute the LLM response, might be wasteful.
Co-authored-by: Etai Lev Ran <[email protected]>
Signed-off-by: Shmuel Kallner <[email protected]>
Signed-off-by: Shmuel Kallner <[email protected]>
Signed-off-by: Shmuel Kallner <[email protected]>
Signed-off-by: Shmuel Kallner <[email protected]>
for _, pod := range pods { | ||
if pod.GetPod().NamespacedName.String() == targetPodName { | ||
targetPod = pod | ||
break | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if a pod was selected scheduler and crashed before this call, what happens here?
targetPod remains nil?
This PR adds the invocation of any configured PostResponse plugins during the processing of responses from the chosen inference server. It also enables the PostResponse plugin to add/modify headers that are ultimately sent to the client.