-
Notifications
You must be signed in to change notification settings - Fork 6
[UPSTREAM-SYNC] Feat: Add invocation of Post response plugins #132
Conversation
Signed-off-by: Shmuel Kallner <[email protected]>
Signed-off-by: Shmuel Kallner <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
small typo and a question. Other than that:
/lgtm
/approve
result, err = s.scheduler.OnResponse(ctx, llmReq, reqCtx.TargetPod) | ||
if err != nil { | ||
logger.V(logutil.DEFAULT).Error(err, "Error handling response") | ||
reqCtx.ResponseStatusCode = errutil.ModelServerError |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
q: will returning an error not send the response back to the user?
If this is an internal logic failure and we've already compute the LLM response, might be wasteful.
Co-authored-by: Etai Lev Ran <[email protected]>
Signed-off-by: Shmuel Kallner <[email protected]>
Signed-off-by: Shmuel Kallner <[email protected]>
Signed-off-by: Shmuel Kallner <[email protected]>
Signed-off-by: Shmuel Kallner <[email protected]>
for _, pod := range pods { | ||
if pod.GetPod().NamespacedName.String() == targetPodName { | ||
targetPod = pod | ||
break | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if a pod was selected scheduler and crashed before this call, what happens here?
targetPod remains nil?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct.
The whole question of failure handling is not sufficiently specified in the system and needs to be addressed. There are many failure scenarios (e.g., EPP crashes during different points in processing, envoy crashes, ...) across and inside components.
Since we're going to archive this fork shortly I think we can close this. Any objections? |
This PR adds the invocation of any configured PostResponse plugins during the processing of responses from the chosen inference server. It also enables the PostResponse plugin to add/modify headers that are ultimately sent to the client.