-
Notifications
You must be signed in to change notification settings - Fork 87
[WIP] proposal - add readypods field to InferencePoolStatus #714
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Nir Rozenbaum <[email protected]>
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: nirrozenbaum The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
✅ Deploy Preview for gateway-api-inference-extension ready!
To edit notification comments on pull requests, go to your Netlify site configuration. |
//ReadyPods is the number of pods that are associated with the InferencePool and are ready | ||
// to serve inference requests. A pod is considered ready if its Ready condition is set to | ||
// "True" and it is not marked for deletion. | ||
ReadyPods int `json:"readyPods,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the goal for this field? Historically in Kubernetes we've tried to ensure a separation of concerns here. Ideally the portion of a Gateway implementation that handles endpoint readiness should be something that can be disconnected from higher level routing config.
More broadly, I'd be worried that trying to keep track of this in status would dramatically increase the number of updates to InferencePool status and the overall load on the controller. Instead of needing to reconcile when an InferencePool changed, it would need to reconcile anytime the readiness or number of connected endpoints changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@robscott that's a fair point. GIE response time to inference requests is very sensitive and this change may increase the number of reconcile events dramatically as you explained. my thinking was to allow an easy and native way for visibility of the high level state of the pool. but when considering the tradeoff, it's not worth it. visibility can be seen also in prometheus. closing the issue. |
This PR suggests adding to InferencePool status field
ReadyPods
to reflect the number of pods that are associated with the pool and are ready to serve inference requests.A pod is considered ready if its Ready condition is set to "True" and it is not marked for deletion.