The memory footprint will definitely change, and it will depend heavily on which features are enabled in or added to the EPP. Instead of setting resource limits, we should include a guide explaining how we arrived at those values.
What would you like to be added:
The example deployment https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/config/manifests/inferencepool-resources.yaml should specify resource requests and limits, and provide guidance on how to configure resource requirements correctly. This will involve some experimenting and benchmarking, and should give some guidance on resource usage versus load.
Note that the ongoing development of queuing and prefix cache will significantly change the memory footprint.
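For illustration, a minimal sketch of what such a stanza might look like in the EPP container spec, using the standard Kubernetes `resources` field. The container name `epp` and every value below are placeholder assumptions, not measured or recommended numbers; real values would come out of the benchmarking described above.

```yaml
# Hypothetical fragment of the EPP Deployment's pod spec.
# All quantities are placeholders and have NOT been benchmarked.
containers:
- name: epp              # assumed container name
  resources:
    requests:
      cpu: "1"           # placeholder: CPU reserved for scheduling
      memory: 1Gi        # placeholder: expected steady-state memory
    limits:
      memory: 2Gi        # placeholder: headroom above the request
```

A guide accompanying the manifest could then explain how each value was derived and how it should scale with load and with the features enabled.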
Why is this needed: