EPP deployment should specify resource requirements #691

liu-cong · 2025-04-14T22:22:46Z

What would you like to be added:

The example deployment https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/config/manifests/inferencepool-resources.yaml should specify resource requests and limits, and provide guidance on how to configure resource requirements correctly. This will involve some experimenting and benchmarking, and should explain how resource usage scales with load.

Note that the ongoing development of queuing and the prefix cache will significantly change the memory footprint.
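To illustrate where the requested settings would go, here is a hypothetical excerpt of the EPP container spec using the standard Kubernetes `resources` stanza. The container name and every quantity are placeholders, not benchmarked values. Real numbers would have to come from the benchmarking described above, and they are expected to shift as queuing and the prefix cache land.

```yaml
# Hypothetical excerpt of the EPP Deployment (not the actual manifest).
# Container name and all quantities are placeholders to be replaced
# with values derived from load testing.
spec:
  template:
    spec:
      containers:
        - name: epp            # placeholder; use the name from the manifest
          resources:
            requests:
              cpu: "1"         # placeholder
              memory: 1Gi      # placeholder
            limits:
              cpu: "2"         # placeholder
              memory: 2Gi      # placeholder
```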

Why is this needed:

kfswain · 2025-04-21T22:33:46Z

Memory footprint will definitely change and will depend heavily on which features are enabled or added to the EPP. Instead of setting resource limits, we should include a guide explaining how we arrived at those values.

kfswain added the triage/needs-information label (indicates an issue needs more information in order to work on it) · Apr 24, 2025
