EPP deployment should specify resource requirements #691

Open
liu-cong opened this issue Apr 14, 2025 · 1 comment
Labels
triage/needs-information Indicates an issue needs more information in order to work on it.

Comments

@liu-cong
Contributor

What would you like to be added:

The example deployment in https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/config/manifests/inferencepool-resources.yaml should specify resource requests and limits, and we should provide guidance on how to configure resource requirements correctly. This will involve some experimenting and benchmarking so that the guidance can relate resource usage to load.

Note that the ongoing development of queuing and the prefix cache will significantly change the memory footprint.
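
For illustration, here is a rough sketch of where requests and limits would sit in a Deployment manifest. The `resources.requests` and `resources.limits` fields are standard Kubernetes; everything else is an assumption made for this example. The names `example-epp` and `epp`, the image reference, and all CPU and memory quantities are placeholders, not values taken from the linked manifest or from any benchmark.

```yaml
# Illustrative fragment only: names, image, and quantities are placeholders,
# not benchmarked recommendations and not copied from the project manifest.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-epp              # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example-epp
  template:
    metadata:
      labels:
        app: example-epp
    spec:
      containers:
        - name: epp              # hypothetical container name
          image: example.invalid/epp:placeholder   # hypothetical image
          resources:
            requests:            # what the scheduler reserves for the pod
              cpu: "1"           # placeholder
              memory: 1Gi        # placeholder
            limits:              # hard cap enforced at runtime
              memory: 2Gi        # placeholder
```

Whatever numbers end up in the manifest would need to come from the benchmarking described above, and would likely need revisiting as queuing and the prefix cache land.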

Why is this needed:

@kfswain
Collaborator

kfswain commented Apr 21, 2025

The memory footprint will definitely change, and it will depend heavily on which features are enabled in or added to the EPP. Instead of setting resource limits, we should include a guide explaining how to arrive at those values.

@kfswain kfswain added the triage/needs-information Indicates an issue needs more information in order to work on it. label Apr 24, 2025