This issue was moved to a discussion.
You can continue the conversation there. Go to discussion →
EPP Multi-tenancy #724
Comments
The 3 is currently correct.
Follow-up question. Re: EPP, is the intent to have possibly a single …
This has come up quite a bit; I think the jury is still out. Personally, I'm concerned that multi-tenancy could turn out to be an anti-pattern, as it creates a single point of failure and applies pressure to any scale issues that may occur. For context: we intend to support more inference-routing-specific features such as Prefix Aware Routing, which will require quite a bit of memory space on the EPP. Additionally, we expect to have callouts for things like RAG or tokenization of the input (just as examples). This will add quite a bit more computational and memory overhead. I think multi-tenancy would hit scale limits faster.
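The memory argument above can be made concrete with a toy sketch (not from the project): prefix-aware routing needs the EPP to retain state for every distinct prompt prefix it has seen, here modeled as a character trie mapping prefixes to the pod that served them. A multi-tenant EPP would carry one such index per pool, so the footprint compounds with tenancy.

```python
# Toy sketch only: why prefix-aware routing is memory-hungry on the EPP.
# A prefix index retains state for every distinct prompt prefix seen;
# a multi-tenant EPP would hold one such index per pool it serves.

class PrefixIndex:
    """Minimal character trie mapping prompt prefixes to a serving pod."""

    def __init__(self):
        self.root = {}
        self.node_count = 1  # trie node count as a rough memory proxy

    def record(self, prompt: str, pod: str) -> None:
        """Remember which pod served this prompt, prefix by prefix."""
        node = self.root
        for ch in prompt:
            if ch not in node:
                node[ch] = {}
                self.node_count += 1
            node = node[ch]
            node["__pod__"] = pod  # last writer wins on shared prefixes

    def lookup(self, prompt: str):
        """Return the pod that served the longest matching prefix, if any."""
        node, best = self.root, None
        for ch in prompt:
            if ch not in node:
                break
            node = node[ch]
            best = node.get("__pod__", best)
        return best


idx = PrefixIndex()
idx.record("Translate to French: hello", "pod-a")
idx.record("Translate to German: hello", "pod-b")

# The shared prefix "Translate to " is stored once, but every divergent
# suffix adds nodes -- state grows with prompt diversity.
print(idx.lookup("Translate to French: goodbye"))  # longest-prefix match
print(idx.node_count)
```

The point is not the trie itself but the growth pattern: per-pool prefix state is why a single shared EPP concentrates memory pressure that per-pool EPPs would spread out.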
I added it to the agenda for our weekly Thursday meeting, as this has come up enough recently. If you have time to join and have opinions, we would love to hear them there. Meeting info here: https://github.com/kubernetes-sigs/gateway-api-inference-extension?tab=readme-ov-file#contributing
It seems to me that InferencePools inside the same namespace should have the option of referring to the same EPP. This enables isolation across namespaces, and also reuse of an EPP within a single namespace.
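The reuse idea can be sketched as two InferencePool manifests in one namespace pointing at the same EPP Service. The field names here (`extensionRef`, `selector`) are modeled on the Gateway API Inference Extension CRDs, but treat the exact schema and apiVersion as assumptions for illustration, not a verified spec.

```python
# Illustrative sketch only: two InferencePool-shaped manifests in one
# namespace referring to the same EPP Service. Field names are modeled
# on the Gateway API Inference Extension CRDs but are assumptions here.

def inference_pool(name: str, namespace: str, epp_service: str, app_label: str) -> dict:
    """Build a minimal InferencePool-shaped manifest as a plain dict."""
    return {
        "apiVersion": "inference.networking.x-k8s.io/v1alpha2",  # assumed version
        "kind": "InferencePool",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "selector": {"app": app_label},
            # Both pools point at the same endpoint picker (EPP) Service.
            "extensionRef": {"name": epp_service},
        },
    }

pool_a = inference_pool("llama-pool", "team-ml", "shared-epp", "llama")
pool_b = inference_pool("mistral-pool", "team-ml", "shared-epp", "mistral")

# Same namespace, same EPP: reuse within a namespace. A pool in another
# namespace would reference its own EPP, preserving cross-namespace isolation.
assert pool_a["spec"]["extensionRef"] == pool_b["spec"]["extensionRef"]
```

Under this shape, the isolation boundary is the namespace: sharing an EPP is an opt-in within a namespace, never across one.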
We discussed this some in the OSS meeting today; when the recording is available I can link it. Do you have a use case for reusing the EPP within a namespace? Is it simpler ops?
I have a question re: guidance for implementers.
Is the intent behind the current inference model and inference pool design the following? I'm not sure about upcoming enhancements to the CRDs, but I am trying to understand whether the above is the manner in which the current CRDs are intended to be used.
Thanks in advance for your clarifications!