Consider backend augmentation vs new backend type #725
-
Thanks! This makes a lot of sense. I also know you had a few ways in mind to achieve that. One is policy attachment; were there other ways? Mind listing a few ideas to kickstart the discussion?
-
@howardjohn can you respond to the above comment?
-
Thanks for raising this point @howardjohn! I know we've discussed this a lot, but I never actually responded on GitHub. The decision to use a custom backend type here instead of Service was motivated by several reasons:
It is worth digging into alternatives, though, because each of them has distinct advantages.
1) Policy Attachment
Advantages:
Disadvantages:
2) Backend Filter
Advantages:
Disadvantages:
Additional Thoughts on Implementation Complexity
-
I think these criteria are a decent starting point, but in the context of the specific solution (picked backend) some refinement is needed. Some observations:
Taking your criteria above in this context:
Custom Backend:
Policy Decoration:
In short, I don't think the API is currently justifying its weight, and an EPP-LB policy type attached to Service would work and be generically useful. There are alternatives where some of the criteria hold more weight, but it's not clear they are desirable outcomes. One example is if EPP were not an LB but instead defined the backend endpoint set entirely virtually; that would imply no fail-open capability, however.
There are a bunch of other non-API considerations that should weigh in favor of using Service and policy attachment. Chief among these is the visualization and traffic-analysis ecosystem that has built up around it. I completely understand the desire to have some LLM branding in the APIs so people know it's 'for' that use case. I'd be fine with calling the EPP binding InferenceLBPolicy or some such, even though it's totally generic behavior.
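The fail-open distinction can be made concrete with a small Python sketch. Everything in it is hypothetical: `RoundRobin`, `pick_endpoint`, and the `epp` callable stand in for a data plane and an endpoint picker, and they do not model the real EPP protocol. When the EPP is a load-balancing policy over endpoints the Service already defines, the data plane can fall back to its default selection if the picker is unreachable; if the EPP instead defined the endpoint set virtually, there would be nothing to fall back to.

```python
from typing import Callable, List, Optional


class RoundRobin:
    """Default load balancing over the Service's endpoints (the fail-open path)."""

    def __init__(self) -> None:
        self._next = 0

    def pick(self, endpoints: List[str]) -> str:
        choice = endpoints[self._next % len(endpoints)]
        self._next += 1
        return choice


# Hypothetical EPP interface: choose one endpoint from the candidates, or raise.
Picker = Callable[[List[str]], str]


def pick_endpoint(
    endpoints: List[str],
    epp: Optional[Picker],
    fallback: RoundRobin,
    fail_open: bool = True,
) -> str:
    """EPP as an LB policy: it may only choose among the Service's own endpoints."""
    if not endpoints:
        raise LookupError("Service has no ready endpoints")
    if epp is None:
        # No policy attached: plain Service load balancing.
        return fallback.pick(endpoints)
    try:
        choice = epp(endpoints)
    except Exception:
        if not fail_open:
            raise
        # The picker is unavailable, but the endpoint set is still known,
        # so traffic keeps flowing through the default balancer.
        return fallback.pick(endpoints)
    if choice not in endpoints:
        raise ValueError(f"EPP returned {choice!r}, which is not a Service endpoint")
    return choice


if __name__ == "__main__":
    endpoints = ["10.0.0.1:8000", "10.0.0.2:8000"]
    rr = RoundRobin()

    def healthy_epp(eps: List[str]) -> str:
        return eps[-1]  # always prefers the last endpoint

    def broken_epp(eps: List[str]) -> str:
        raise ConnectionError("EPP unreachable")

    print(pick_endpoint(endpoints, healthy_epp, rr))  # 10.0.0.2:8000
    print(pick_endpoint(endpoints, broken_epp, rr))   # 10.0.0.1:8000 (fail open)
```

Rejecting picks outside the Service's endpoint set is a design choice of this sketch; it keeps the policy from quietly becoming a second source of endpoints.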
-
The InferencePool defines a new backend type. Ignoring a few things, this essentially defines a Service but with a different load balancing selection.
Gateway API already provides two mechanisms to augment the behavior of a backend: a filter on the backendRef, or a Policy attachment. I would argue one of these methods may be more appropriate than defining a new backend type.
A major problem with using the "new backend type" pattern for something like this, IMO, is the lack of composability.
To make things simpler, let me switch from discussing InferencePool to a strawman backendRef: a RoundRobinBackend type, which controls how to load balance over a set of pods. For example:
Now a user has a use case to add TLS to the backend as well. One approach they could take is to build a new backend type: TLSOriginationBackend. But this has a problem: Pod|arbitrary backend. This quickly gets quite obnoxious. Not only could I end up with RoundRobinBackend<TLSOriginationBackend<....<Pods>>, but implementations also need to know about all of the types.
This is not hypothetical either: Envoy Gateway has an AIServiceBackend type, and for the implementation of InferencePool they are considering an AIServiceBackend pointing to an InferencePool. So you have AIService<InferencePool<Pod>> (ref) (note: I am not involved in Envoy Gateway, so I am merely a bystander reading the issue).
Another problem with this approach is on the implementation side. Because we have made InferencePool essentially "Service lite + some other stuff", each controller needs to become a "Service controller lite". Typically, this job is delegated to the EndpointSlice controller in Kubernetes, and all existing Gateway API controllers read EndpointSlice to determine the endpoints to include in Service references.
This leaves a few options:
I would propose that InferencePool should instead augment an existing backend type (Service). This allows better composition, simplifies controllers, and is a more standard pattern in the ecosystem. Additionally, users will probably have a long tail of feature requests for functionality in InferencePool that already exists in Service (like named target ports, publishNotReadyAddresses, etc.), which this approach avoids.
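The difference in composability can be sketched in Python. This is a toy model, not either API: `BackendRef`, `Policy`, `resolve_nested`, `resolve_service`, and the `ENDPOINTS` table are hypothetical, `RoundRobinBackend` and `TLSOriginationBackend` are the strawman types from this post, and `InferenceLBPolicy` is only the name floated earlier in this discussion. In the nested pattern, every implementation needs a branch for every wrapper kind plus its own endpoint discovery; in the attachment pattern, endpoints come from the Service unchanged and adding TLS is one more policy.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical endpoint data, as an EndpointSlice controller would already publish it.
ENDPOINTS: Dict[str, List[str]] = {"llm": ["10.0.0.1:8000", "10.0.0.2:8000"]}


# --- Pattern 1: every feature is a new backend type wrapping another one ---
@dataclass
class BackendRef:
    kind: str                             # "RoundRobinBackend", "TLSOriginationBackend", "Pods"
    name: str
    inner: Optional["BackendRef"] = None  # the backend this one wraps


def resolve_nested(ref: BackendRef) -> Dict[str, object]:
    config: Dict[str, object] = {}
    while ref.kind != "Pods":
        # Each wrapper kind needs its own branch in every implementation.
        if ref.kind == "RoundRobinBackend":
            config["lb"] = "round-robin"
        elif ref.kind == "TLSOriginationBackend":
            config["tls"] = True
        else:
            raise NotImplementedError(f"unknown backend kind {ref.kind!r}")
        if ref.inner is None:
            raise ValueError(f"{ref.kind} {ref.name!r} does not wrap a backend")
        ref = ref.inner
    # The innermost type does its own endpoint discovery ("Service controller lite").
    config["endpoints"] = list(ENDPOINTS[ref.name])
    return config


# --- Pattern 2: the backend is always a Service; features attach as policies ---
@dataclass
class Policy:
    kind: str      # e.g. "InferenceLBPolicy", "TLSPolicy" (hypothetical kinds)
    target: str    # name of the Service the policy attaches to
    settings: Dict[str, object] = field(default_factory=dict)


def resolve_service(service: str, policies: List[Policy]) -> Dict[str, object]:
    # Endpoints come from the existing EndpointSlice path, unchanged.
    config: Dict[str, object] = {"endpoints": list(ENDPOINTS[service])}
    for policy in policies:
        if policy.target == service:
            config.update(policy.settings)  # each policy contributes its own keys
    return config


if __name__ == "__main__":
    nested = BackendRef("RoundRobinBackend", "rr",
                        BackendRef("TLSOriginationBackend", "tls",
                                   BackendRef("Pods", "llm")))
    print(resolve_nested(nested))

    policies = [Policy("InferenceLBPolicy", "llm", {"lb": "epp"}),
                Policy("TLSPolicy", "llm", {"tls": True})]
    print(resolve_service("llm", policies))
```

In this sketch, resolve_nested rejects any wrapper kind it was never taught about (for instance, a hypothetical AIServiceBackend wrapped around the pods), while resolve_service only merges the policies that target the named Service and ignores the rest.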
cc @LiorLieberman @louiscryan @robscott @danehans