From fb159a59afd1bbe1bf9d0001fff89a368fa5afd6 Mon Sep 17 00:00:00 2001 From: ahg-g Date: Mon, 3 Mar 2025 23:09:08 +0000 Subject: [PATCH 1/9] Amend the endpoint picker protocol to support fallbacks and subsetting --- .../004-endpoint-picker-protocol/README.md | 32 ++++++++++++++++--- 1 file changed, 27 insertions(+), 5 deletions(-) diff --git a/docs/proposals/004-endpoint-picker-protocol/README.md b/docs/proposals/004-endpoint-picker-protocol/README.md index 3657a10e8..bcefc481e 100644 --- a/docs/proposals/004-endpoint-picker-protocol/README.md +++ b/docs/proposals/004-endpoint-picker-protocol/README.md @@ -9,17 +9,38 @@ This doc defines the protocol between the EPP and the proxy (e.g, Envoy). The EPP MUST implement the Envoy [external processing service](https://www.envoyproxy.io/docs/envoy/latest/api-v3/service/ext_proc/v3/external_processor) protocol. +## Proxy Request +For each HTTP request, the proxy CAN communicate the subset of endpoints the EPP MUST pick from by setting an unstructured entry in the [filter metadata](https://github.com/envoyproxy/go-control-plane/blob/63a55395d7a39a8d43dcc7acc3d05e4cae7eb7a2/envoy/config/core/v3/base.pb.go#L819) field of the ext-proc request. The metadata entry for the subset list MUST be wrapped with an outer key (which represents the metadata namespace) with a default of `envoy.lb.subset_hint`. + +```go +filterMetadata: { + "envoy.lb.subset_hint" { + "x-gateway-destination-endpoint-subset-hint": [, , ...] + } +} +``` + +## EPP Response For each HTTP request, the EPP MUST communicate to the proxy the picked model server endpoint via: 1. Setting the `x-gateway-destination-endpoint` HTTP header to the selected endpoint in format. -2. Set an unstructured entry in the [dynamic_metadata](https://github.com/envoyproxy/go-control-plane/blob/c19bf63a811c90bf9e02f8e0dc1dcef94931ebb4/envoy/service/ext_proc/v3/external_processor.pb.go#L320) field of the ext-proc response. 
The metadata entry for the picked endpoint MUST be wrapped with an outer key (which represents the metadata namespace) with a default of `envoy.lb`. +2. Set an unstructured entry in the [dynamic_metadata](https://github.com/envoyproxy/go-control-plane/blob/c19bf63a811c90bf9e02f8e0dc1dcef94931ebb4/envoy/service/ext_proc/v3/external_processor.pb.go#L320) field of the ext-proc response. The metadata entry for the picked endpoints MUST be wrapped with an outer key (which represents the metadata namespace) with a default of `envoy.lb`. -The final metadata necessary would look like: +The pirmary endpoint MUST be set using the key `x-gateway-destination-endpoint` as follows: ```go dynamicMetadata: { "envoy.lb": { - "x-gateway-destination-endpoint": " + "x-gateway-destination-endpoint": + } +} +``` + +Fallback endpoints MUST be set using the key `x-gateway-destination-endpoint-fallbacks` as a list as follows: +```go +dynamicMetadata: { + "envoy.lb" { + "x-gateway-destination-endpoint-fallbacks": [, , ...] } } ``` @@ -27,9 +48,10 @@ dynamicMetadata: { Note: - If the EPP did not communicate the server endpoint via these two methods, it MUST return an error. - The EPP MUST not set two different values in the header and the inner response metadata value. +- Setting different value leads to unpredictable behavior because proxies aren't guaranteed to support both paths, and so this protocol does not define what takes precedence. -## Why envoy.lb namespace as a default? +### Why envoy.lb namespace as a default? The `envoy.lb` namesapce is a predefined namespace used for subsetting. One common way to use the selected endpoint returned from the server, is [envoy subsets](https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/upstream/load_balancing/subsets) where host metadata for subset load balancing must be placed under `envoy.lb`. 
-Setting different value leads to unpredictable behavior because proxies aren't guaranteed to support both paths, and so this protocol does not define what takes precedence. + From c90c102da8b3c82ff12543f3899c619035b26f5e Mon Sep 17 00:00:00 2001 From: ahg-g Date: Tue, 4 Mar 2025 06:08:28 +0000 Subject: [PATCH 2/9] Addressed comments --- .../004-endpoint-picker-protocol/README.md | 18 +++++++++++------- 1 file changed, 11 insertions(+), 7 deletions(-) diff --git a/docs/proposals/004-endpoint-picker-protocol/README.md b/docs/proposals/004-endpoint-picker-protocol/README.md index bcefc481e..cb16df366 100644 --- a/docs/proposals/004-endpoint-picker-protocol/README.md +++ b/docs/proposals/004-endpoint-picker-protocol/README.md @@ -9,7 +9,7 @@ This doc defines the protocol between the EPP and the proxy (e.g, Envoy). The EPP MUST implement the Envoy [external processing service](https://www.envoyproxy.io/docs/envoy/latest/api-v3/service/ext_proc/v3/external_processor) protocol. -## Proxy Request +## Endpoint Subset For each HTTP request, the proxy CAN communicate the subset of endpoints the EPP MUST pick from by setting an unstructured entry in the [filter metadata](https://github.com/envoyproxy/go-control-plane/blob/63a55395d7a39a8d43dcc7acc3d05e4cae7eb7a2/envoy/config/core/v3/base.pb.go#L819) field of the ext-proc request. The metadata entry for the subset list MUST be wrapped with an outer key (which represents the metadata namespace) with a default of `envoy.lb.subset_hint`. ```go @@ -20,14 +20,18 @@ filterMetadata: { } ``` -## EPP Response +If the key `x-gateway-destination-endpoint-subset-hint` is set, the EPP MUST only select endpoints from the specified list. If none of the endpoints in the list is eligible or the list is empty, then the EPP MUST return a 429 status code. + +If the key `x-gateway-destination-endpoint-subset-hint` is not set, then the EPP MUST select from the set defined by the `InferencePool` selector. 
+ +## Destination Endpoint For each HTTP request, the EPP MUST communicate to the proxy the picked model server endpoint via: 1. Setting the `x-gateway-destination-endpoint` HTTP header to the selected endpoint in format. 2. Set an unstructured entry in the [dynamic_metadata](https://github.com/envoyproxy/go-control-plane/blob/c19bf63a811c90bf9e02f8e0dc1dcef94931ebb4/envoy/service/ext_proc/v3/external_processor.pb.go#L320) field of the ext-proc response. The metadata entry for the picked endpoints MUST be wrapped with an outer key (which represents the metadata namespace) with a default of `envoy.lb`. -The pirmary endpoint MUST be set using the key `x-gateway-destination-endpoint` as follows: +The primary endpoint MUST be set using the key `x-gateway-destination-endpoint` as follows: ```go dynamicMetadata: { "envoy.lb": { @@ -45,13 +49,13 @@ dynamicMetadata: { } ``` -Note: +Constraints: - If the EPP did not communicate the server endpoint via these two methods, it MUST return an error. - The EPP MUST not set two different values in the header and the inner response metadata value. - Setting different value leads to unpredictable behavior because proxies aren't guaranteed to support both paths, and so this protocol does not define what takes precedence. -### Why envoy.lb namespace as a default? +Why envoy.lb namespace as a default? The `envoy.lb` namesapce is a predefined namespace used for subsetting. One common way to use the selected endpoint returned from the server, is [envoy subsets](https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/upstream/load_balancing/subsets) where host metadata for subset load balancing must be placed under `envoy.lb`. - - +## Matching An InferenceModel +The model name of a request MUST match the `Sepc.ModelName` parameter of one of the `InferenceModels` referencing the `InferencePool` managed by the EPP. Otherwise, the EPP MUST return a 404 status code. 
From 137f2a87699fb73656ecb5de25ec3791615c2fe4 Mon Sep 17 00:00:00 2001 From: ahg-g Date: Wed, 5 Mar 2025 17:45:44 +0000 Subject: [PATCH 3/9] specify the behavior when the epp doesn't respect the subset --- docs/proposals/004-endpoint-picker-protocol/README.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/proposals/004-endpoint-picker-protocol/README.md b/docs/proposals/004-endpoint-picker-protocol/README.md index cb16df366..d5c719144 100644 --- a/docs/proposals/004-endpoint-picker-protocol/README.md +++ b/docs/proposals/004-endpoint-picker-protocol/README.md @@ -15,14 +15,14 @@ For each HTTP request, the proxy CAN communicate the subset of endpoints the EPP ```go filterMetadata: { "envoy.lb.subset_hint" { - "x-gateway-destination-endpoint-subset-hint": [, , ...] + "x-gateway-destination-endpoint-subset": [, , ...] } } ``` -If the key `x-gateway-destination-endpoint-subset-hint` is set, the EPP MUST only select endpoints from the specified list. If none of the endpoints in the list is eligible or the list is empty, then the EPP MUST return a 429 status code. +If the key `x-gateway-destination-endpoint-subset` is set, the EPP MUST only select endpoints from the specified list. If none of the endpoints in the list is eligible or the list is empty, then the EPP MUST return a 429 status code. If the EPP does not select from the list, then this leads to unpredictable behavior. -If the key `x-gateway-destination-endpoint-subset-hint` is not set, then the EPP MUST select from the set defined by the `InferencePool` selector. +If the key `x-gateway-destination-endpoint-subset` is not set, then the EPP MUST select from the set defined by the `InferencePool` selector. 
## Destination Endpoint For each HTTP request, the EPP MUST communicate to the proxy the picked model server endpoint via: From 3c457baf4544bade6b2a888bd31d4cd848cf01bb Mon Sep 17 00:00:00 2001 From: ahg-g Date: Thu, 6 Mar 2025 01:42:48 +0000 Subject: [PATCH 4/9] addressing more comments --- .../004-endpoint-picker-protocol/README.md | 20 ++++++++++--------- 1 file changed, 11 insertions(+), 9 deletions(-) diff --git a/docs/proposals/004-endpoint-picker-protocol/README.md b/docs/proposals/004-endpoint-picker-protocol/README.md index d5c719144..c29d5cccc 100644 --- a/docs/proposals/004-endpoint-picker-protocol/README.md +++ b/docs/proposals/004-endpoint-picker-protocol/README.md @@ -29,7 +29,7 @@ For each HTTP request, the EPP MUST communicate to the proxy the picked model se 1. Setting the `x-gateway-destination-endpoint` HTTP header to the selected endpoint in format. -2. Set an unstructured entry in the [dynamic_metadata](https://github.com/envoyproxy/go-control-plane/blob/c19bf63a811c90bf9e02f8e0dc1dcef94931ebb4/envoy/service/ext_proc/v3/external_processor.pb.go#L320) field of the ext-proc response. The metadata entry for the picked endpoints MUST be wrapped with an outer key (which represents the metadata namespace) with a default of `envoy.lb`. +2. Set an unstructured entry in the [dynamic_metadata](https://github.com/envoyproxy/go-control-plane/blob/c19bf63a811c90bf9e02f8e0dc1dcef94931ebb4/envoy/service/ext_proc/v3/external_processor.pb.go#L320) field of the ext-proc response. The metadata entry for the picked endpoint MUST be wrapped with an outer key (which represents the metadata namespace) with a default of `envoy.lb`. 
The primary endpoint MUST be set using the key `x-gateway-destination-endpoint` as follows: ```go @@ -40,21 +40,23 @@ dynamicMetadata: { } ``` -Fallback endpoints MUST be set using the key `x-gateway-destination-endpoint-fallbacks` as a list as follows: +Constraints: +- If the EPP did not communicate the server endpoint via these two methods, it MUST return an error. +- The EPP MUST not set two different values in the header and the inner response metadata value. +- Setting different value leads to unpredictable behavior because proxies aren't guaranteed to support both paths, and so this protocol does not define what takes precedence. + +### Destination endpoint fallback +A single fallback endpoint CAN be set using the key `x-gateway-destination-endpoint-fallback` in the same metadata namespace as one used for `x-gateway-destination-endpoint` as follows: + ```go dynamicMetadata: { "envoy.lb" { - "x-gateway-destination-endpoint-fallbacks": [, , ...] + "x-gateway-destination-endpoint-fallback": } } ``` -Constraints: -- If the EPP did not communicate the server endpoint via these two methods, it MUST return an error. -- The EPP MUST not set two different values in the header and the inner response metadata value. -- Setting different value leads to unpredictable behavior because proxies aren't guaranteed to support both paths, and so this protocol does not define what takes precedence. - -Why envoy.lb namespace as a default? +### Why envoy.lb namespace as a default? The `envoy.lb` namesapce is a predefined namespace used for subsetting. One common way to use the selected endpoint returned from the server, is [envoy subsets](https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/upstream/load_balancing/subsets) where host metadata for subset load balancing must be placed under `envoy.lb`. 
## Matching An InferenceModel From 342976c9ceb7ab119f0a2a5a81ae8fda8418d3bf Mon Sep 17 00:00:00 2001 From: ahg-g Date: Wed, 12 Mar 2025 20:22:45 +0000 Subject: [PATCH 5/9] Addressed comments --- docs/proposals/004-endpoint-picker-protocol/README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/proposals/004-endpoint-picker-protocol/README.md b/docs/proposals/004-endpoint-picker-protocol/README.md index c29d5cccc..9c83bf416 100644 --- a/docs/proposals/004-endpoint-picker-protocol/README.md +++ b/docs/proposals/004-endpoint-picker-protocol/README.md @@ -41,7 +41,7 @@ dynamicMetadata: { ``` Constraints: -- If the EPP did not communicate the server endpoint via these two methods, it MUST return an error. +- If the EPP did not communicate the server endpoint via these two methods, it MUST return a 429 status code. - The EPP MUST not set two different values in the header and the inner response metadata value. - Setting different value leads to unpredictable behavior because proxies aren't guaranteed to support both paths, and so this protocol does not define what takes precedence. @@ -57,7 +57,7 @@ dynamicMetadata: { ``` ### Why envoy.lb namespace as a default? -The `envoy.lb` namesapce is a predefined namespace used for subsetting. One common way to use the selected endpoint returned from the server, is [envoy subsets](https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/upstream/load_balancing/subsets) where host metadata for subset load balancing must be placed under `envoy.lb`. +The `envoy.lb` namespace is a predefined namespace. One common way to use the selected endpoint returned from the server, is [envoy subsets](https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/upstream/load_balancing/subsets) where host metadata for subset load balancing must be placed under `envoy.lb`. Note that this is not related to the subsetting feature discussed above, this is an Envoy implementation detail.
## Matching An InferenceModel The model name of a request MUST match the `Sepc.ModelName` parameter of one of the `InferenceModels` referencing the `InferencePool` managed by the EPP. Otherwise, the EPP MUST return a 404 status code. From 3bd2ccadd88d1ff63b923a46ab8fb6be67be2726 Mon Sep 17 00:00:00 2001 From: ahg-g Date: Wed, 12 Mar 2025 21:22:30 +0000 Subject: [PATCH 6/9] Addressed comments 2 --- docs/proposals/004-endpoint-picker-protocol/README.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/docs/proposals/004-endpoint-picker-protocol/README.md b/docs/proposals/004-endpoint-picker-protocol/README.md index 9c83bf416..69c960f6a 100644 --- a/docs/proposals/004-endpoint-picker-protocol/README.md +++ b/docs/proposals/004-endpoint-picker-protocol/README.md @@ -41,7 +41,9 @@ dynamicMetadata: { ``` Constraints: -- If the EPP did not communicate the server endpoint via these two methods, it MUST return a 429 status code. +- If the EPP did not communicate the server endpoint via these two methods, it MUST return an error as follows: + - 503 (Serivce Unavailable) if there are no ready endpoints. + - 429 (Too Many Requests) if the request should be dropped (e.g., a Sheddable request, and the servers under heavy load). - The EPP MUST not set two different values in the header and the inner response metadata value. - Setting different value leads to unpredictable behavior because proxies aren't guaranteed to support both paths, and so this protocol does not define what takes precedence. 
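Reviewer sketch (not part of the patch): the destination-endpoint rules in the README at this point in the series say the primary endpoint goes into both the `x-gateway-destination-endpoint` header and the metadata namespace, that the two values must not differ, and that a single fallback may share the namespace. One way to make divergence impossible is to write both from a single variable. The `pickResult` type and `newPickResult` helper below are hypothetical stand-ins for the real ext-proc response types, and the endpoint strings used with them are opaque placeholders.

```go
package main

const (
	// Header and metadata key for the primary endpoint, as named in the proposal.
	destinationKey = "x-gateway-destination-endpoint"
	// Metadata key for the optional single fallback endpoint.
	fallbackKey = "x-gateway-destination-endpoint-fallback"
	// Default metadata namespace named in the proposal.
	defaultNamespace = "envoy.lb"
)

// pickResult is a hypothetical, simplified stand-in for the parts of an
// ext-proc response that carry the picked endpoint.
type pickResult struct {
	Headers         map[string]string
	DynamicMetadata map[string]map[string]string
}

// newPickResult writes the primary endpoint to the header and to the
// metadata from the same variable, so the two values cannot diverge.
// An empty namespace selects the default; an empty fallback is omitted.
func newPickResult(namespace, primary, fallback string) pickResult {
	if namespace == "" {
		namespace = defaultNamespace
	}
	inner := map[string]string{destinationKey: primary}
	if fallback != "" {
		inner[fallbackKey] = fallback
	}
	return pickResult{
		Headers:         map[string]string{destinationKey: primary},
		DynamicMetadata: map[string]map[string]string{namespace: inner},
	}
}
```

Calling `newPickResult("", "endpoint-a", "endpoint-b")` from a `main` function yields a header and an `envoy.lb` metadata entry that both carry `endpoint-a`, plus the fallback key set to `endpoint-b`.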
From 8ef8d87c81ab0ece9a9a98a4788c70595200de4e Mon Sep 17 00:00:00 2001 From: ahg-g Date: Thu, 13 Mar 2025 13:45:03 +0000 Subject: [PATCH 7/9] typo --- docs/proposals/004-endpoint-picker-protocol/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/proposals/004-endpoint-picker-protocol/README.md b/docs/proposals/004-endpoint-picker-protocol/README.md index 69c960f6a..3f6af44bd 100644 --- a/docs/proposals/004-endpoint-picker-protocol/README.md +++ b/docs/proposals/004-endpoint-picker-protocol/README.md @@ -62,4 +62,4 @@ dynamicMetadata: { The `envoy.lb` namespace is a predefined namespace. One common way to use the selected endpoint returned from the server, is [envoy subsets](https://www.envoyproxy.io/docs/envoy/latest/intro/arch_overview/upstream/load_balancing/subsets) where host metadata for subset load balancing must be placed under `envoy.lb`. Note that this is not related to the subsetting feature discussed above, this is an Envoy implementation detail. ## Matching An InferenceModel -The model name of a request MUST match the `Sepc.ModelName` parameter of one of the `InferenceModels` referencing the `InferencePool` managed by the EPP. Otherwise, the EPP MUST return a 404 status code. +The model name of a request MUST match the `Spec.ModelName` parameter of one of the `InferenceModels` referencing the `InferencePool` managed by the EPP. Otherwise, the EPP MUST return a 404 status code.
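Reviewer sketch (not part of the patch): the model-matching rule above amounts to a lookup over the `InferenceModels` that reference the pool, with a miss mapped to 404 by the caller. The `inferenceModel` struct below is a hypothetical, trimmed-down view: `ModelName` mirrors `Spec.ModelName`, while `PoolName` is an assumed field standing in for whatever reference the real resource uses.

```go
package main

// inferenceModel is a hypothetical, simplified view of an InferenceModel.
type inferenceModel struct {
	ModelName string // mirrors Spec.ModelName
	PoolName  string // assumed: name of the InferencePool this model references
}

// findModel returns the first model that references the given pool and whose
// ModelName equals the model name in the request. The boolean is false when
// nothing matches, in which case the caller is expected to answer with 404.
func findModel(models []inferenceModel, pool, requested string) (inferenceModel, bool) {
	for _, m := range models {
		if m.PoolName == pool && m.ModelName == requested {
			return m, true
		}
	}
	return inferenceModel{}, false
}
```

Exact string equality is an assumption of this sketch; the proposal only says the request's model name must "match".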
From d29be3a1a82942b63ddfc02b771994b431ba924f Mon Sep 17 00:00:00 2001 From: ahg-g Date: Thu, 13 Mar 2025 13:57:44 +0000 Subject: [PATCH 8/9] clarified that errors must be returned using immediate reponse --- docs/proposals/004-endpoint-picker-protocol/README.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/proposals/004-endpoint-picker-protocol/README.md b/docs/proposals/004-endpoint-picker-protocol/README.md index 3f6af44bd..2109201fc 100644 --- a/docs/proposals/004-endpoint-picker-protocol/README.md +++ b/docs/proposals/004-endpoint-picker-protocol/README.md @@ -20,7 +20,7 @@ filterMetadata: { } ``` -If the key `x-gateway-destination-endpoint-subset` is set, the EPP MUST only select endpoints from the specified list. If none of the endpoints in the list is eligible or the list is empty, then the EPP MUST return a 429 status code. If the EPP does not select from the list, then this leads to unpredictable behavior. +If the key `x-gateway-destination-endpoint-subset` is set, the EPP MUST only select endpoints from the specified list. If none of the endpoints in the list is eligible or the list is empty, then the EPP MUST return a [ImmediateResponse](https://github.com/envoyproxy/envoy/blob/f2023ef77bdb4abaf9feef963c9a0c291f55568f/api/envoy/service/ext_proc/v3/external_processor.proto#L195) with 429 (Too Many Requests) HTTP status code. If the EPP does not select from the list, then this leads to unpredictable behavior. If the key `x-gateway-destination-endpoint-subset` is not set, then the EPP MUST select from the set defined by the `InferencePool` selector. @@ -42,8 +42,8 @@ dynamicMetadata: { Constraints: - If the EPP did not communicate the server endpoint via these two methods, it MUST return an error as follows: - - 503 (Serivce Unavailable) if there are no ready endpoints. - - 429 (Too Many Requests) if the request should be dropped (e.g., a Sheddable request, and the servers under heavy load). 
+ - [ImmediateResponse](https://github.com/envoyproxy/envoy/blob/f2023ef77bdb4abaf9feef963c9a0c291f55568f/api/envoy/service/ext_proc/v3/external_processor.proto#L195) with 503 (Service Unavailable) HTTP status code if there are no ready endpoints. + - [ImmediateResponse](https://github.com/envoyproxy/envoy/blob/f2023ef77bdb4abaf9feef963c9a0c291f55568f/api/envoy/service/ext_proc/v3/external_processor.proto#L195) with 429 (Too Many Requests) HTTP status code if the request should be dropped (e.g., a Sheddable request while the servers are under heavy load). - The EPP MUST not set two different values in the header and the inner response metadata value. - Setting different value leads to unpredictable behavior because proxies aren't guaranteed to support both paths, and so this protocol does not define what takes precedence. From 41d54757027726980a14ceb2aa7b7947e04c8f74 Mon Sep 17 00:00:00 2001 From: ahg-g Date: Thu, 13 Mar 2025 14:01:01 +0000 Subject: [PATCH 9/9] updated status code --- docs/proposals/004-endpoint-picker-protocol/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/proposals/004-endpoint-picker-protocol/README.md b/docs/proposals/004-endpoint-picker-protocol/README.md index 2109201fc..5280e05cb 100644 --- a/docs/proposals/004-endpoint-picker-protocol/README.md +++ b/docs/proposals/004-endpoint-picker-protocol/README.md @@ -20,7 +20,7 @@ filterMetadata: { } ``` -If the key `x-gateway-destination-endpoint-subset` is set, the EPP MUST only select endpoints from the specified list. If none of the endpoints in the list is eligible or the list is empty, then the EPP MUST return a [ImmediateResponse](https://github.com/envoyproxy/envoy/blob/f2023ef77bdb4abaf9feef963c9a0c291f55568f/api/envoy/service/ext_proc/v3/external_processor.proto#L195) with 429 (Too Many Requests) HTTP status code. If the EPP does not select from the list, then this leads to unpredictable behavior.
+If the key `x-gateway-destination-endpoint-subset` is set, the EPP MUST only select endpoints from the specified list. If none of the endpoints in the list is eligible or the list is empty, then the EPP MUST return an [ImmediateResponse](https://github.com/envoyproxy/envoy/blob/f2023ef77bdb4abaf9feef963c9a0c291f55568f/api/envoy/service/ext_proc/v3/external_processor.proto#L195) with a 503 (Service Unavailable) HTTP status code. If the EPP selects an endpoint outside the list, the resulting behavior is unpredictable. If the key `x-gateway-destination-endpoint-subset` is not set, then the EPP MUST select from the set defined by the `InferencePool` selector.