Update default target-pod and inject it into response metadata (#270)
* Update default target-pod and inject it into response metadata
* Addressing comments round 1
* Update the endpoint picker proposal
* define the behavior when the two values differ
For each HTTP request, the EPP MUST communicate to the proxy the picked model server endpoint, via
-adding the `target-pod` HTTP header in the request, or otherwise return an error.
+adding the `x-gateway-destination-endpoint` HTTP header in the request and as an unstructured entry in the [dynamic_metadata](https://github.com/envoyproxy/go-control-plane/blob/c19bf63a811c90bf9e02f8e0dc1dcef94931ebb4/envoy/service/ext_proc/v3/external_processor.pb.go#L320) field of the ext-proc response, or otherwise return an error. The EPP MUST not set two different values in the header and the response metadata.
+Setting different value leads to unpredictable behavior because proxies aren't guaranteed to support both paths, and so this protocol does not define what takes precedence.
## Model Server Protocol
@@ -62,4 +63,4 @@ The model server MUST expose the following LoRA adapter metrics via the same Pro
Requests will be queued if the model server has reached MaxActiveAdapter and canno load the
requested adapter. Example: `"max_lora": "8"`.
*`running_lora_adapters`: A comma separated list of adapters that are currently loaded in GPU
-memory and ready to serve requests. Example: `"running_lora_adapters": "adapter1, adapter2"`
+memory and ready to serve requests. Example: `"running_lora_adapters": "adapter1, adapter2"`
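
For readers consuming these metrics, the sketch below shows one way a scraper might decode the two label values quoted in the hunk above (`"max_lora": "8"` and `"running_lora_adapters": "adapter1, adapter2"`). The function names `ParseRunningAdapters` and `ParseMaxLora` are hypothetical and are not part of the proposal. The code assumes the label values have already been extracted from the Prometheus sample.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// ParseRunningAdapters splits a comma separated label value such as
// "adapter1, adapter2" into adapter names, trimming surrounding spaces
// and skipping empty entries. (Hypothetical helper, not from the proposal.)
func ParseRunningAdapters(label string) []string {
	var adapters []string
	for _, part := range strings.Split(label, ",") {
		if name := strings.TrimSpace(part); name != "" {
			adapters = append(adapters, name)
		}
	}
	return adapters
}

// ParseMaxLora converts the string-valued "max_lora" label into an int.
// (Hypothetical helper, not from the proposal.)
func ParseMaxLora(label string) (int, error) {
	return strconv.Atoi(strings.TrimSpace(label))
}

func main() {
	adapters := ParseRunningAdapters("adapter1, adapter2")
	maxLora, err := ParseMaxLora("8")
	if err != nil {
		fmt.Println("invalid max_lora:", err)
		return
	}
	fmt.Printf("%d of %d adapter slots in use: %v\n", len(adapters), maxLora, adapters)
}
```

Trimming each entry matters because the example value in the spec separates adapters with a comma followed by a space.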
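
The first hunk requires the EPP to report the picked endpoint through both the `x-gateway-destination-endpoint` header and the ext-proc response metadata, and never with two different values. A minimal sketch of that invariant follows. `PickResult` is a hypothetical stand-in for the real ext-proc response types, and reusing the header name as the metadata key is an assumption, since the diff does not name the key.

```go
package main

import (
	"errors"
	"fmt"
)

// DestinationEndpointKey is the header name from the diff. Using the same
// string as the metadata key is an assumption made for this sketch.
const DestinationEndpointKey = "x-gateway-destination-endpoint"

// PickResult is a hypothetical, simplified model of what the EPP returns:
// request header mutations plus unstructured dynamic metadata.
type PickResult struct {
	Headers  map[string]string
	Metadata map[string]string
}

// NewPickResult writes the picked endpoint to both paths from a single
// source value, so the two can never diverge at construction time.
func NewPickResult(endpoint string) (PickResult, error) {
	if endpoint == "" {
		// The protocol says to return an error when no endpoint is communicated.
		return PickResult{}, errors.New("no endpoint picked")
	}
	return PickResult{
		Headers:  map[string]string{DestinationEndpointKey: endpoint},
		Metadata: map[string]string{DestinationEndpointKey: endpoint},
	}, nil
}

// Validate reports an error if either path is missing or the values differ.
func (r PickResult) Validate() error {
	header, inHeader := r.Headers[DestinationEndpointKey]
	meta, inMeta := r.Metadata[DestinationEndpointKey]
	if !inHeader || !inMeta {
		return fmt.Errorf("%s must be set in both the header and the metadata", DestinationEndpointKey)
	}
	if header != meta {
		return fmt.Errorf("%s differs: header=%q metadata=%q", DestinationEndpointKey, header, meta)
	}
	return nil
}

func main() {
	res, err := NewPickResult("10.0.0.12:8000")
	if err != nil {
		fmt.Println("pick failed:", err)
		return
	}
	fmt.Println("consistent result:", res.Validate() == nil)

	// Simulate a bug that changes only one of the two paths.
	res.Metadata[DestinationEndpointKey] = "10.0.0.13:8000"
	fmt.Println("after tampering:", res.Validate())
}
```

Deriving both entries from one value in a single constructor avoids the mismatch the commit message warns about, and a check like `Validate` could guard any later mutation.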