For each HTTP request, the EPP MUST communicate to the proxy the picked model server endpoint, via
- adding the `target-pod` HTTP header in the request, or otherwise return an error.
+ adding the `x-gateway-destination-endpoint` HTTP header in the request and as an unstructured entry in the [dynamic_metadata](https://github.com/envoyproxy/go-control-plane/blob/c19bf63a811c90bf9e02f8e0dc1dcef94931ebb4/envoy/service/ext_proc/v3/external_processor.pb.go#L320) field of the ext-proc response, or otherwise return an error.
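
To illustrate the requirement in the added line, here is a minimal Python sketch of the decision the EPP has to make for each request. It is not the project's implementation: `RoutingDecision` and `build_routing_decision` are hypothetical names, the plain dictionaries stand in for the real ext-proc response messages, and reusing the header name as the metadata key is an assumption, since the text only says the entry is unstructured.

```python
from dataclasses import dataclass, field

# Header name taken from the spec text; everything else here is illustrative.
DESTINATION_KEY = "x-gateway-destination-endpoint"


@dataclass
class RoutingDecision:
    """Hypothetical stand-in for the relevant parts of an ext-proc response."""
    set_headers: dict = field(default_factory=dict)
    dynamic_metadata: dict = field(default_factory=dict)


def build_routing_decision(picked_endpoint):
    """Carry the picked endpoint in both places the spec names, or raise."""
    if not picked_endpoint:
        # The spec says the EPP must otherwise return an error.
        raise ValueError("no model server endpoint was picked")
    return RoutingDecision(
        set_headers={DESTINATION_KEY: picked_endpoint},
        # Assumption: the metadata entry uses the same key as the header.
        dynamic_metadata={DESTINATION_KEY: picked_endpoint},
    )
```

The endpoint value itself (for example an `ip:port` string) is passed through untouched, because its format is not defined in the lines shown here.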
## Model Server Protocol
@@ -62,4 +62,4 @@ The model server MUST expose the following LoRA adapter metrics via the same Pro
Requests will be queued if the model server has reached MaxActiveAdapter and cannot load the
requested adapter. Example: `"max_lora": "8"`.
* `running_lora_adapters`: A comma-separated list of adapters that are currently loaded in GPU
- memory and ready to serve requests. Example: `"running_lora_adapters": "adapter1, adapter2"`
+ memory and ready to serve requests. Example: `"running_lora_adapters": "adapter1, adapter2"`
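
As a rough illustration of how a consumer might read these labels, the Python sketch below splits the comma-separated `running_lora_adapters` value and compares the count against `max_lora`. The helper names are hypothetical, and it assumes the labels have already been scraped into a dictionary of strings shaped like the examples above.

```python
def parse_adapter_list(label_value):
    """Split a comma-separated adapter label into a list of trimmed names."""
    return [name.strip() for name in label_value.split(",") if name.strip()]


def has_free_adapter_slot(labels):
    """Return True if fewer adapters are loaded than max_lora allows."""
    running = parse_adapter_list(labels.get("running_lora_adapters", ""))
    # max_lora arrives as a string label, e.g. "8", so convert before comparing.
    return len(running) < int(labels["max_lora"])


labels = {"max_lora": "8", "running_lora_adapters": "adapter1, adapter2"}
print(parse_adapter_list(labels["running_lora_adapters"]))  # ['adapter1', 'adapter2']
print(has_free_adapter_slot(labels))  # True
```

Trimming whitespace matters because the example value separates names with a comma followed by a space.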