Commit 102d0ce

Update extension-policy to match the new epp service name
1 parent a591cd0 commit 102d0ce

2 files changed: 33 additions, 32 deletions

config/manifests/gateway/extension_policy.yaml

Lines changed: 0 additions & 32 deletions
This file was deleted.

config/manifests/inferencepool.yaml

Lines changed: 33 additions & 0 deletions
@@ -75,6 +75,39 @@ spec:
           initialDelaySeconds: 5
           periodSeconds: 10
 ---
+apiVersion: gateway.envoyproxy.io/v1alpha1
+kind: EnvoyExtensionPolicy
+metadata:
+  name: ext-proc-policy
+  namespace: default
+spec:
+  extProc:
+    - backendRefs:
+      - group: ""
+        kind: Service
+        name: vllm-llama2-7b-epp
+        port: 9002
+      processingMode:
+        allowModeOverride: true
+        request:
+          body: Buffered
+        response:
+      # The timeouts are likely not needed here. We can experiment with removing/tuning them slowly.
+      # The connection limits are more important and will cause the opaque: ext_proc_gRPC_error_14 error in Envoy GW if not configured correctly.
+      messageTimeout: 1000s
+      backendSettings:
+        circuitBreaker:
+          maxConnections: 40000
+          maxPendingRequests: 40000
+          maxParallelRequests: 40000
+        timeout:
+          tcp:
+            connectTimeout: 24h
+  targetRef:
+    group: gateway.networking.k8s.io
+    kind: HTTPRoute
+    name: llm-route
+---
 kind: ClusterRole
 apiVersion: rbac.authorization.k8s.io/v1
 metadata:
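
The policy's backendRefs entry is resolved by name and port, so it only takes effect if a Service called vllm-llama2-7b-epp exists in the policy's namespace and exposes port 9002. A minimal sketch of such a Service follows. The selector label and targetPort are assumptions for illustration and are not taken from this commit.

```yaml
# Hypothetical Service matching the backendRef in ext-proc-policy.
apiVersion: v1
kind: Service
metadata:
  name: vllm-llama2-7b-epp   # must match backendRefs[].name in the policy
  namespace: default         # same namespace as the policy
spec:
  selector:
    app: vllm-llama2-7b-epp  # assumed label; use the label set on the EPP pods
  ports:
    - protocol: TCP
      port: 9002             # must match backendRefs[].port in the policy
      targetPort: 9002       # assumed container port
```

If the Service name or port drifts from what the policy references, Envoy Gateway cannot resolve the ext_proc backend, which is the kind of mismatch this commit addresses.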
