Description
What happened:
When I followed https://gateway-api-inference-extension.sigs.k8s.io/guides/#__tabbed_2_2 for the CPU deployment and then ran:

```shell
IP=$(kubectl get gateway/inference-gateway -o jsonpath='{.status.addresses[0].value}')
PORT=80

curl -i ${IP}:${PORT}/v1/completions -H 'Content-Type: application/json' -d '{
  "model": "Qwen/Qwen2.5-1.5B-Instruct",
  "prompt": "Write as if you were a critic: San Francisco",
  "max_tokens": 100,
  "temperature": 0
}'
```
the request returned a 500 Internal Server Error.
What you expected to happen:
The request should return a valid completion response.
How to reproduce it (as minimally and precisely as possible):
Follow https://gateway-api-inference-extension.sigs.k8s.io/guides/#__tabbed_2_2 for the CPU deployment.
Anything else we need to know?:
I debugged it, and I think the error was introduced by the change in 2f72a8a#diff-cabeb9ea1c075199163242f9adca3bdaad2cf1dda1aafb600c5d57885066e471. The epp deployment logged:

```
2025-05-06T20:41:17Z LEVEL(-2) health epp/health.go:38 gRPC health check requested unknown service {"available-services": ["envoy.service.ext_proc.v3.ExternalProcessor"], "requested-service": ""}
```

It looks like the requested service name does not match the registered one. I suspect https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/config/manifests/inferencepool-resources.yaml#L51 is still using the old image, which points to the old inference-extension service.
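For anyone trying to reproduce the mismatch directly, the epp health server can be queried by service name with `grpcurl`; this is just a debugging sketch, and the port (9003) and deployment name are my assumptions based on the quickstart manifests, so adjust them for your setup:

```shell
# Forward the epp gRPC health port locally (port and deployment name
# are assumptions from the quickstart manifests).
kubectl port-forward deployment/vllm-llama3-8b-instruct-epp 9003:9003 &

# An empty "service" field reproduces the "unknown service" log line above;
# naming the registered ext_proc service should report SERVING on a healthy epp.
grpcurl -plaintext \
  -d '{"service": "envoy.service.ext_proc.v3.ExternalProcessor"}' \
  localhost:9003 grpc.health.v1.Health/Check
```

The log line shows the probe arrived with `"requested-service": ""`, so whichever client is probing (the gateway's health check, per the linked diff) is not sending the service name the epp registers.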
The main branch seems to be broken right now.
Things got fixed when I installed the released v0.3.0 chart instead of the manifests from main:

```shell
helm install vllm-llama3-8b-instruct \
  --set inferencePool.modelServers.matchLabels.app=vllm-llama3-8b-instruct \
  --set provider.name=gke \
  --version v0.3.0 \
  oci://registry.k8s.io/gateway-api-inference-extension/charts/inferencepool
```
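To check my "old image" guess, one way is to compare the image the running epp Deployment actually uses against what the manifest on main pins; a sketch (deployment name and default namespace are assumptions from the quickstart):

```shell
# Print the image of the first container in the epp deployment
# (deployment name assumed from the quickstart manifests).
kubectl get deployment vllm-llama3-8b-instruct-epp \
  -o jsonpath='{.spec.template.spec.containers[0].image}'
```

If that tag predates the linked commit, the deployment is running an epp build whose health service names no longer match what the gateway probes.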