
Bug: Following the user guide results in a 500 Internal Server Error when sending an inference request #786

Closed
@capri-xiyue

Description


What happened:
When I followed https://gateway-api-inference-extension.sigs.k8s.io/guides/#__tabbed_2_2 for the CPU deployment and then ran:

IP=$(kubectl get gateway/inference-gateway -o jsonpath='{.status.addresses[0].value}')
PORT=80

curl -i ${IP}:${PORT}/v1/completions -H 'Content-Type: application/json' -d '{
"model": "Qwen/Qwen2.5-1.5B-Instruct",
"prompt": "Write as if you were a critic: San Francisco",
"max_tokens": 100,
"temperature": 0
}'

It returned a 500 Internal Server Error.
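When the request fails, it helps to separate "the Gateway has no address yet" from "the server answered with an error". The sketch below assumes bash and curl. The functions `gateway_url` and `completion_status` are hypothetical helpers, not part of the guide. It reuses the `IP`/`PORT` values and the request body from the command above and prints only the HTTP status code.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not from the user guide) for re-running the request above.

# Print "IP:PORT" for the Gateway, defaulting the port to 80.
# Fails when the IP is empty, e.g. if the Gateway has no address yet.
gateway_url() {
  local ip="$1" port="${2:-80}"
  if [ -z "$ip" ]; then
    echo "gateway has no address yet" >&2
    return 1
  fi
  printf '%s:%s\n' "$ip" "$port"
}

# Send the same completion request and print only the HTTP status code.
completion_status() {
  local url
  url="$(gateway_url "$1" "$2")" || return 1
  curl -sS -o /dev/null -w '%{http_code}\n' "${url}/v1/completions" \
    -H 'Content-Type: application/json' \
    -d '{"model": "Qwen/Qwen2.5-1.5B-Instruct", "prompt": "Write as if you were a critic: San Francisco", "max_tokens": 100, "temperature": 0}'
}
```

Usage: after setting `IP` with the `kubectl get gateway/inference-gateway ...` command above, run `completion_status "$IP" 80`. While the bug described here is present, one would expect it to print 500 rather than 200; an empty `IP` instead fails early with "gateway has no address yet".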

What you expected to happen:
It should return a valid completion response.

How to reproduce it (as minimally and precisely as possible):
Follow https://gateway-api-inference-extension.sigs.k8s.io/guides/#__tabbed_2_2 for the CPU deployment, then send the request shown above.

Anything else we need to know?:
I debugged it, and I think the error comes from a change like 2f72a8a#diff-cabeb9ea1c075199163242f9adca3bdaad2cf1dda1aafb600c5d57885066e471. The EPP deployment logged this error:

2025-05-06T20:41:17Z LEVEL(-2) health epp/health.go:38 gRPC health check requested unknown service {"available-services": ["envoy.service.ext_proc.v3.ExternalProcessor"], "requested-service": ""}

It looks like something is mismatched. My guess is that https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/config/manifests/inferencepool-resources.yaml#L51 still uses the old image, which points to the old inference-extension service.
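The log line above can be checked mechanically. In the gRPC health checking protocol, an empty service name ("") conventionally asks for the overall health of the server, while this EPP only lists `envoy.service.ext_proc.v3.ExternalProcessor` under `available-services`. The `requested_service` function below is a hypothetical sketch (bash, grep and sed; not part of the project) that pulls the `requested-service` field out of such lines, so you can see which service name the health checker is actually sending.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read EPP log lines on stdin and, for each
# "gRPC health check requested unknown service" error, print the requested
# service name wrapped in quotes (so an empty name shows up as "").
requested_service() {
  grep 'gRPC health check requested unknown service' \
    | sed -nE 's/.*"requested-service": *"([^"]*)".*/"\1"/p'
}
```

Usage: `kubectl logs deployment/<epp-deployment> | requested_service`, where `<epp-deployment>` is a placeholder for your EPP deployment's name. For the log line quoted in this issue it prints `""`, which is consistent with a health check asking for a service name the EPP does not register.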

It seems the main branch is currently broken.
Things worked again when I installed the v0.3.0 Helm chart instead of deploying from main:

helm install vllm-llama3-8b-instruct \
  --set inferencePool.modelServers.matchLabels.app=vllm-llama3-8b-instruct \
  --set provider.name=gke \
  --version v0.3.0 \
  oci://registry.k8s.io/gateway-api-inference-extension/charts/inferencepool
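To test the guess that the manifest on main and the working v0.3.0 chart reference different EPP images, one could list the `image:` entries each produces and compare them. The `list_images` function below is a hypothetical sketch (bash, sed and sort only) that reads Kubernetes YAML on stdin and prints each distinct image reference. It is a line-based text match, not a YAML parser.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each distinct container image referenced in
# Kubernetes YAML read from stdin. This is a line-based text match on
# "image:" keys (quoted or unquoted), not a real YAML parser.
list_images() {
  sed -nE 's/^[[:space:]-]*image:[[:space:]]*"?([^"[:space:]]+)"?.*/\1/p' | sort -u
}
```

Usage: pipe the raw form of the manifest linked above through it (`curl -sL https://raw.githubusercontent.com/kubernetes-sigs/gateway-api-inference-extension/main/config/manifests/inferencepool-resources.yaml | list_images`), then do the same with the chart that worked by replacing `helm install` with `helm template` in the command above and appending `| list_images`. Differing EPP image tags between the two outputs would support the mismatch hypothesis; matching tags would point elsewhere.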

Metadata


Assignees

No one assigned

    Labels

    kind/bug: Categorizes issue or PR as related to a bug.
    needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.

