Improve vLLM upstream health checks to only pass when models are servable #558


Closed
smarterclayton opened this issue Mar 21, 2025 · 1 comment

@smarterclayton
Contributor

smarterclayton commented Mar 21, 2025

As documented in #550, the default vLLM configuration could be improved and better documented. A startupProbe on /health is the right default for vLLM, since the OpenAI-compatible server (and therefore the /health endpoint) does not come up until the potentially very long model load has completed; the exact timing tunables will vary by model and hardware.
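A minimal sketch of what such a probe could look like on a vLLM container (the container name, image, port, and timing values below are illustrative assumptions, not defaults shipped by this project):

```yaml
# Illustrative sketch only: name, image, port, and timings are assumptions.
containers:
  - name: vllm
    image: vllm/vllm-openai:latest   # assumed image
    ports:
      - containerPort: 8000
    startupProbe:
      httpGet:
        path: /health                # only served once the OpenAI server is up
        port: 8000
      # Tolerate a long model load: up to failureThreshold * periodSeconds
      # (here 120 * 10s = 20 minutes) before the container is restarted.
      failureThreshold: 120
      periodSeconds: 10
    livenessProbe:
      httpGet:
        path: /health
        port: 8000
      periodSeconds: 10
```

Once the startupProbe has succeeded, the livenessProbe takes over with a much shorter tolerance, so the pod is not restarted during model load but is still restarted if the server later stops responding.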

@kfswain
Collaborator

kfswain commented Apr 22, 2025

I believe #550 actually resolved this:

# vLLM does not start the OpenAI server (and hence make /health available)

kfswain closed this as completed Apr 22, 2025