vLLM Fork: RuntimeError: CUDA error #21
Comments
@terrytangyuan @kfswain Could you please help us with this? Thanks!
We should be ready to switch to the latest vLLM this week; #22 should fix this.
/close This should already be fixed. We switched to the upstream vllm image.
@liu-cong: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
When running the PoC vLLM fork on a `g2-standard-48` machine in GKE and calling the `/v1/completions` API directly (not via the proxy), an internal server error is returned. The vLLM container logs show the CUDA `RuntimeError` referenced in the title.
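For reference, a minimal reproduction of the failing call might look like the sketch below. The service address and model name here are assumptions, not taken from the report; substitute the values from the actual deployment.

```python
import requests

# Hypothetical in-cluster service address; the original report does not
# include it, so replace it with the address of your vLLM Service.
VLLM_URL = "http://vllm-service:8000/v1/completions"

payload = {
    "model": "meta-llama/Llama-2-7b-hf",  # assumed model name, replace as needed
    "prompt": "San Francisco is a",
    "max_tokens": 16,
}

resp = requests.post(VLLM_URL, json=payload, timeout=60)
# Against the forked image this call returned HTTP 500 (internal server
# error); against vllm/vllm-openai the same call succeeds.
print(resp.status_code, resp.text)
```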
When running the non-forked image `vllm/vllm-openai` in the same environment, the API call succeeds.
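Since the failure shows up as a CUDA error in the fork's container logs but not with the upstream image, one way to narrow it down is to probe CUDA directly from inside the failing pod. This is a hypothetical diagnostic, assuming `torch` is importable in the image (it is a dependency of standard vLLM builds):

```python
# Hypothetical diagnostic; run inside the vLLM pod (e.g. via `kubectl exec`)
# to check that the container can initialize CUDA at all.
import torch

print("CUDA available:", torch.cuda.is_available())
print("Device count:", torch.cuda.device_count())
if torch.cuda.is_available():
    print("Device 0:", torch.cuda.get_device_name(0))
    # A tiny allocation forces CUDA context creation and raises a
    # RuntimeError if the runtime/driver setup in the image is broken.
    x = torch.ones(1, device="cuda")
    print("Allocation OK:", bool(x.item() == 1.0))
```

If this also raises, the problem lies in the forked image's CUDA runtime setup rather than in the request-handling path, which would be consistent with the fix of switching to the upstream image.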