`site-src/guides/index.md`
#### CPU-Based Model Server
This setup uses the official `vllm-cpu` image, which according to the documentation can run vLLM on x86 CPU platforms.
For this setup, we use approximately 9.5GB of memory and 12 CPUs for each replica.
While it is possible to deploy the model server with fewer resources, this is not recommended.
For example, in our tests, loading the model with 8GB of memory and 1 CPU was possible, but it took almost 3.5 minutes, and inference requests took an unreasonably long time.
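As a sketch, the per-replica figures above would appear under the model server's container spec in its Deployment manifest roughly as follows. The container name is illustrative (not taken from the guide); only the memory and CPU figures come from the text above, with 9.5GB expressed as `9500Mi`:

```yaml
# Illustrative resources block for a CPU-based vLLM model server container.
# Only the quantities (about 9.5GB memory, 12 CPUs) come from the guide;
# the container name is a placeholder.
containers:
  - name: vllm-cpu
    resources:
      requests:
        cpu: "12"
        memory: 9500Mi
      limits:
        cpu: "12"
        memory: 9500Mi
```

Setting requests equal to limits gives the pod the Guaranteed QoS class, which helps avoid eviction and CPU throttling for a latency-sensitive model server.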