Commit 34862ab

docs: add the Hugging Face secret to readme (#139)
Signed-off-by: Kay Yan <[email protected]>
1 parent eefcaf7 commit 34862ab

1 file changed, +10 -5 lines changed

pkg/README.md

+10-5
@@ -7,20 +7,26 @@ The current manifests rely on Envoy Gateway [v1.2.1](https://gateway.envoyproxy.
 
 1. **Deploy Sample vLLM Application**
 
-A sample vLLM deployment with the proper protocol to work with LLM Instance Gateway can be found [here](https://github.com/kubernetes-sigs/llm-instance-gateway/tree/main/examples/poc/manifests/vllm/vllm-lora-deployment.yaml#L18).
+Create a Hugging Face secret to download the model [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf). Ensure that the token grants access to this model.
+Deploy a sample vLLM deployment with the proper protocol to work with the LLM Instance Gateway.
+```bash
+kubectl create secret generic hf-token --from-literal=token=$HF_TOKEN # Your Hugging Face Token with access to Llama2
+kubectl apply -f ../examples/poc/manifests/vllm/vllm-lora-deployment.yaml
+```
 
 1. **Deploy InferenceModel and InferencePool**
 
-You can find a sample InferenceModel and InferencePool configuration, based on the vLLM deployments mentioned above, [here](https://github.com/kubernetes-sigs/llm-instance-gateway/tree/main/examples/poc/manifests/inferencepool-with-model.yaml).
-
+Deploy a sample InferenceModel and InferencePool configuration based on the vLLM deployments mentioned above.
+```bash
+kubectl apply -f ../examples/poc/manifests/inferencepool-with-model.yaml
+```
 
 1. **Update Envoy Gateway Config to enable Patch Policy**
 
 Our custom LLM Gateway ext-proc is patched into the existing envoy gateway via `EnvoyPatchPolicy`. To enable this feature, we must extend the Envoy Gateway config map. To do this, simply run:
 ```bash
 kubectl apply -f ./manifests/enable_patch_policy.yaml
 kubectl rollout restart deployment envoy-gateway -n envoy-gateway-system
-
 ```
 Additionally, if you would like to enable the admin interface, you can uncomment the admin lines and run this again.
 
@@ -54,7 +60,6 @@ The current manifests rely on Envoy Gateway [v1.2.1](https://gateway.envoyproxy.
 }'
 ```
 
-
 ## Scheduling Package in Ext Proc
 The scheduling package implements request scheduling algorithms for load balancing requests across backend pods in an inference gateway. The scheduler ensures efficient resource utilization while maintaining low latency and prioritizing critical requests. It applies a series of filters based on metrics and heuristics to select the best pod for a given request.
 
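
A note on the secret-creation step this commit adds: the command expands `$HF_TOKEN` directly, so if the variable is unset or empty the shell passes an empty value to `kubectl`. A small guard can catch that before anything reaches the cluster. The sketch below is only an illustration; `require_hf_token` is a hypothetical helper name and is not part of the repository.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not in the repository): fail early when HF_TOKEN is
# unset or empty, so the hf-token secret is never created from a blank value.
require_hf_token() {
  if [ -z "${HF_TOKEN:-}" ]; then
    echo "HF_TOKEN is not set; export a Hugging Face token with access to the model" >&2
    return 1
  fi
  return 0
}

# Possible usage, assuming kubectl already points at the target cluster:
#   require_hf_token && \
#     kubectl create secret generic hf-token --from-literal=token="$HF_TOKEN"
```

Quoting `"$HF_TOKEN"` in the usage line also keeps the shell from word-splitting the value, which the unquoted form in the README would not prevent.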
