-
Notifications
You must be signed in to change notification settings - Fork 69
Simplify POC installation #8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Welcome @liu-cong! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with the assumption that this has been validated E2E
apiVersion: gateway.networking.k8s.io/v1 | ||
kind: Gateway | ||
apiVersion: apps/v1 | ||
kind: Deployment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
gatewayclass as the file name may be misleading if we are putting the ext_proc deployment yaml here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The point is that ext_proc is an implementation detail of this specific gateway class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A reasonable perspective.
However, the core of this PoC is the ext-proc implementation, and I don't think can be considered an implementation detail here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about just call it installation.yaml which just bundles everything, or can break it down to multiple yaml files
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either SGTM, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cool, renamed to installation.yaml. And the README explains what's being installed so this should be good!
Yes both @courageJ and I tried this. |
fc9ce95
to
2097027
Compare
/lgtm |
Wait until the gateway is ready. | ||
```bash | ||
IP=$(kubectl get gateway/llm-gateway -o jsonpath='{.status.addresses[0].value}') | ||
PORT=8081 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this wrong, llm-instance-gw is listening on 8080. Am I wrong?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question! Actually in the POC setup, Envoy is configured the additional 8081 port for ext proc traffic. Updated README
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ho yes, you're right actually. Sorry, for the confusion.
Overall it looks good to me and it's a great addition. Thank you @liu-cong 😀 |
Update README on envoy 8081 port
/lgtm |
@@ -1,68 +1,55 @@ | |||
# Envoy Ext Proc Gateway with LoRA Integration | |||
|
|||
This project sets up an Envoy gateway to handle gRPC calls with integration of LoRA (Low-Rank Adaptation). The configuration aims to manage gRPC traffic through Envoy's external processing and custom routing based on headers and load balancing rules. The setup includes Kubernetes services and deployments for both the gRPC server and the vllm-lora application. | |||
This project sets up an Envoy gateway with a custom external processing which implements advanced routing logic tailored for LoRA (Low-Rank Adaptation) adapters. The routing algorithm is based on the model specified (using Open AI API format), and ensuring efficient load balancing based on model server metrics. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one more extra space between which
and implements
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: liu-cong, terrytangyuan The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
This was adapted from the private POC repo.
This PR separates what needs to installed vs. what resources user creates as sample applications. The installation is simplified to a single gatewayclass.yaml file.
@courageJ encountered some issue when trying the POC, and this PR worked for her. Not sure what didn't work before though :)