Skip to content

Simplify POC installation #8

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Sep 29, 2024
Merged

Conversation

liu-cong
Copy link
Contributor

@liu-cong liu-cong commented Sep 26, 2024

This was adapted from the private POC repo.

This PR separates what needs to installed vs. what resources user creates as sample applications. The installation is simplified to a single gatewayclass.yaml file.

@courageJ encountered some issue when trying the POC, and this PR worked for her. Not sure what didn't work before though :)

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Sep 26, 2024
@k8s-ci-robot
Copy link
Contributor

Welcome @liu-cong!

It looks like this is your first PR to kubernetes-sigs/llm-instance-gateway 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/llm-instance-gateway has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Sep 26, 2024
Copy link
Collaborator

@kfswain kfswain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with the assumption that this has been validated E2E

apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
apiVersion: apps/v1
kind: Deployment
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gatewayclass as the file name may be misleading if we are putting the ext_proc deployment yaml here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point is that ext_proc is an implementation detail of this specific gateway class

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A reasonable perspective.

However, the core of this PoC is the ext-proc implementation, and I don't think can be considered an implementation detail here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about just call it installation.yaml which just bundles everything, or can break it down to multiple yaml files

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either SGTM, thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool, renamed to installation.yaml. And the README explains what's being installed so this should be good!

@liu-cong
Copy link
Contributor Author

LGTM with the assumption that this has been validated E2E

Yes both @courageJ and I tried this.

@liu-cong liu-cong force-pushed the main branch 2 times, most recently from fc9ce95 to 2097027 Compare September 26, 2024 21:17
@kfswain
Copy link
Collaborator

kfswain commented Sep 26, 2024

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 26, 2024
Wait until the gateway is ready.
```bash
IP=$(kubectl get gateway/llm-gateway -o jsonpath='{.status.addresses[0].value}')
PORT=8081
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this wrong, llm-instance-gw is listening on 8080. Am I wrong?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question! Actually in the POC setup, Envoy is configured the additional 8081 port for ext proc traffic. Updated README

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ho yes, you're right actually. Sorry, for the confusion.

@Joffref
Copy link

Joffref commented Sep 26, 2024

Overall it looks good to me and it's a great addition. Thank you @liu-cong 😀

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 27, 2024
Update README on envoy 8081 port
@kfswain
Copy link
Collaborator

kfswain commented Sep 27, 2024

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 27, 2024
@@ -1,68 +1,55 @@
# Envoy Ext Proc Gateway with LoRA Integration

This project sets up an Envoy gateway to handle gRPC calls with integration of LoRA (Low-Rank Adaptation). The configuration aims to manage gRPC traffic through Envoy's external processing and custom routing based on headers and load balancing rules. The setup includes Kubernetes services and deployments for both the gRPC server and the vllm-lora application.
This project sets up an Envoy gateway with a custom external processing which implements advanced routing logic tailored for LoRA (Low-Rank Adaptation) adapters. The routing algorithm is based on the model specified (using Open AI API format), and ensuring efficient load balancing based on model server metrics.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one more extra space between which and implements

Copy link
Member

@terrytangyuan terrytangyuan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: liu-cong, terrytangyuan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 29, 2024
@k8s-ci-robot k8s-ci-robot merged commit 947e44d into kubernetes-sigs:main Sep 29, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants