robscott/gateway-api-inference-extension

This branch is 1 commit ahead of, 377 commits behind kubernetes-sigs/gateway-api-inference-extension:main.

Latest commit: e1218a9 · Jan 13, 2025
Gateway API Inference Extension

The Gateway API Inference Extension came out of wg-serving and is sponsored by SIG Network. This repo contains the extension's load balancing algorithm, ext-proc code, CRDs, and controllers.

This extension is intended to provide value to multiplexed LLM services on a shared pool of compute. See the proposal for more info.

Status

This project is currently in development.

For faster testing, our proof of concept (PoC) is in the ./examples/ directory.

Getting Started

Install the CRDs into the cluster:

make install

Delete the APIs (CRDs) from the cluster:

make uninstall

Deploying the ext-proc image

Refer to this README for how to deploy the ext-proc image.

Contributing

Our community meeting is held weekly on Thursdays at 10AM PDT; Zoom link here.

We currently use the #wg-serving Slack channel for communications.

Contributions are welcome. Thanks for joining us!

Code of conduct

Participation in the Kubernetes community is governed by the Kubernetes Code of Conduct.
