Skip to content

Complete the InferencePool documentation #673

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 24 commits into from
Apr 23, 2025

Conversation

nicolexin
Copy link
Contributor

  1. Added section for inference pool configuration and example.
  2. Have a subsection for overlap with Service
  3. Added section for replacing an inference pool / base model
  4. Added extensionReference to the inference pool spec

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Apr 9, 2025
@k8s-ci-robot
Copy link
Contributor

Hi @nicolexin. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot requested review from ahg-g and danehans April 9, 2025 21:17
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Apr 9, 2025
Copy link

netlify bot commented Apr 9, 2025

Deploy Preview for gateway-api-inference-extension ready!

Name Link
🔨 Latest commit 4fdfadf
🔍 Latest deploy log https://app.netlify.com/sites/gateway-api-inference-extension/deploys/6802de2593a71f0008779ab5
😎 Deploy Preview https://deploy-preview-673--gateway-api-inference-extension.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@nirrozenbaum
Copy link
Contributor

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Apr 10, 2025
@kfswain
Copy link
Collaborator

kfswain commented Apr 10, 2025

/approve

This LGTM, but will let the previous reviewers give the final stamp. Thanks!

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 10, 2025
Copy link
Member

@robscott robscott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @nicolexin, this is great!


It is expected for the InferencePool to:
An InferencePool is expected to be bundled with an [Endpoint Picker](https://github.com/kubernetes-sigs/gateway-api-inference-extension/tree/main/pkg/epp) extension. This extension is responsible for tracking key metrics on each model server (i.e. the KV-cache utilization, queue length of pending requests, active LoRA adapters, etc.) and routing incoming inference requests to the optimal model server replica based on these metrics.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can an EPP be associated with multiple pools?
Can a Pod belong to more than one pool?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes EPP is associated with the inferencepool, so it is associated with multiple pods covered by this inference pool. You can have one pod belong to multiple pools.

Copy link

@elevran elevran Apr 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @nicolexin.
I wanted to clarify that a single EPP can be associated with multiple InferencePools, not Pods.
Is the mental model for Pool and EPP supporting that and what changes would need to be done in the EPP to support dispatching to the right Pool (e.g., based on the model name present in the request body)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah sorry about misreading your first question. Yes a single EPP can be associated with multiple InferencePools.

The mental model depends on how you define your gateway, for example:

  1. You can configure two separate gateways with different frontend IP, each backed by a different inference pool. Both of the pool sharing the same EPP. In this case if you hit one of the gateway IP, the callout extension should only send available endpoints belong to that gateway to the EPP, and EPP will choose an optimal endpoint based on that.
  2. Similarly, you could have one gateway but separate HTTPRoute/rules, so that on one path (e.g. /v1) requests go to the v1 inference pool, and on the other path (e.g. /v2) requests go to the v2 inference pool. Again in this case the callout extension should only send available endpoints belong to that particular routing rules to the EPP based on where the request was sent.
  3. If there are multiple inference pools sharing the same serving path then EPP will received a list of endpoints belong to all of these inference pools. EPP needs to probe each of the model server endpoints for available model names and choose one of the endpoints based on the model name present in the request body. (Note that this is a common use case even if you want to support just one InferencePool with a common based model + LoRA adapters)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes a single EPP can be associated with multiple InferencePools.

An EPP can only be associated with a single InferencePool. The associated InferencePool is specified by the poolName and poolNamespace flags. An HTTPRoute can have multiple backendRefs that reference the same InferencePool and therefore routes to the same EPP. An HTTPRoute can have multiple backendRefs that reference different InferencePools and therefore routes to different EPPs. The implementation guide states:

InferencePool represents a set of Inference-focused Pods and an extension that will be used to route to them.

The " Inference-focused Pods" define the pool of model servers that share common attributes, and "an extension" defines the EPP responsible for picking which model server in the pool the request should be routed to.

@nicolexin this question has surfaced multiple times recently, so you may want to include documentation in this PR that resolves this confusion.

xref: #145

Copy link
Collaborator

@kfswain kfswain Apr 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multi-tenancy comes up a lot. I think at minimum we should be very cautious about it, and potentially even advise against it.

Multi-tenancy would make a single point of failure for multiple sets of accelerators(pools), so an outage would be incredibly painful (loss of revenue + cost of essentially all their expensive accelerators).

Multi-tenancy also puts pressure on any scale issues we may come across, which we currently do not have a good read on.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but just by looking at our InferencePool API, I don't think other implementations are restricted by one EPP associated with a single inference pool.

PTAL at #145 referenced above. InferencePool can only reference a single EPP config, e.g. Service. On the EPP side, it can only take a single poolName and poolNamespace flag. This is a singular bi-directional binding.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the EPP side, it can only take a single poolName and poolNamespace flag

I'd argue that this is an implementation detail of EPP, and not necessarily a long term or intentional limitation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd argue that this is an implementation detail of EPP, and not necessarily a long term or intentional limitation.

I agree, and that's why I created #252 to provide additional flexibility for the EPP<>InferencePool(s) binding. My comments are specific to what is capable today.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the revised wording is great here, thanks @nicolexin! It's reflective of what we have today, and we can use #252 to follow up on more flexible mapping. I think everything else has been resolved, and this is a huge improvement to our docs, so I think we should go ahead and merge.

Copy link
Member

@robscott robscott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @nicolexin!


- An InferencePool named `vllm-llama3-8b-instruct` is created in the `default` namespace.
- It will select Pods that have the label `app: vllm-llama3-8b-instruct`.
- Traffic routed to this InferencePool will call out to the EPP service `vllm-llama3-8b-instruct-epp` on port `9002` for making routing decisions. If EPP fails to pick an endpoint, or is not responsive, the request will be dropped.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a few of these comments got hidden by GitHub UI, hopefully this helps keep them from disappearing.

- An InferencePool named `vllm-llama3-8b-instruct` is created in the `default` namespace.
- It will select Pods that have the label `app: vllm-llama3-8b-instruct`.
- Traffic routed to this InferencePool will call out to the EPP service `vllm-llama3-8b-instruct-epp` on port `9002` for making routing decisions. If EPP fails to pick an endpoint, or is not responsive, the request will be dropped.
- Traffic routed to this InferencePool will be forwarded to the port `8000` on the selected Pods.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a few of these comments got hidden by GitHub UI, hopefully this helps keep them from disappearing.

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: kfswain, nicolexin, robscott

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@robscott
Copy link
Member

Thanks for all the work on this @nicolexin, this is great!

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Apr 23, 2025
@k8s-ci-robot k8s-ci-robot merged commit 7d238dd into kubernetes-sigs:main Apr 23, 2025
8 checks passed
rlakhtakia pushed a commit to rlakhtakia/gateway-api-inference-extension that referenced this pull request Apr 25, 2025
* Initial guide for inference pool

* Add extensionReference to the InferencePool spec

* Fix list formatting

* Remove unused labels

* Autogenerate the spec

* Update site-src/api-types/inferencepool.md

Co-authored-by: Rob Scott <[email protected]>

* Update site-src/api-types/inferencepool.md

Co-authored-by: Rob Scott <[email protected]>

* Update site-src/api-types/inferencepool.md

Co-authored-by: Rob Scott <[email protected]>

* Update site-src/api-types/inferencepool.md

Co-authored-by: Rob Scott <[email protected]>

* Update site-src/api-types/inferencepool.md

Co-authored-by: Rob Scott <[email protected]>

* Update site-src/api-types/inferencepool.md

Co-authored-by: Rob Scott <[email protected]>

* Rename llm-pool names in rollout example

* Add use cases for replacing an inference pool

* Rewording the background section

* Create replacing-inference-pool.md

* Replace instructions with a link for how to replace an inference pool

* Update replacing-inference-pool.md

* Update mkdocs.yml

* Update replacing-inference-pool.md

* Update inferencemodel_types.go

* Update inferencepool.md

* Update site-src/guides/replacing-inference-pool.md

Co-authored-by: Rob Scott <[email protected]>

---------

Co-authored-by: Rob Scott <[email protected]>
k8s-ci-robot pushed a commit that referenced this pull request Apr 25, 2025
* Add unit test coverage for pod APIs under datastore/pkg

* Add unit test coverage for pod APIs under datastore/pkg

* Add unit test coverage for pod APIs under datastore/pkg

* Add unit test coverage for pod APIs under datastore/pkg

* EPP Architecture proposal (#683)

* initial changes

* Adding to proposal to give a quick barebones definition to refactor

* feedback changes

* more feedback addressing

* removed unused Fake struct (#723)

Signed-off-by: Nir Rozenbaum <[email protected]>

* epp: return correct response for trailers (#726)

This looks like a copy paste error.

* Refactor scheduler to run plugins (#677)

* Refactor scheduler to run plugins

* Add scheduler plugin latency metric

* Address comments

* Address comments

* Complete the InferencePool documentation (#673)

* Initial guide for inference pool

* Add extensionReference to the InferencePool spec

* Fix list formatting

* Remove unused labels

* Autogenerate the spec

* Update site-src/api-types/inferencepool.md

Co-authored-by: Rob Scott <[email protected]>

* Update site-src/api-types/inferencepool.md

Co-authored-by: Rob Scott <[email protected]>

* Update site-src/api-types/inferencepool.md

Co-authored-by: Rob Scott <[email protected]>

* Update site-src/api-types/inferencepool.md

Co-authored-by: Rob Scott <[email protected]>

* Update site-src/api-types/inferencepool.md

Co-authored-by: Rob Scott <[email protected]>

* Update site-src/api-types/inferencepool.md

Co-authored-by: Rob Scott <[email protected]>

* Rename llm-pool names in rollout example

* Add use cases for replacing an inference pool

* Rewording the background section

* Create replacing-inference-pool.md

* Replace instructions with a link for how to replace an inference pool

* Update replacing-inference-pool.md

* Update mkdocs.yml

* Update replacing-inference-pool.md

* Update inferencemodel_types.go

* Update inferencepool.md

* Update site-src/guides/replacing-inference-pool.md

Co-authored-by: Rob Scott <[email protected]>

---------

Co-authored-by: Rob Scott <[email protected]>

* reduce log level in metrics logger not to trash the log (#708)

* reduce log level in metrics logger not to trash the log

Signed-off-by: Nir Rozenbaum <[email protected]>

* rename flush metrics to refresh metrics

Signed-off-by: Nir Rozenbaum <[email protected]>

* revert log level

Signed-off-by: Nir Rozenbaum <[email protected]>

---------

Signed-off-by: Nir Rozenbaum <[email protected]>

* few updates in datastore (#713)

* few updates in datastore

Signed-off-by: Nir Rozenbaum <[email protected]>

* PoolSet documentation

Signed-off-by: Nir Rozenbaum <[email protected]>

* error phrasing

Signed-off-by: Nir Rozenbaum <[email protected]>

* removed unused pool arg from PodUpdateOrAddIfNotExist

Signed-off-by: Nir Rozenbaum <[email protected]>

* linter

Signed-off-by: Nir Rozenbaum <[email protected]>

---------

Signed-off-by: Nir Rozenbaum <[email protected]>

* scheduler refactoring (#730)

Signed-off-by: Nir Rozenbaum <[email protected]>

* filter irrelevant pod in pod_reconciler (#696)

* EPP: Update GetRandomPod() to return nil if no pods exist (#731)

Signed-off-by: Daneyon Hansen <[email protected]>

* Move filter and scorer plugins registration to a separate file (#729)

* Move filters and scorers registration to filter/scorer specific files

* Default scheduler config contains empty list of scorers

Signed-off-by: Maya Barnea <[email protected]>

* Default plugin is not a scorer any more

Signed-off-by: Maya Barnea <[email protected]>

* fix scheduler test + lint comments

Signed-off-by: Maya Barnea <[email protected]>

---------

Signed-off-by: Maya Barnea <[email protected]>

* Update issue templates (#738)

* Update issue templates

* Updates artifacts for v0.3.0-rc.1 release

Signed-off-by: Kellen Swain <[email protected]>

* Updates bbr chart for v0.3.0-rc.1 release

Signed-off-by: Kellen Swain <[email protected]>

* Updates artifacts for v0.3.0 release

Signed-off-by: Kellen Swain <[email protected]>

* Adding blank issue template so that all issues start with  label

---------

Signed-off-by: Kellen Swain <[email protected]>

* Add unit test coverage for pod APIs under datastore/pkg

* few updates in datastore (#713)

* few updates in datastore

Signed-off-by: Nir Rozenbaum <[email protected]>

* PoolSet documentation

Signed-off-by: Nir Rozenbaum <[email protected]>

* error phrasing

Signed-off-by: Nir Rozenbaum <[email protected]>

* removed unused pool arg from PodUpdateOrAddIfNotExist

Signed-off-by: Nir Rozenbaum <[email protected]>

* linter

Signed-off-by: Nir Rozenbaum <[email protected]>

---------

Signed-off-by: Nir Rozenbaum <[email protected]>

* few updates in datastore (#713)

* few updates in datastore

Signed-off-by: Nir Rozenbaum <[email protected]>

* PoolSet documentation

Signed-off-by: Nir Rozenbaum <[email protected]>

* error phrasing

Signed-off-by: Nir Rozenbaum <[email protected]>

* removed unused pool arg from PodUpdateOrAddIfNotExist

Signed-off-by: Nir Rozenbaum <[email protected]>

* linter

Signed-off-by: Nir Rozenbaum <[email protected]>

---------

Signed-off-by: Nir Rozenbaum <[email protected]>

* Add unit test coverage for pod APIs under datastore/pkg

---------

Signed-off-by: Nir Rozenbaum <[email protected]>
Signed-off-by: Daneyon Hansen <[email protected]>
Signed-off-by: Maya Barnea <[email protected]>
Signed-off-by: Kellen Swain <[email protected]>
Co-authored-by: Kellen Swain <[email protected]>
Co-authored-by: Nir Rozenbaum <[email protected]>
Co-authored-by: John Howard <[email protected]>
Co-authored-by: Cong Liu <[email protected]>
Co-authored-by: Nicole Xin <[email protected]>
Co-authored-by: Rob Scott <[email protected]>
Co-authored-by: nayihz <[email protected]>
Co-authored-by: Daneyon Hansen <[email protected]>
Co-authored-by: Maya Barnea <[email protected]>
kaushikmitr pushed a commit to kaushikmitr/llm-instance-gateway that referenced this pull request Apr 28, 2025
* Add unit test coverage for pod APIs under datastore/pkg

* Add unit test coverage for pod APIs under datastore/pkg

* Add unit test coverage for pod APIs under datastore/pkg

* Add unit test coverage for pod APIs under datastore/pkg

* EPP Architecture proposal (kubernetes-sigs#683)

* initial changes

* Adding to proposal to give a quick barebones definition to refactor

* feedback changes

* more feedback addressing

* removed unused Fake struct (kubernetes-sigs#723)

Signed-off-by: Nir Rozenbaum <[email protected]>

* epp: return correct response for trailers (kubernetes-sigs#726)

This looks like a copy paste error.

* Refactor scheduler to run plugins (kubernetes-sigs#677)

* Refactor scheduler to run plugins

* Add scheduler plugin latency metric

* Address comments

* Address comments

* Complete the InferencePool documentation (kubernetes-sigs#673)

* Initial guide for inference pool

* Add extensionReference to the InferencePool spec

* Fix list formatting

* Remove unused labels

* Autogenerate the spec

* Update site-src/api-types/inferencepool.md

Co-authored-by: Rob Scott <[email protected]>

* Update site-src/api-types/inferencepool.md

Co-authored-by: Rob Scott <[email protected]>

* Update site-src/api-types/inferencepool.md

Co-authored-by: Rob Scott <[email protected]>

* Update site-src/api-types/inferencepool.md

Co-authored-by: Rob Scott <[email protected]>

* Update site-src/api-types/inferencepool.md

Co-authored-by: Rob Scott <[email protected]>

* Update site-src/api-types/inferencepool.md

Co-authored-by: Rob Scott <[email protected]>

* Rename llm-pool names in rollout example

* Add use cases for replacing an inference pool

* Rewording the background section

* Create replacing-inference-pool.md

* Replace instructions with a link for how to replace an inference pool

* Update replacing-inference-pool.md

* Update mkdocs.yml

* Update replacing-inference-pool.md

* Update inferencemodel_types.go

* Update inferencepool.md

* Update site-src/guides/replacing-inference-pool.md

Co-authored-by: Rob Scott <[email protected]>

---------

Co-authored-by: Rob Scott <[email protected]>

* reduce log level in metrics logger not to trash the log (kubernetes-sigs#708)

* reduce log level in metrics logger not to trash the log

Signed-off-by: Nir Rozenbaum <[email protected]>

* rename flush metrics to refresh metrics

Signed-off-by: Nir Rozenbaum <[email protected]>

* revert log level

Signed-off-by: Nir Rozenbaum <[email protected]>

---------

Signed-off-by: Nir Rozenbaum <[email protected]>

* few updates in datastore (kubernetes-sigs#713)

* few updates in datastore

Signed-off-by: Nir Rozenbaum <[email protected]>

* PoolSet documentation

Signed-off-by: Nir Rozenbaum <[email protected]>

* error phrasing

Signed-off-by: Nir Rozenbaum <[email protected]>

* removed unused pool arg from PodUpdateOrAddIfNotExist

Signed-off-by: Nir Rozenbaum <[email protected]>

* linter

Signed-off-by: Nir Rozenbaum <[email protected]>

---------

Signed-off-by: Nir Rozenbaum <[email protected]>

* scheduler refactoring (kubernetes-sigs#730)

Signed-off-by: Nir Rozenbaum <[email protected]>

* filter irrelevant pod in pod_reconciler (kubernetes-sigs#696)

* EPP: Update GetRandomPod() to return nil if no pods exist (kubernetes-sigs#731)

Signed-off-by: Daneyon Hansen <[email protected]>

* Move filter and scorer plugins registration to a separate file (kubernetes-sigs#729)

* Move filters and scorers registration to filter/scorer specific files

* Default scheduler config contains empty list of scorers

Signed-off-by: Maya Barnea <[email protected]>

* Default plugin is not a scorer any more

Signed-off-by: Maya Barnea <[email protected]>

* fix scheduler test + lint comments

Signed-off-by: Maya Barnea <[email protected]>

---------

Signed-off-by: Maya Barnea <[email protected]>

* Update issue templates (kubernetes-sigs#738)

* Update issue templates

* Updates artifacts for v0.3.0-rc.1 release

Signed-off-by: Kellen Swain <[email protected]>

* Updates bbr chart for v0.3.0-rc.1 release

Signed-off-by: Kellen Swain <[email protected]>

* Updates artifacts for v0.3.0 release

Signed-off-by: Kellen Swain <[email protected]>

* Adding blank issue template so that all issues start with  label

---------

Signed-off-by: Kellen Swain <[email protected]>

* Add unit test coverage for pod APIs under datastore/pkg

* few updates in datastore (kubernetes-sigs#713)

* few updates in datastore

Signed-off-by: Nir Rozenbaum <[email protected]>

* PoolSet documentation

Signed-off-by: Nir Rozenbaum <[email protected]>

* error phrasing

Signed-off-by: Nir Rozenbaum <[email protected]>

* removed unused pool arg from PodUpdateOrAddIfNotExist

Signed-off-by: Nir Rozenbaum <[email protected]>

* linter

Signed-off-by: Nir Rozenbaum <[email protected]>

---------

Signed-off-by: Nir Rozenbaum <[email protected]>

* few updates in datastore (kubernetes-sigs#713)

* few updates in datastore

Signed-off-by: Nir Rozenbaum <[email protected]>

* PoolSet documentation

Signed-off-by: Nir Rozenbaum <[email protected]>

* error phrasing

Signed-off-by: Nir Rozenbaum <[email protected]>

* removed unused pool arg from PodUpdateOrAddIfNotExist

Signed-off-by: Nir Rozenbaum <[email protected]>

* linter

Signed-off-by: Nir Rozenbaum <[email protected]>

---------

Signed-off-by: Nir Rozenbaum <[email protected]>

* Add unit test coverage for pod APIs under datastore/pkg

---------

Signed-off-by: Nir Rozenbaum <[email protected]>
Signed-off-by: Daneyon Hansen <[email protected]>
Signed-off-by: Maya Barnea <[email protected]>
Signed-off-by: Kellen Swain <[email protected]>
Co-authored-by: Kellen Swain <[email protected]>
Co-authored-by: Nir Rozenbaum <[email protected]>
Co-authored-by: John Howard <[email protected]>
Co-authored-by: Cong Liu <[email protected]>
Co-authored-by: Nicole Xin <[email protected]>
Co-authored-by: Rob Scott <[email protected]>
Co-authored-by: nayihz <[email protected]>
Co-authored-by: Daneyon Hansen <[email protected]>
Co-authored-by: Maya Barnea <[email protected]>
kfswain pushed a commit to kfswain/llm-instance-gateway that referenced this pull request Apr 29, 2025
* Initial guide for inference pool

* Add extensionReference to the InferencePool spec

* Fix list formatting

* Remove unused labels

* Autogenerate the spec

* Update site-src/api-types/inferencepool.md

Co-authored-by: Rob Scott <[email protected]>

* Update site-src/api-types/inferencepool.md

Co-authored-by: Rob Scott <[email protected]>

* Update site-src/api-types/inferencepool.md

Co-authored-by: Rob Scott <[email protected]>

* Update site-src/api-types/inferencepool.md

Co-authored-by: Rob Scott <[email protected]>

* Update site-src/api-types/inferencepool.md

Co-authored-by: Rob Scott <[email protected]>

* Update site-src/api-types/inferencepool.md

Co-authored-by: Rob Scott <[email protected]>

* Rename llm-pool names in rollout example

* Add use cases for replacing an inference pool

* Rewording the background section

* Create replacing-inference-pool.md

* Replace instructions with a link for how to replace an inference pool

* Update replacing-inference-pool.md

* Update mkdocs.yml

* Update replacing-inference-pool.md

* Update inferencemodel_types.go

* Update inferencepool.md

* Update site-src/guides/replacing-inference-pool.md

Co-authored-by: Rob Scott <[email protected]>

---------

Co-authored-by: Rob Scott <[email protected]>
kfswain added a commit to kfswain/llm-instance-gateway that referenced this pull request Apr 29, 2025
* Add unit test coverage for pod APIs under datastore/pkg

* Add unit test coverage for pod APIs under datastore/pkg

* Add unit test coverage for pod APIs under datastore/pkg

* Add unit test coverage for pod APIs under datastore/pkg

* EPP Architecture proposal (kubernetes-sigs#683)

* initial changes

* Adding to proposal to give a quick barebones definition to refactor

* feedback changes

* more feedback addressing

* removed unused Fake struct (kubernetes-sigs#723)

Signed-off-by: Nir Rozenbaum <[email protected]>

* epp: return correct response for trailers (kubernetes-sigs#726)

This looks like a copy paste error.

* Refactor scheduler to run plugins (kubernetes-sigs#677)

* Refactor scheduler to run plugins

* Add scheduler plugin latency metric

* Address comments

* Address comments

* Complete the InferencePool documentation (kubernetes-sigs#673)

* Initial guide for inference pool

* Add extensionReference to the InferencePool spec

* Fix list formatting

* Remove unused labels

* Autogenerate the spec

* Update site-src/api-types/inferencepool.md

Co-authored-by: Rob Scott <[email protected]>

* Update site-src/api-types/inferencepool.md

Co-authored-by: Rob Scott <[email protected]>

* Update site-src/api-types/inferencepool.md

Co-authored-by: Rob Scott <[email protected]>

* Update site-src/api-types/inferencepool.md

Co-authored-by: Rob Scott <[email protected]>

* Update site-src/api-types/inferencepool.md

Co-authored-by: Rob Scott <[email protected]>

* Update site-src/api-types/inferencepool.md

Co-authored-by: Rob Scott <[email protected]>

* Rename llm-pool names in rollout example

* Add use cases for replacing an inference pool

* Rewording the background section

* Create replacing-inference-pool.md

* Replace instructions with a link for how to replace an inference pool

* Update replacing-inference-pool.md

* Update mkdocs.yml

* Update replacing-inference-pool.md

* Update inferencemodel_types.go

* Update inferencepool.md

* Update site-src/guides/replacing-inference-pool.md

Co-authored-by: Rob Scott <[email protected]>

---------

Co-authored-by: Rob Scott <[email protected]>

* reduce log level in metrics logger not to trash the log (kubernetes-sigs#708)

* reduce log level in metrics logger not to trash the log

Signed-off-by: Nir Rozenbaum <[email protected]>

* rename flush metrics to refresh metrics

Signed-off-by: Nir Rozenbaum <[email protected]>

* revert log level

Signed-off-by: Nir Rozenbaum <[email protected]>

---------

Signed-off-by: Nir Rozenbaum <[email protected]>

* few updates in datastore (kubernetes-sigs#713)

* few updates in datastore

Signed-off-by: Nir Rozenbaum <[email protected]>

* PoolSet documentation

Signed-off-by: Nir Rozenbaum <[email protected]>

* error phrasing

Signed-off-by: Nir Rozenbaum <[email protected]>

* removed unused pool arg from PodUpdateOrAddIfNotExist

Signed-off-by: Nir Rozenbaum <[email protected]>

* linter

Signed-off-by: Nir Rozenbaum <[email protected]>

---------

Signed-off-by: Nir Rozenbaum <[email protected]>

* scheduler refactoring (kubernetes-sigs#730)

Signed-off-by: Nir Rozenbaum <[email protected]>

* filter irrelevant pod in pod_reconciler (kubernetes-sigs#696)

* EPP: Update GetRandomPod() to return nil if no pods exist (kubernetes-sigs#731)

Signed-off-by: Daneyon Hansen <[email protected]>

* Move filter and scorer plugins registration to a separate file (kubernetes-sigs#729)

* Move filters and scorers registration to filter/scorer specific files

* Default scheduler config contains empty list of scorers

Signed-off-by: Maya Barnea <[email protected]>

* Default plugin is not a scorer any more

Signed-off-by: Maya Barnea <[email protected]>

* fix scheduler test + lint comments

Signed-off-by: Maya Barnea <[email protected]>

---------

Signed-off-by: Maya Barnea <[email protected]>

* Update issue templates (kubernetes-sigs#738)

* Update issue templates

* Updates artifacts for v0.3.0-rc.1 release

Signed-off-by: Kellen Swain <[email protected]>

* Updates bbr chart for v0.3.0-rc.1 release

Signed-off-by: Kellen Swain <[email protected]>

* Updates artifacts for v0.3.0 release

Signed-off-by: Kellen Swain <[email protected]>

* Adding blank issue template so that all issues start with  label

---------

Signed-off-by: Kellen Swain <[email protected]>

* Add unit test coverage for pod APIs under datastore/pkg

* few updates in datastore (kubernetes-sigs#713)

* few updates in datastore

Signed-off-by: Nir Rozenbaum <[email protected]>

* PoolSet documentation

Signed-off-by: Nir Rozenbaum <[email protected]>

* error phrasing

Signed-off-by: Nir Rozenbaum <[email protected]>

* removed unused pool arg from PodUpdateOrAddIfNotExist

Signed-off-by: Nir Rozenbaum <[email protected]>

* linter

Signed-off-by: Nir Rozenbaum <[email protected]>

---------

Signed-off-by: Nir Rozenbaum <[email protected]>

* few updates in datastore (kubernetes-sigs#713)

* few updates in datastore

Signed-off-by: Nir Rozenbaum <[email protected]>

* PoolSet documentation

Signed-off-by: Nir Rozenbaum <[email protected]>

* error phrasing

Signed-off-by: Nir Rozenbaum <[email protected]>

* removed unused pool arg from PodUpdateOrAddIfNotExist

Signed-off-by: Nir Rozenbaum <[email protected]>

* linter

Signed-off-by: Nir Rozenbaum <[email protected]>

---------

Signed-off-by: Nir Rozenbaum <[email protected]>

* Add unit test coverage for pod APIs under datastore/pkg

---------

Signed-off-by: Nir Rozenbaum <[email protected]>
Signed-off-by: Daneyon Hansen <[email protected]>
Signed-off-by: Maya Barnea <[email protected]>
Signed-off-by: Kellen Swain <[email protected]>
Co-authored-by: Kellen Swain <[email protected]>
Co-authored-by: Nir Rozenbaum <[email protected]>
Co-authored-by: John Howard <[email protected]>
Co-authored-by: Cong Liu <[email protected]>
Co-authored-by: Nicole Xin <[email protected]>
Co-authored-by: Rob Scott <[email protected]>
Co-authored-by: nayihz <[email protected]>
Co-authored-by: Daneyon Hansen <[email protected]>
Co-authored-by: Maya Barnea <[email protected]>
kfswain pushed a commit to kfswain/llm-instance-gateway that referenced this pull request Apr 29, 2025
* Initial guide for inference pool

* Add extensionReference to the InferencePool spec

* Fix list formatting

* Remove unused labels

* Autogenerate the spec

* Update site-src/api-types/inferencepool.md

Co-authored-by: Rob Scott <[email protected]>

* Update site-src/api-types/inferencepool.md

Co-authored-by: Rob Scott <[email protected]>

* Update site-src/api-types/inferencepool.md

Co-authored-by: Rob Scott <[email protected]>

* Update site-src/api-types/inferencepool.md

Co-authored-by: Rob Scott <[email protected]>

* Update site-src/api-types/inferencepool.md

Co-authored-by: Rob Scott <[email protected]>

* Update site-src/api-types/inferencepool.md

Co-authored-by: Rob Scott <[email protected]>

* Rename llm-pool names in rollout example

* Add use cases for replacing an inference pool

* Rewording the background section

* Create replacing-inference-pool.md

* Replace instructions with a link for how to replace an inference pool

* Update replacing-inference-pool.md

* Update mkdocs.yml

* Update replacing-inference-pool.md

* Update inferencemodel_types.go

* Update inferencepool.md

* Update site-src/guides/replacing-inference-pool.md

Co-authored-by: Rob Scott <[email protected]>

---------

Co-authored-by: Rob Scott <[email protected]>
kfswain added a commit to kfswain/llm-instance-gateway that referenced this pull request Apr 29, 2025
* Add unit test coverage for pod APIs under datastore/pkg

* Add unit test coverage for pod APIs under datastore/pkg

* Add unit test coverage for pod APIs under datastore/pkg

* Add unit test coverage for pod APIs under datastore/pkg

* EPP Architecture proposal (kubernetes-sigs#683)

* initial changes

* Adding to proposal to give a quick barebones definition to refactor

* feedback changes

* more feedback addressing

* removed unused Fake struct (kubernetes-sigs#723)

Signed-off-by: Nir Rozenbaum <[email protected]>

* epp: return correct response for trailers (kubernetes-sigs#726)

This looks like a copy paste error.

* Refactor scheduler to run plugins (kubernetes-sigs#677)

* Refactor scheduler to run plugins

* Add scheduler plugin latency metric

* Address comments

* Address comments

* Complete the InferencePool documentation (kubernetes-sigs#673)

* Initial guide for inference pool

* Add extensionReference to the InferencePool spec

* Fix list formatting

* Remove unused labels

* Autogenerate the spec

* Update site-src/api-types/inferencepool.md

Co-authored-by: Rob Scott <[email protected]>

* Update site-src/api-types/inferencepool.md

Co-authored-by: Rob Scott <[email protected]>

* Update site-src/api-types/inferencepool.md

Co-authored-by: Rob Scott <[email protected]>

* Update site-src/api-types/inferencepool.md

Co-authored-by: Rob Scott <[email protected]>

* Update site-src/api-types/inferencepool.md

Co-authored-by: Rob Scott <[email protected]>

* Update site-src/api-types/inferencepool.md

Co-authored-by: Rob Scott <[email protected]>

* Rename llm-pool names in rollout example

* Add use cases for replacing an inference pool

* Rewording the background section

* Create replacing-inference-pool.md

* Replace instructions with a link for how to replace an inference pool

* Update replacing-inference-pool.md

* Update mkdocs.yml

* Update replacing-inference-pool.md

* Update inferencemodel_types.go

* Update inferencepool.md

* Update site-src/guides/replacing-inference-pool.md

Co-authored-by: Rob Scott <[email protected]>

---------

Co-authored-by: Rob Scott <[email protected]>

* reduce log level in metrics logger not to trash the log (kubernetes-sigs#708)

* reduce log level in metrics logger not to trash the log

Signed-off-by: Nir Rozenbaum <[email protected]>

* rename flush metrics to refresh metrics

Signed-off-by: Nir Rozenbaum <[email protected]>

* revert log level

Signed-off-by: Nir Rozenbaum <[email protected]>

---------

Signed-off-by: Nir Rozenbaum <[email protected]>

* few updates in datastore (kubernetes-sigs#713)

* few updates in datastore

Signed-off-by: Nir Rozenbaum <[email protected]>

* PoolSet documentation

Signed-off-by: Nir Rozenbaum <[email protected]>

* error phrasing

Signed-off-by: Nir Rozenbaum <[email protected]>

* removed unused pool arg from PodUpdateOrAddIfNotExist

Signed-off-by: Nir Rozenbaum <[email protected]>

* linter

Signed-off-by: Nir Rozenbaum <[email protected]>

---------

Signed-off-by: Nir Rozenbaum <[email protected]>

* scheduler refactoring (kubernetes-sigs#730)

Signed-off-by: Nir Rozenbaum <[email protected]>

* filter irrelevant pod in pod_reconciler (kubernetes-sigs#696)

* EPP: Update GetRandomPod() to return nil if no pods exist (kubernetes-sigs#731)

Signed-off-by: Daneyon Hansen <[email protected]>

* Move filter and scorer plugins registration to a separate file (kubernetes-sigs#729)

* Move filters and scorers registration to filter/scorer specific files

* Default scheduler config contains empty list of scorers

Signed-off-by: Maya Barnea <[email protected]>

* Default plugin is not a scorer any more

Signed-off-by: Maya Barnea <[email protected]>

* fix scheduler test + lint comments

Signed-off-by: Maya Barnea <[email protected]>

---------

Signed-off-by: Maya Barnea <[email protected]>

* Update issue templates (kubernetes-sigs#738)

* Update issue templates

* Updates artifacts for v0.3.0-rc.1 release

Signed-off-by: Kellen Swain <[email protected]>

* Updates bbr chart for v0.3.0-rc.1 release

Signed-off-by: Kellen Swain <[email protected]>

* Updates artifacts for v0.3.0 release

Signed-off-by: Kellen Swain <[email protected]>

* Adding blank issue template so that all issues start with  label

---------

Signed-off-by: Kellen Swain <[email protected]>

* Add unit test coverage for pod APIs under datastore/pkg

* few updates in datastore (kubernetes-sigs#713)

* few updates in datastore

Signed-off-by: Nir Rozenbaum <[email protected]>

* PoolSet documentation

Signed-off-by: Nir Rozenbaum <[email protected]>

* error phrasing

Signed-off-by: Nir Rozenbaum <[email protected]>

* removed unused pool arg from PodUpdateOrAddIfNotExist

Signed-off-by: Nir Rozenbaum <[email protected]>

* linter

Signed-off-by: Nir Rozenbaum <[email protected]>

---------

Signed-off-by: Nir Rozenbaum <[email protected]>

* few updates in datastore (kubernetes-sigs#713)

* few updates in datastore

Signed-off-by: Nir Rozenbaum <[email protected]>

* PoolSet documentation

Signed-off-by: Nir Rozenbaum <[email protected]>

* error phrasing

Signed-off-by: Nir Rozenbaum <[email protected]>

* removed unused pool arg from PodUpdateOrAddIfNotExist

Signed-off-by: Nir Rozenbaum <[email protected]>

* linter

Signed-off-by: Nir Rozenbaum <[email protected]>

---------

Signed-off-by: Nir Rozenbaum <[email protected]>

* Add unit test coverage for pod APIs under datastore/pkg

---------

Signed-off-by: Nir Rozenbaum <[email protected]>
Signed-off-by: Daneyon Hansen <[email protected]>
Signed-off-by: Maya Barnea <[email protected]>
Signed-off-by: Kellen Swain <[email protected]>
Co-authored-by: Kellen Swain <[email protected]>
Co-authored-by: Nir Rozenbaum <[email protected]>
Co-authored-by: John Howard <[email protected]>
Co-authored-by: Cong Liu <[email protected]>
Co-authored-by: Nicole Xin <[email protected]>
Co-authored-by: Rob Scott <[email protected]>
Co-authored-by: nayihz <[email protected]>
Co-authored-by: Daneyon Hansen <[email protected]>
Co-authored-by: Maya Barnea <[email protected]>
kfswain pushed a commit to kfswain/llm-instance-gateway that referenced this pull request Apr 29, 2025
* Initial guide for inference pool

* Add extensionReference to the InferencePool spec

* Fix list formatting

* Remove unused labels

* Autogenerate the spec

* Update site-src/api-types/inferencepool.md

Co-authored-by: Rob Scott <[email protected]>

* Update site-src/api-types/inferencepool.md

Co-authored-by: Rob Scott <[email protected]>

* Update site-src/api-types/inferencepool.md

Co-authored-by: Rob Scott <[email protected]>

* Update site-src/api-types/inferencepool.md

Co-authored-by: Rob Scott <[email protected]>

* Update site-src/api-types/inferencepool.md

Co-authored-by: Rob Scott <[email protected]>

* Update site-src/api-types/inferencepool.md

Co-authored-by: Rob Scott <[email protected]>

* Rename llm-pool names in rollout example

* Add use cases for replacing an inference pool

* Rewording the background section

* Create replacing-inference-pool.md

* Replace instructions with a link for how to replace an inference pool

* Update replacing-inference-pool.md

* Update mkdocs.yml

* Update replacing-inference-pool.md

* Update inferencemodel_types.go

* Update inferencepool.md

* Update site-src/guides/replacing-inference-pool.md

Co-authored-by: Rob Scott <[email protected]>

---------

Co-authored-by: Rob Scott <[email protected]>
kfswain added a commit to kfswain/llm-instance-gateway that referenced this pull request Apr 29, 2025
* Add unit test coverage for pod APIs under datastore/pkg

* Add unit test coverage for pod APIs under datastore/pkg

* Add unit test coverage for pod APIs under datastore/pkg

* Add unit test coverage for pod APIs under datastore/pkg

* EPP Architecture proposal (kubernetes-sigs#683)

* initial changes

* Adding to proposal to give a quick barebones definition to refactor

* feedback changes

* more feedback addressing

* removed unused Fake struct (kubernetes-sigs#723)

Signed-off-by: Nir Rozenbaum <[email protected]>

* epp: return correct response for trailers (kubernetes-sigs#726)

This looks like a copy paste error.

* Refactor scheduler to run plugins (kubernetes-sigs#677)

* Refactor scheduler to run plugins

* Add scheduler plugin latency metric

* Address comments

* Address comments

* Complete the InferencePool documentation (kubernetes-sigs#673)

* Initial guide for inference pool

* Add extensionReference to the InferencePool spec

* Fix list formatting

* Remove unused labels

* Autogenerate the spec

* Update site-src/api-types/inferencepool.md

Co-authored-by: Rob Scott <[email protected]>

* Update site-src/api-types/inferencepool.md

Co-authored-by: Rob Scott <[email protected]>

* Update site-src/api-types/inferencepool.md

Co-authored-by: Rob Scott <[email protected]>

* Update site-src/api-types/inferencepool.md

Co-authored-by: Rob Scott <[email protected]>

* Update site-src/api-types/inferencepool.md

Co-authored-by: Rob Scott <[email protected]>

* Update site-src/api-types/inferencepool.md

Co-authored-by: Rob Scott <[email protected]>

* Rename llm-pool names in rollout example

* Add use cases for replacing an inference pool

* Rewording the background section

* Create replacing-inference-pool.md

* Replace instructions with a link for how to replace an inference pool

* Update replacing-inference-pool.md

* Update mkdocs.yml

* Update replacing-inference-pool.md

* Update inferencemodel_types.go

* Update inferencepool.md

* Update site-src/guides/replacing-inference-pool.md

Co-authored-by: Rob Scott <[email protected]>

---------

Co-authored-by: Rob Scott <[email protected]>

* reduce log level in metrics logger not to trash the log (kubernetes-sigs#708)

* reduce log level in metrics logger not to trash the log

Signed-off-by: Nir Rozenbaum <[email protected]>

* rename flush metrics to refresh metrics

Signed-off-by: Nir Rozenbaum <[email protected]>

* revert log level

Signed-off-by: Nir Rozenbaum <[email protected]>

---------

Signed-off-by: Nir Rozenbaum <[email protected]>

* few updates in datastore (kubernetes-sigs#713)

* few updates in datastore

Signed-off-by: Nir Rozenbaum <[email protected]>

* PoolSet documentation

Signed-off-by: Nir Rozenbaum <[email protected]>

* error phrasing

Signed-off-by: Nir Rozenbaum <[email protected]>

* removed unused pool arg from PodUpdateOrAddIfNotExist

Signed-off-by: Nir Rozenbaum <[email protected]>

* linter

Signed-off-by: Nir Rozenbaum <[email protected]>

---------

Signed-off-by: Nir Rozenbaum <[email protected]>

* scheduler refactoring (kubernetes-sigs#730)

Signed-off-by: Nir Rozenbaum <[email protected]>

* filter irrelevant pod in pod_reconciler (kubernetes-sigs#696)

* EPP: Update GetRandomPod() to return nil if no pods exist (kubernetes-sigs#731)

Signed-off-by: Daneyon Hansen <[email protected]>

* Move filter and scorer plugins registration to a separate file (kubernetes-sigs#729)

* Move filters and scorers registration to filter/scorer specific files

* Default scheduler config contains empty list of scorers

Signed-off-by: Maya Barnea <[email protected]>

* Default plugin is not a scorer any more

Signed-off-by: Maya Barnea <[email protected]>

* fix scheduler test + lint comments

Signed-off-by: Maya Barnea <[email protected]>

---------

Signed-off-by: Maya Barnea <[email protected]>

* Update issue templates (kubernetes-sigs#738)

* Update issue templates

* Updates artifacts for v0.3.0-rc.1 release

Signed-off-by: Kellen Swain <[email protected]>

* Updates bbr chart for v0.3.0-rc.1 release

Signed-off-by: Kellen Swain <[email protected]>

* Updates artifacts for v0.3.0 release

Signed-off-by: Kellen Swain <[email protected]>

* Adding blank issue template so that all issues start with  label

---------

Signed-off-by: Kellen Swain <[email protected]>

* Add unit test coverage for pod APIs under datastore/pkg

* few updates in datastore (kubernetes-sigs#713)

* few updates in datastore

Signed-off-by: Nir Rozenbaum <[email protected]>

* PoolSet documentation

Signed-off-by: Nir Rozenbaum <[email protected]>

* error phrasing

Signed-off-by: Nir Rozenbaum <[email protected]>

* removed unused pool arg from PodUpdateOrAddIfNotExist

Signed-off-by: Nir Rozenbaum <[email protected]>

* linter

Signed-off-by: Nir Rozenbaum <[email protected]>

---------

Signed-off-by: Nir Rozenbaum <[email protected]>

* few updates in datastore (kubernetes-sigs#713)

* few updates in datastore

Signed-off-by: Nir Rozenbaum <[email protected]>

* PoolSet documentation

Signed-off-by: Nir Rozenbaum <[email protected]>

* error phrasing

Signed-off-by: Nir Rozenbaum <[email protected]>

* removed unused pool arg from PodUpdateOrAddIfNotExist

Signed-off-by: Nir Rozenbaum <[email protected]>

* linter

Signed-off-by: Nir Rozenbaum <[email protected]>

---------

Signed-off-by: Nir Rozenbaum <[email protected]>

* Add unit test coverage for pod APIs under datastore/pkg

---------

Signed-off-by: Nir Rozenbaum <[email protected]>
Signed-off-by: Daneyon Hansen <[email protected]>
Signed-off-by: Maya Barnea <[email protected]>
Signed-off-by: Kellen Swain <[email protected]>
Co-authored-by: Kellen Swain <[email protected]>
Co-authored-by: Nir Rozenbaum <[email protected]>
Co-authored-by: John Howard <[email protected]>
Co-authored-by: Cong Liu <[email protected]>
Co-authored-by: Nicole Xin <[email protected]>
Co-authored-by: Rob Scott <[email protected]>
Co-authored-by: nayihz <[email protected]>
Co-authored-by: Daneyon Hansen <[email protected]>
Co-authored-by: Maya Barnea <[email protected]>
JeffLuoo pushed a commit to JeffLuoo/gateway-api-inference-extension that referenced this pull request May 2, 2025
* Initial guide for inference pool

* Add extensionReference to the InferencePool spec

* Fix list formatting

* Remove unused labels

* Autogenerate the spec

* Update site-src/api-types/inferencepool.md

Co-authored-by: Rob Scott <[email protected]>

* Update site-src/api-types/inferencepool.md

Co-authored-by: Rob Scott <[email protected]>

* Update site-src/api-types/inferencepool.md

Co-authored-by: Rob Scott <[email protected]>

* Update site-src/api-types/inferencepool.md

Co-authored-by: Rob Scott <[email protected]>

* Update site-src/api-types/inferencepool.md

Co-authored-by: Rob Scott <[email protected]>

* Update site-src/api-types/inferencepool.md

Co-authored-by: Rob Scott <[email protected]>

* Rename llm-pool names in rollout example

* Add use cases for replacing an inference pool

* Rewording the background section

* Create replacing-inference-pool.md

* Replace instructions with a link for how to replace an inference pool

* Update replacing-inference-pool.md

* Update mkdocs.yml

* Update replacing-inference-pool.md

* Update inferencemodel_types.go

* Update inferencepool.md

* Update site-src/guides/replacing-inference-pool.md

Co-authored-by: Rob Scott <[email protected]>

---------

Co-authored-by: Rob Scott <[email protected]>
JeffLuoo pushed a commit to JeffLuoo/gateway-api-inference-extension that referenced this pull request May 2, 2025
* Add unit test coverage for pod APIs under datastore/pkg

* Add unit test coverage for pod APIs under datastore/pkg

* Add unit test coverage for pod APIs under datastore/pkg

* Add unit test coverage for pod APIs under datastore/pkg

* EPP Architecture proposal (kubernetes-sigs#683)

* initial changes

* Adding to proposal to give a quick barebones definition to refactor

* feedback changes

* more feedback addressing

* removed unused Fake struct (kubernetes-sigs#723)

Signed-off-by: Nir Rozenbaum <[email protected]>

* epp: return correct response for trailers (kubernetes-sigs#726)

This looks like a copy paste error.

* Refactor scheduler to run plugins (kubernetes-sigs#677)

* Refactor scheduler to run plugins

* Add scheduler plugin latency metric

* Address comments

* Address comments

* Complete the InferencePool documentation (kubernetes-sigs#673)

* Initial guide for inference pool

* Add extensionReference to the InferencePool spec

* Fix list formatting

* Remove unused labels

* Autogenerate the spec

* Update site-src/api-types/inferencepool.md

Co-authored-by: Rob Scott <[email protected]>

* Update site-src/api-types/inferencepool.md

Co-authored-by: Rob Scott <[email protected]>

* Update site-src/api-types/inferencepool.md

Co-authored-by: Rob Scott <[email protected]>

* Update site-src/api-types/inferencepool.md

Co-authored-by: Rob Scott <[email protected]>

* Update site-src/api-types/inferencepool.md

Co-authored-by: Rob Scott <[email protected]>

* Update site-src/api-types/inferencepool.md

Co-authored-by: Rob Scott <[email protected]>

* Rename llm-pool names in rollout example

* Add use cases for replacing an inference pool

* Rewording the background section

* Create replacing-inference-pool.md

* Replace instructions with a link for how to replace an inference pool

* Update replacing-inference-pool.md

* Update mkdocs.yml

* Update replacing-inference-pool.md

* Update inferencemodel_types.go

* Update inferencepool.md

* Update site-src/guides/replacing-inference-pool.md

Co-authored-by: Rob Scott <[email protected]>

---------

Co-authored-by: Rob Scott <[email protected]>

* reduce log level in metrics logger not to trash the log (kubernetes-sigs#708)

* reduce log level in metrics logger not to trash the log

Signed-off-by: Nir Rozenbaum <[email protected]>

* rename flush metrics to refresh metrics

Signed-off-by: Nir Rozenbaum <[email protected]>

* revert log level

Signed-off-by: Nir Rozenbaum <[email protected]>

---------

Signed-off-by: Nir Rozenbaum <[email protected]>

* few updates in datastore (kubernetes-sigs#713)

* few updates in datastore

Signed-off-by: Nir Rozenbaum <[email protected]>

* PoolSet documentation

Signed-off-by: Nir Rozenbaum <[email protected]>

* error phrasing

Signed-off-by: Nir Rozenbaum <[email protected]>

* removed unused pool arg from PodUpdateOrAddIfNotExist

Signed-off-by: Nir Rozenbaum <[email protected]>

* linter

Signed-off-by: Nir Rozenbaum <[email protected]>

---------

Signed-off-by: Nir Rozenbaum <[email protected]>

* scheduler refactoring (kubernetes-sigs#730)

Signed-off-by: Nir Rozenbaum <[email protected]>

* filter irrelevant pod in pod_reconciler (kubernetes-sigs#696)

* EPP: Update GetRandomPod() to return nil if no pods exist (kubernetes-sigs#731)

Signed-off-by: Daneyon Hansen <[email protected]>

* Move filter and scorer plugins registration to a separate file (kubernetes-sigs#729)

* Move filters and scorers registration to filter/scorer specific files

* Default scheduler config contains empty list of scorers

Signed-off-by: Maya Barnea <[email protected]>

* Default plugin is not a scorer any more

Signed-off-by: Maya Barnea <[email protected]>

* fix scheduler test + lint comments

Signed-off-by: Maya Barnea <[email protected]>

---------

Signed-off-by: Maya Barnea <[email protected]>

* Update issue templates (kubernetes-sigs#738)

* Update issue templates

* Updates artifacts for v0.3.0-rc.1 release

Signed-off-by: Kellen Swain <[email protected]>

* Updates bbr chart for v0.3.0-rc.1 release

Signed-off-by: Kellen Swain <[email protected]>

* Updates artifacts for v0.3.0 release

Signed-off-by: Kellen Swain <[email protected]>

* Adding blank issue template so that all issues start with  label

---------

Signed-off-by: Kellen Swain <[email protected]>

* Add unit test coverage for pod APIs under datastore/pkg

* few updates in datastore (kubernetes-sigs#713)

* few updates in datastore

Signed-off-by: Nir Rozenbaum <[email protected]>

* PoolSet documentation

Signed-off-by: Nir Rozenbaum <[email protected]>

* error phrasing

Signed-off-by: Nir Rozenbaum <[email protected]>

* removed unused pool arg from PodUpdateOrAddIfNotExist

Signed-off-by: Nir Rozenbaum <[email protected]>

* linter

Signed-off-by: Nir Rozenbaum <[email protected]>

---------

Signed-off-by: Nir Rozenbaum <[email protected]>

* few updates in datastore (kubernetes-sigs#713)

* few updates in datastore

Signed-off-by: Nir Rozenbaum <[email protected]>

* PoolSet documentation

Signed-off-by: Nir Rozenbaum <[email protected]>

* error phrasing

Signed-off-by: Nir Rozenbaum <[email protected]>

* removed unused pool arg from PodUpdateOrAddIfNotExist

Signed-off-by: Nir Rozenbaum <[email protected]>

* linter

Signed-off-by: Nir Rozenbaum <[email protected]>

---------

Signed-off-by: Nir Rozenbaum <[email protected]>

* Add unit test coverage for pod APIs under datastore/pkg

---------

Signed-off-by: Nir Rozenbaum <[email protected]>
Signed-off-by: Daneyon Hansen <[email protected]>
Signed-off-by: Maya Barnea <[email protected]>
Signed-off-by: Kellen Swain <[email protected]>
Co-authored-by: Kellen Swain <[email protected]>
Co-authored-by: Nir Rozenbaum <[email protected]>
Co-authored-by: John Howard <[email protected]>
Co-authored-by: Cong Liu <[email protected]>
Co-authored-by: Nicole Xin <[email protected]>
Co-authored-by: Rob Scott <[email protected]>
Co-authored-by: nayihz <[email protected]>
Co-authored-by: Daneyon Hansen <[email protected]>
Co-authored-by: Maya Barnea <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants