Docs: Create EPP Operations Guide #735

Open
danehans opened this issue Apr 24, 2025 · 3 comments
Labels
documentation Improvements or additions to documentation help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@danehans
Contributor

Questions such as "How many InferencePools does an EPP support?" have been asked several times recently. A user guide should be added that provides steps for performing common EPP operations tasks.

@kfswain added the triage/accepted label on Apr 24, 2025
@kfswain
Collaborator

kfswain commented Apr 24, 2025

This issue's context is currently scoped to those familiar with GIE, and I think it will need to stay that way given the subject matter.

However, if someone feels confident in picking this issue up, but requires more context, feel free to ping anyone here: https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/main/OWNERS_ALIASES

@danehans
Contributor Author

danehans commented Apr 24, 2025

cc: @smarterclayton @kfswain @ahg-g @nirrozenbaum, please add comments for EPP operations tasks that should be documented. Here are a few that come to mind:

  1. Upgrades/Rollbacks
  2. Troubleshooting
  3. Scaling EPP, including autoscaling details (when supported).
  4. Monitoring and observability: We already have an EPP metrics doc and Grafana dashboards, but users still need to manually stitch things together and fill in some gaps.
  5. Securing EPP: Currently EPP uses a self-signed cert (xref). Should cert-manager be used for this doc?
  6. EPP perf optimization: How can a user optimize EPP for perf based on common use cases?

Should the guide cover inference platform and inference workload tasks?

@danehans added the documentation and help wanted labels on Apr 24, 2025
@nirrozenbaum
Contributor

nirrozenbaum commented Apr 25, 2025

Other than operations tasks, I think this document should explain the mental model of EPP and the why behind it.
