From 3c581855ac2a75205a1baae24341b31d26f173b0 Mon Sep 17 00:00:00 2001
From: Jie WU
Date: Wed, 19 Feb 2025 23:36:50 +0000
Subject: [PATCH 1/3] Move pkgepp/metrics/README.md -> site-src/guides/metrics.md

---
 {pkg/epp/metrics => site-src/guides/metrics.md}/README.md | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename {pkg/epp/metrics => site-src/guides/metrics.md}/README.md (100%)

diff --git a/pkg/epp/metrics/README.md b/site-src/guides/metrics.md/README.md
similarity index 100%
rename from pkg/epp/metrics/README.md
rename to site-src/guides/metrics.md/README.md

From 74debb66e5f34f387df51ed9ccfb3c2c2829c0a4 Mon Sep 17 00:00:00 2001
From: Jie WU
Date: Thu, 20 Feb 2025 14:09:21 +0000
Subject: [PATCH 2/3] add docs link for metrics.md

---
 mkdocs.yml | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mkdocs.yml b/mkdocs.yml
index a024c16d2..8cd3f3fba 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -57,6 +57,7 @@ nav:
   - User Guides:
     - Getting started: guides/index.md
     - Adapter Rollout: guides/adapter-rollout.md
+    - Metrics: guides/metrics.md
     - Implementer's Guide: guides/implementers.md
   - Reference:
     - API Reference: reference/spec.md

From 96ecbc0883d7a2883991fce0ffa53af3c64ef3c6 Mon Sep 17 00:00:00 2001
From: Jie WU
Date: Thu, 20 Feb 2025 18:53:12 +0000
Subject: [PATCH 3/3] update formatting

---
 .../{metrics.md/README.md => metrics.md} | 30 ++++++++-----------
 1 file changed, 13 insertions(+), 17 deletions(-)
 rename site-src/guides/{metrics.md/README.md => metrics.md} (51%)

diff --git a/site-src/guides/metrics.md/README.md b/site-src/guides/metrics.md
similarity index 51%
rename from site-src/guides/metrics.md/README.md
rename to site-src/guides/metrics.md
index 1f68a0bdb..f793734d3 100644
--- a/site-src/guides/metrics.md/README.md
+++ b/site-src/guides/metrics.md
@@ -1,10 +1,6 @@
-# Documentation
+# Metrics
 
-This documentation is the current state of exposed metrics.
-
-## Table of Contents
-* [Exposed Metrics](#exposed-metrics)
-* [Scrape Metrics](#scrape-metrics)
+This guide describes the current state of exposed metrics and how to scrape them.
 
 ## Requirements
 
@@ -38,17 +34,17 @@ spec:
 
 ## Exposed metrics
 
-| Metric name | Metric Type | Description | Labels | Status |
-| ------------|--------------| ----------- | ------ | ------ |
-| inference_model_request_total | Counter | The counter of requests broken out for each model. | `model_name`=<model-name> <br> `target_model_name`=<target-model-name> | ALPHA |
-| inference_model_request_error_total | Counter | The counter of requests errors broken out for each model. | `model_name`=<model-name> <br> `target_model_name`=<target-model-name> | ALPHA |
-| inference_model_request_duration_seconds | Distribution | Distribution of response latency. | `model_name`=<model-name> <br> `target_model_name`=<target-model-name> | ALPHA |
-| inference_model_request_sizes | Distribution | Distribution of request size in bytes. | `model_name`=<model-name> <br> `target_model_name`=<target-model-name> | ALPHA |
-| inference_model_response_sizes | Distribution | Distribution of response size in bytes. | `model_name`=<model-name> <br> `target_model_name`=<target-model-name> | ALPHA |
-| inference_model_input_tokens | Distribution | Distribution of input token count. | `model_name`=<model-name> <br> `target_model_name`=<target-model-name> | ALPHA |
-| inference_model_output_tokens | Distribution | Distribution of output token count. | `model_name`=<model-name> <br> `target_model_name`=<target-model-name> | ALPHA |
-| inference_pool_average_kv_cache_utilization | Gauge | The average kv cache utilization for an inference server pool. | `name`=<inference-pool-name> | ALPHA |
-| inference_pool_average_queue_size | Gauge | The average number of requests pending in the model server queue. | `name`=<inference-pool-name> | ALPHA |
+| **Metric name** | **Metric Type** | **Description** | **Labels** | **Status** |
+|:---------------------------------------------|:-----------------|:------------------------------------------------------------------|:-----------------------------------------------------------------------------------|:------------|
+| inference_model_request_total | Counter | The counter of requests broken out for each model. | `model_name`=<model-name> <br> `target_model_name`=<target-model-name> | ALPHA |
+| inference_model_request_error_total | Counter | The counter of requests errors broken out for each model. | `model_name`=<model-name> <br> `target_model_name`=<target-model-name> | ALPHA |
+| inference_model_request_duration_seconds | Distribution | Distribution of response latency. | `model_name`=<model-name> <br> `target_model_name`=<target-model-name> | ALPHA |
+| inference_model_request_sizes | Distribution | Distribution of request size in bytes. | `model_name`=<model-name> <br> `target_model_name`=<target-model-name> | ALPHA |
+| inference_model_response_sizes | Distribution | Distribution of response size in bytes. | `model_name`=<model-name> <br> `target_model_name`=<target-model-name> | ALPHA |
+| inference_model_input_tokens | Distribution | Distribution of input token count. | `model_name`=<model-name> <br> `target_model_name`=<target-model-name> | ALPHA |
+| inference_model_output_tokens | Distribution | Distribution of output token count. | `model_name`=<model-name> <br> `target_model_name`=<target-model-name> | ALPHA |
+| inference_pool_average_kv_cache_utilization | Gauge | The average kv cache utilization for an inference server pool. | `name`=<inference-pool-name> | ALPHA |
+| inference_pool_average_queue_size | Gauge | The average number of requests pending in the model server queue. | `name`=<inference-pool-name> | ALPHA |
 
 ## Scrape Metrics