diff --git a/config/manifests/benchmark/model-server-service.yaml b/config/manifests/benchmark/model-server-service.yaml deleted file mode 100644 index 014054cf8..000000000 --- a/config/manifests/benchmark/model-server-service.yaml +++ /dev/null @@ -1,12 +0,0 @@ -apiVersion: v1 -kind: Service -metadata: - name: my-pool-service -spec: - ports: - - port: 8081 - protocol: TCP - targetPort: 8000 - selector: - app: my-pool - type: LoadBalancer diff --git a/site-src/performance/benchmark/index.md b/site-src/performance/benchmark/index.md index e612c49d4..fca8cc7a8 100644 --- a/site-src/performance/benchmark/index.md +++ b/site-src/performance/benchmark/index.md @@ -5,30 +5,26 @@ inference extension, and a Kubernetes service as the load balancing strategy. Th benchmark uses the [Latency Profile Generator](https://github.com/AI-Hypercomputer/inference-benchmark) (LPG) tool to generate load and collect results. -## Prerequisites +## Run benchmarks manually -### Deploy the inference extension and sample model server +### Prerequisite: have an endpoint ready to serve inference traffic -Follow this user guide https://gateway-api-inference-extension.sigs.k8s.io/guides/ to deploy the -sample vLLM application, and the inference extension. +To serve via a Gateway using the inference extension, follow this [user guide](https://gateway-api-inference-extension.sigs.k8s.io/guides/) +to deploy the sample vLLM application and the inference extension. -### [Optional] Scale the sample vLLM deployment - -You will more likely to see the benefits of the inference extension when there are a decent number of replicas to make the optimal routing decision. +You are more likely to see the benefits of the inference extension when there are a decent number of replicas to make optimal routing decisions.
So consider scaling the sample application with more replicas: ```bash kubectl scale --replicas=8 -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/vllm/gpu-deployment.yaml ``` -### Expose the model server via a k8s service - -As the baseline, let's also expose the vLLM deployment as a k8s service: +To serve via a Kubernetes LoadBalancer service as a baseline comparison, you can expose the sample application: ```bash kubectl expose -f https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/config/manifests/vllm/gpu-deployment.yaml --port=8081 --target-port=8000 --type=LoadBalancer ``` -## Run benchmark +### Run benchmark The LPG benchmark tool works by sending traffic to the specified target IP and port and collecting results. Follow the steps below to run a single benchmark. You can deploy multiple LPG instances if you want to run benchmarks in parallel against different targets. @@ -60,18 +56,24 @@ to specify what this benchmark is for. For instance, `inference-extension` or `k the script below will watch for that log line and then start downloading results. ```bash - benchmark_id='my-benchmark' ./tools/benchmark/download-benchmark-results.bash + benchmark_id='my-benchmark' ./tools/benchmark/scripts/download-benchmark-results.bash ``` 1. After the script finishes, you should see benchmark results under the `./tools/benchmark/output/default-run/my-benchmark/results/json` folder. -### Tips +#### Tips -* You can specify `run_id="runX"` environment variable when running the `./download-benchmark-results.bash` script. +* You can specify the `run_id="runX"` environment variable when running the `download-benchmark-results.bash` script. This is useful when you run benchmarks multiple times to get more statistically meaningful results and group the results accordingly. * Update the `request_rates` that best suit your benchmark environment.
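As a quick orientation for where downloaded results land, the layout implied above (`output/{run_id}/{benchmark_id}/results/json`, with `default-run` used when no `run_id` is set) can be sketched as a tiny helper. The helper function is hypothetical; only the directory convention comes from this guide:

```shell
#!/usr/bin/env bash
# Hypothetical helper; only the directory layout is taken from this guide.
results_dir() {
  local run_id="${1:-default-run}"   # the download script defaults to default-run
  local benchmark_id="$2"
  echo "./tools/benchmark/output/${run_id}/${benchmark_id}/results/json"
}

results_dir default-run my-benchmark
# → ./tools/benchmark/output/default-run/my-benchmark/results/json
```

For example, a run grouped under `run_id="run1"` would land in `./tools/benchmark/output/run1/my-benchmark/results/json`.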
-### Advanced Benchmark Configurations +## Run benchmarks automatically + +The [benchmark automation tool](https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/tools/benchmark) enables defining benchmarks via a config file and running the benchmarks +automatically. It's currently experimental. To try it, refer to its [user guide](https://github.com/kubernetes-sigs/gateway-api-inference-extension/raw/main/tools/benchmark). + + +## Advanced Benchmark Configurations Please refer to the [LPG user guide](https://github.com/AI-Hypercomputer/inference-benchmark?tab=readme-ov-file#configuring-the-benchmark) for a detailed list of configuration knobs. diff --git a/tools/benchmark/.gitignore b/tools/benchmark/.gitignore new file mode 100644 index 000000000..9b1960e71 --- /dev/null +++ b/tools/benchmark/.gitignore @@ -0,0 +1 @@ +output/ \ No newline at end of file diff --git a/tools/benchmark/README.md b/tools/benchmark/README.md index ffd3ee7b6..cb88aa9c7 100644 --- a/tools/benchmark/README.md +++ b/tools/benchmark/README.md @@ -1 +1,194 @@ -This folder contains resources to run performance benchmarks. Pls follow the benchmark guide here https://gateway-api-inference-extension.sigs.k8s.io/performance/benchmark. \ No newline at end of file +This folder contains resources to run performance benchmarks. Please follow the benchmark guide at https://gateway-api-inference-extension.sigs.k8s.io/performance/benchmark. + +## Features + +1. **Config-driven benchmarks**. Use the `./proto/benchmark.proto` API to write benchmark configurations, without the need to craft complex YAMLs. +2. **Reproducibility**. The tool will snapshot all the manifests needed for the benchmark run and mark them immutable (unless the user explicitly overrides it). +3. **Benchmark inheritance**. Extend an existing benchmark configuration by overriding a subset of parameters, instead of rewriting everything from scratch. +4. **Benchmark orchestration**.
The tool automatically deploys the benchmark environment into a cluster, waits for the benchmark to finish and collects the results, and then tears down the environment. The tool deploys the benchmark resources in new namespaces so each benchmark runs independently. +5. **Auto-generated request rates**. The tool can automatically generate request rates for known models and accelerators to cover a wide range of model server load, from low latency to fully saturated throughput. +6. **Visualization tools**. The results can be analyzed with a Jupyter notebook. +7. **Model server metrics**. The tool uses the latency profile generator benchmark tool to scrape metrics from Google Cloud monitoring. It also provides a link to a Google Cloud monitoring dashboard for detailed analysis. + +### Future Improvements + +1. The benchmark config and results are stored in protobuf format. The results can be persisted in a database such as Google Cloud Spanner to allow complex query and dashboarding use cases. +2. Support running benchmarks in parallel with user-configured parallelism. + +## Prerequisites + +1. [Install Helm](https://helm.sh/docs/intro/quickstart/#install-helm) +2. Install the InferenceModel and InferencePool [CRDs](https://gateway-api-inference-extension.sigs.k8s.io/guides/#install-the-inference-extension-crds) +3. [Enable Envoy patch policy](https://gateway-api-inference-extension.sigs.k8s.io/guides/#update-envoy-gateway-config-to-enable-patch-policy). +4. Install [RBACs](https://github.com/kubernetes-sigs/gateway-api-inference-extension/blob/12bcc9a85dad828b146758ad34a69053dca44fa9/config/manifests/inferencepool.yaml#L78) for EPP to read pods. +5. Create a secret in the default namespace containing the HuggingFace token. + + ```bash + kubectl create secret generic hf-token --from-literal=token=$HF_TOKEN # Your Hugging Face Token with access to Llama2 + ``` + +6.
[Optional, GCP only] Create a `gmp-test-sa` service account with the `monitoring.Viewer` role to read additional model server metrics from cloud monitoring. + + ```bash + gcloud iam service-accounts create gmp-test-sa \ + && + gcloud projects add-iam-policy-binding ${BENCHMARK_PROJECT} \ + --member=serviceAccount:gmp-test-sa@${BENCHMARK_PROJECT}.iam.gserviceaccount.com \ + --role=roles/monitoring.viewer + ``` + +## Get started + +Run all existing benchmarks: + +```bash +# Run all benchmarks in the ./catalog/benchmark folder +./scripts/run_all_benchmarks.bash +``` + +View the benchmark results: + +* To view raw results, watch for a new results folder to be created under `./output/{run_id}/`. +* To visualize the results, use the Jupyter notebook. + +## Common usage + +### Run all benchmarks in a particular benchmark config file and upload results to GCS + +```bash +gcs_bucket='my-bucket' benchmarks=benchmarks ./scripts/run_benchmarks_file.bash +``` + +### Generate benchmark manifests only + +```bash +# All available environment variables. +benchmarks=benchmarks ./scripts/generate_manifests.bash +``` + +### Run particular benchmarks in a benchmark config file, by matching a benchmark name regex + +```bash +# Run all benchmarks with Nvidia H100 +gcs_bucket='my-bucket' benchmarks=benchmarks benchmark_name_regex='.*h100.*' ./scripts/run_benchmarks_file.bash +``` + +### Resume a benchmark run from an existing run_id + +You may resume benchmarks from previously generated manifests. The tool will skip benchmarks that already have a `results` folder and continue those without results. + +```bash +run_id='existing-run-id' benchmarks=benchmarks ./scripts/run_benchmarks_file.bash +``` + +### Keep the benchmark environment after a benchmark is complete (for debugging) + +```bash +# All available environment variables.
+skip_tear_down='true' benchmarks=benchmarks ./scripts/run_benchmarks_file.bash +``` + +## Command references + +```bash +# All available environment variables +regex='my-benchmark-file-name-regex' dry_run='false' gcs_bucket='my-bucket' skip_tear_down='false' benchmark_name_regex='my-benchmark-name-regex' ./scripts/run_all_benchmarks.bash +``` + +```bash +# All available environment variables. +run_id='existing-run-id' dry_run='false' gcs_bucket='my-bucket' skip_tear_down='false' benchmarks=benchmarks benchmark_name_regex='my-benchmark-name-regex' ./scripts/run_benchmarks_file.bash +``` + +```bash +# All available environment variables. +run_id='existing-run-id' benchmarks=benchmarks ./scripts/generate_manifests.bash +``` + +## How does it work? + +The tool automates the following steps: + +1. Reads the benchmark config file in `./catalog/{benchmarks_config_file}`. The file contains a list of benchmarks. The config API is defined in `./proto/benchmark.proto`. +2. Generates a new run_id and namespace `{benchmark_name}-{run_id}` to run the benchmarks. If the `run_id` environment variable is provided, it will be reused instead of creating a new one. This is useful when resuming a previous benchmark run, or when running multiple sets of benchmarks in parallel (e.g., running benchmarks on different accelerator types in parallel using the same run_id). +3. Based on the config, generates manifests in `./output/{run_id}/{benchmark_name}-{run_id}/manifests`. +4. Applies the manifests to the cluster, and waits for resources to be ready. +5. Once the benchmark finishes, downloads benchmark results to `./output/{run_id}/{benchmark_name}-{run_id}/results`. +6. [Optional] If a GCS bucket is specified, uploads the output folder to the bucket. + +## Create a new benchmark + +You can either add new benchmarks to an existing benchmark config file, or create new benchmark config files. Each benchmark config file contains a list of benchmarks.
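The naming scheme from the steps above can be sketched as follows. This is a minimal sketch, not the tool's implementation; the fallback `run_id` format in particular is an assumption:

```shell
#!/usr/bin/env bash
# Sketch of the namespace/output naming described in "How does it work?".
# The generated run_id format here is an assumption, not the tool's actual format.
run_id="${run_id:-$(date +%Y%m%d%H%M%S)}"

# Namespace for one benchmark: {benchmark_name}-{run_id}
benchmark_namespace() {
  echo "${1}-${run_id}"
}

# Manifests land under ./output/{run_id}/{benchmark_name}-{run_id}/manifests
manifests_dir() {
  echo "./output/${run_id}/${1}-${run_id}/manifests"
}

run_id='run1'
benchmark_namespace r8-svc-vllmv1   # → r8-svc-vllmv1-run1
manifests_dir r8-svc-vllmv1         # → ./output/run1/r8-svc-vllmv1-run1/manifests
```

Because the namespace embeds both the benchmark name and the run_id, benchmarks from the same config file (or from parallel runs with distinct run_ids) never collide in the cluster.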
+ +An example benchmark with all available parameters is as follows: + +``` +benchmarks { + name: "base-benchmark" + config { + model_server { + image: "vllm/vllm-openai@sha256:8672d9356d4f4474695fd69ef56531d9e482517da3b31feb9c975689332a4fb0" + accelerator: "nvidia-h100-80gb" + replicas: 1 + vllm { + tensor_parallelism: "1" + model: "meta-llama/Llama-2-7b-hf" + } + } + load_balancer { + gateway { + envoy { + epp { + image: "us-central1-docker.pkg.dev/k8s-staging-images/gateway-api-inference-extension/epp:v0.1.0" + } + } + } + } + benchmark_tool { + image: "us-docker.pkg.dev/gke-inference-gateway-dev/benchmark/benchmark-tool@sha256:1fe4991ec1e9379b261a62631e1321b8ea15772a6d9a74357932771cea7b0500" + lpg { + dataset: "sharegpt_v3_unfiltered_cleaned_split" + models: "meta-llama/Llama-2-7b-hf" + ip: "to-be-populated-automatically" + port: "8081" + benchmark_time_seconds: "60" + output_length: "1024" + } + } + } +} +``` + +### Create a benchmark from a base benchmark + +It's recommended to create a benchmark from an existing benchmark by overriding a few parameters. This inheritance feature is powerful in creating a large number of benchmarks conveniently. Below is an example that overrides the replica count of a base benchmark: + +``` +benchmarks { + name: "new-benchmark" + base_benchmark_name: "base-benchmark" + config { + model_server { + replicas: 2 + } + } +} +``` + +## Environment configurations + +The tool has default configurations (such as the cluster name) in `./scripts/env.sh`. You can tweak those for your own needs. + +## The benchmark.proto + +The `./proto/benchmark.proto` is the core of this tool, it drives the generation of the benchmark manifests, as well as the query and dashboarding of the results. + +Why do we need it? + +* An API to clearly capture the intent, instead of making various assumptions. 
+* It lets the user focus only on the core parameters of the benchmark itself, rather than the toil of configuring the environment and crafting the manifests. +* It is the single source of truth that drives the entire lifecycle of the benchmark, including post-analysis. + +## Contribute + +Refer to the [dev guide](./dev.md). \ No newline at end of file diff --git a/tools/benchmark/catalog/base-model.pbtxt b/tools/benchmark/catalog/base-model.pbtxt new file mode 100644 index 000000000..5f655a8c8 --- /dev/null +++ b/tools/benchmark/catalog/base-model.pbtxt @@ -0,0 +1,66 @@ + +# proto file: proto/benchmark.proto +# proto message: Benchmarks + +benchmarks { + name: "r8-svc-vllmv1" + config { + model_server { + image: "vllm/vllm-openai:v0.8.1" + accelerator: "nvidia-h100-80gb" + replicas: 8 + vllm { + tensor_parallelism: "1" + model: "meta-llama/Llama-2-7b-hf" + v1: "1" + } + } + load_balancer { + k8s_service {} + } + benchmark_tool { + # The following image was built from this source https://github.com/AI-Hypercomputer/inference-benchmark/tree/07628c9fe01b748f5a4cc9e5c2ee4234aaf47699 + image: 'us-docker.pkg.dev/cloud-tpu-images/inference/inference-benchmark@sha256:1c100b0cc949c7df7a2db814ae349c790f034b4b373aaad145e77e815e838438' + lpg { + dataset: "sharegpt_v3_unfiltered_cleaned_split" + models: "meta-llama/Llama-2-7b-hf" + tokenizer: "meta-llama/Llama-2-7b-hf" + ip: "to-be-populated-automatically" + port: "8081" + benchmark_time_seconds: "100" + output_length: "2048" + + } + } + } +} + +benchmarks { + name: "r8-epp-vllmv1" + base_benchmark_name: "r8-svc-vllmv1" + config { + load_balancer { + gateway { + envoy { + epp { + image: "us-central1-docker.pkg.dev/k8s-staging-images/gateway-api-inference-extension/epp:main" + refresh_metrics_interval: "50ms" + } + } + full_duplex_streaming_enabled: true + } + } + } +} + +benchmarks { + name: "r8-epp-no-streaming-vllmv1" + base_benchmark_name: "r8-epp-vllmv1" + config { + load_balancer { + gateway { +
full_duplex_streaming_enabled: false + } + } + } +} \ No newline at end of file diff --git a/tools/benchmark/catalog/charts/BenchmarkTool/.helmignore b/tools/benchmark/catalog/charts/BenchmarkTool/.helmignore new file mode 100644 index 000000000..0e8a0eb36 --- /dev/null +++ b/tools/benchmark/catalog/charts/BenchmarkTool/.helmignore @@ -0,0 +1,23 @@ +# Patterns to ignore when building packages. +# This supports shell glob matching, relative path matching, and +# negation (prefixed with !). Only one pattern per line. +.DS_Store +# Common VCS dirs +.git/ +.gitignore +.bzr/ +.bzrignore +.hg/ +.hgignore +.svn/ +# Common backup files +*.swp +*.bak +*.tmp +*.orig +*~ +# Various IDEs +.project +.idea/ +*.tmproj +.vscode/ diff --git a/tools/benchmark/catalog/charts/BenchmarkTool/Chart.yaml b/tools/benchmark/catalog/charts/BenchmarkTool/Chart.yaml new file mode 100644 index 000000000..ca6126d40 --- /dev/null +++ b/tools/benchmark/catalog/charts/BenchmarkTool/Chart.yaml @@ -0,0 +1,24 @@ +apiVersion: v2 +name: BenchmarkTool +description: A Helm chart for Kubernetes + +# A chart can be either an 'application' or a 'library' chart. +# +# Application charts are a collection of templates that can be packaged into versioned archives +# to be deployed. +# +# Library charts provide useful utilities or functions for the chart developer. They're included as +# a dependency of application charts to inject those utilities and functions into the rendering +# pipeline. Library charts do not define any templates and therefore cannot be deployed. +type: application + +# This is the chart version. This version number should be incremented each time you make changes +# to the chart and its templates, including the app version. +# Versions are expected to follow Semantic Versioning (https://semver.org/) +version: 0.1.0 + +# This is the version number of the application being deployed. This version number should be +# incremented each time you make changes to the application. 
Versions are not expected to +# follow Semantic Versioning. They should reflect the version the application is using. +# It is recommended to use it with quotes. +appVersion: "1.16.0" diff --git a/tools/benchmark/catalog/charts/BenchmarkTool/templates/_helpers.tpl b/tools/benchmark/catalog/charts/BenchmarkTool/templates/_helpers.tpl new file mode 100644 index 000000000..2d391062f --- /dev/null +++ b/tools/benchmark/catalog/charts/BenchmarkTool/templates/_helpers.tpl @@ -0,0 +1,33 @@ +{{/* +Common labels +*/}} +{{- define "benchmark.labels" -}} +helm.sh/chart: {{ include "benchmark.chart" . }} +{{ include "benchmark.selectorLabels" . }} +{{- if .Chart.AppVersion }} +app.kubernetes.io/version: {{ .Chart.AppVersion | quote }} +{{- end }} +app.kubernetes.io/managed-by: {{ .Release.Service }} +{{- end }} + +{{/* +Selector labels +*/}} +{{- define "benchmark.selectorLabels" -}} +app.kubernetes.io/name: {{ include "benchmark.name" . }} +app.kubernetes.io/instance: {{ .Release.Name }} +{{- end }} + +{{/* +Create chart name and version as used by the chart label. 
+*/}} +{{- define "benchmark.chart" -}} +{{- printf "%s-%s" .Chart.Name .Chart.Version | replace "+" "_" | trunc 63 | trimSuffix "-" }} +{{- end }} + +{{/* +Create a name for the chart +*/}} +{{- define "benchmark.name" -}} +{{- default .Chart.Name .Values.nameOverride | trunc 63 | trimSuffix "-" }} +{{- end }} diff --git a/tools/benchmark/catalog/charts/BenchmarkTool/templates/deployment.yaml b/tools/benchmark/catalog/charts/BenchmarkTool/templates/deployment.yaml new file mode 100644 index 000000000..0b867591a --- /dev/null +++ b/tools/benchmark/catalog/charts/BenchmarkTool/templates/deployment.yaml @@ -0,0 +1,70 @@ +# charts/BenchmarkTool/templates/deployment.yaml +apiVersion: apps/v1 +kind: Deployment +metadata: + name: benchmark-tool + namespace: {{ .Release.Namespace }} + labels: + app: benchmark-tool +spec: + replicas: 1 + selector: + matchLabels: + app: benchmark-tool + template: + metadata: + labels: + app: benchmark-tool + annotations: + gke-gcsfuse/volumes: "true" + spec: + containers: + - name: benchmark-tool + command: + - bash + - -c + - ./latency_throughput_curve.sh + env: + - name: IP + value: {{ .Values.global.config.benchmark_tool.lpg.ip | quote }} + - name: REQUEST_RATES + value: {{ .Values.global.config.benchmark_tool.lpg.request_rates | quote }} + - name: TOKENIZER + value: {{ .Values.global.config.benchmark_tool.lpg.tokenizer | quote }} + - name: MODELS + value: {{ .Values.global.config.benchmark_tool.lpg.models | quote }} + - name: BENCHMARK_TIME_SECONDS + value: {{ .Values.global.config.benchmark_tool.lpg.benchmark_time_seconds | quote }} + - name: BACKEND + value: vllm + - name: PORT + value: {{ .Values.global.config.benchmark_tool.lpg.port | quote }} + - name: INPUT_LENGTH + value: "1024" + - name: OUTPUT_LENGTH + value: {{ .Values.global.config.benchmark_tool.lpg.output_length | quote }} + - name: FILE_PREFIX + value: "benchmark-catalog" + - name: SCRAPE_SERVER_METRICS + value: "true" + - name: PM_NAMESPACE + value: {{ .Release.Namespace 
| quote }} + - name: PM_JOB + value: model-server-monitoring + - name: PROMPT_DATASET_FILE + value: ShareGPT_V3_unfiltered_cleaned_split.json + - name: HF_TOKEN + valueFrom: + secretKeyRef: + key: token + name: hf-token + image: {{ .Values.global.config.benchmark_tool.image }} + imagePullPolicy: Always + resources: + requests: + cpu: "6" + memory: "30Gi" + limits: + cpu: "6" + memory: "30Gi" + diff --git a/tools/benchmark/catalog/charts/LoadBalancer/.helmignore b/tools/benchmark/catalog/charts/LoadBalancer/.helmignore new file mode 100644 index 000000000..0e8a0eb36 --- /dev/null +++ b/tools/benchmark/catalog/charts/LoadBalancer/.helmignore @@ -0,0 +1,23 @@ +# Patterns to ignore when building packages. +# This supports shell glob matching, relative path matching, and +# negation (prefixed with !). Only one pattern per line. +.DS_Store +# Common VCS dirs +.git/ +.gitignore +.bzr/ +.bzrignore +.hg/ +.hgignore +.svn/ +# Common backup files +*.swp +*.bak +*.tmp +*.orig +*~ +# Various IDEs +.project +.idea/ +*.tmproj +.vscode/ diff --git a/tools/benchmark/catalog/charts/LoadBalancer/Chart.lock b/tools/benchmark/catalog/charts/LoadBalancer/Chart.lock new file mode 100644 index 000000000..aefda7c10 --- /dev/null +++ b/tools/benchmark/catalog/charts/LoadBalancer/Chart.lock @@ -0,0 +1,15 @@ +dependencies: +- name: Gateway + repository: file://./Gateway + version: 0.1.0 +- name: Envoy + repository: file://./Gateway/Envoy + version: 0.1.0 +- name: GKE + repository: file://./Gateway/GKE + version: 0.1.0 +- name: K8SService + repository: file://./K8SService + version: 0.1.0 +digest: sha256:ccf3694a26deddfb2ced44a80a275d81b880ceff2510dc1c2e3e4c03cacece0a +generated: "2025-03-20T20:14:32.562287-07:00" diff --git a/tools/benchmark/catalog/charts/LoadBalancer/Chart.yaml b/tools/benchmark/catalog/charts/LoadBalancer/Chart.yaml new file mode 100644 index 000000000..a8408d6e8 --- /dev/null +++ b/tools/benchmark/catalog/charts/LoadBalancer/Chart.yaml @@ -0,0 +1,43 @@ +apiVersion: v2 
+name: LoadBalancer +description: A Helm chart for Kubernetes + +# A chart can be either an 'application' or a 'library' chart. +# +# Application charts are a collection of templates that can be packaged into versioned archives +# to be deployed. +# +# Library charts provide useful utilities or functions for the chart developer. They're included as +# a dependency of application charts to inject those utilities and functions into the rendering +# pipeline. Library charts do not define any templates and therefore cannot be deployed. +type: application + +# This is the chart version. This version number should be incremented each time you make changes +# to the chart and its templates, including the app version. +# Versions are expected to follow Semantic Versioning (https://semver.org/) +version: 0.1.0 + +# This is the version number of the application being deployed. This version number should be +# incremented each time you make changes to the application. Versions are not expected to +# follow Semantic Versioning. They should reflect the version the application is using. +# It is recommended to use it with quotes. 
+appVersion: "1.16.0" + +dependencies: + - name: Gateway + version: 0.1.0 + repository: "file://./Gateway" + condition: global.config.load_balancer.gateway_enabled + - name: Envoy + version: 0.1.0 + repository: "file://./Gateway/Envoy" + condition: global.config.load_balancer.gateway_envoy_enabled + - name: GKE + version: 0.1.0 + repository: "file://./Gateway/GKE" + condition: global.config.load_balancer.gateway_gke_gateway_enabled + - name: K8SService + version: 0.1.0 + repository: "file://./K8SService" + condition: global.config.load_balancer.k8s_service_enabled + diff --git a/tools/benchmark/catalog/charts/LoadBalancer/Gateway/.helmignore b/tools/benchmark/catalog/charts/LoadBalancer/Gateway/.helmignore new file mode 100644 index 000000000..0e8a0eb36 --- /dev/null +++ b/tools/benchmark/catalog/charts/LoadBalancer/Gateway/.helmignore @@ -0,0 +1,23 @@ +# Patterns to ignore when building packages. +# This supports shell glob matching, relative path matching, and +# negation (prefixed with !). Only one pattern per line. +.DS_Store +# Common VCS dirs +.git/ +.gitignore +.bzr/ +.bzrignore +.hg/ +.hgignore +.svn/ +# Common backup files +*.swp +*.bak +*.tmp +*.orig +*~ +# Various IDEs +.project +.idea/ +*.tmproj +.vscode/ diff --git a/tools/benchmark/catalog/charts/LoadBalancer/Gateway/Chart.yaml b/tools/benchmark/catalog/charts/LoadBalancer/Gateway/Chart.yaml new file mode 100644 index 000000000..e7dd78ed4 --- /dev/null +++ b/tools/benchmark/catalog/charts/LoadBalancer/Gateway/Chart.yaml @@ -0,0 +1,24 @@ +apiVersion: v2 +name: Gateway +description: A Helm chart for Kubernetes + +# A chart can be either an 'application' or a 'library' chart. +# +# Application charts are a collection of templates that can be packaged into versioned archives +# to be deployed. +# +# Library charts provide useful utilities or functions for the chart developer. They're included as +# a dependency of application charts to inject those utilities and functions into the rendering +# pipeline. 
Library charts do not define any templates and therefore cannot be deployed. +type: application + +# This is the chart version. This version number should be incremented each time you make changes +# to the chart and its templates, including the app version. +# Versions are expected to follow Semantic Versioning (https://semver.org/) +version: 0.1.0 + +# This is the version number of the application being deployed. This version number should be +# incremented each time you make changes to the application. Versions are not expected to +# follow Semantic Versioning. They should reflect the version the application is using. +# It is recommended to use it with quotes. +appVersion: "1.16.0" diff --git a/tools/benchmark/catalog/charts/LoadBalancer/Gateway/Envoy/.helmignore b/tools/benchmark/catalog/charts/LoadBalancer/Gateway/Envoy/.helmignore new file mode 100644 index 000000000..0e8a0eb36 --- /dev/null +++ b/tools/benchmark/catalog/charts/LoadBalancer/Gateway/Envoy/.helmignore @@ -0,0 +1,23 @@ +# Patterns to ignore when building packages. +# This supports shell glob matching, relative path matching, and +# negation (prefixed with !). Only one pattern per line. +.DS_Store +# Common VCS dirs +.git/ +.gitignore +.bzr/ +.bzrignore +.hg/ +.hgignore +.svn/ +# Common backup files +*.swp +*.bak +*.tmp +*.orig +*~ +# Various IDEs +.project +.idea/ +*.tmproj +.vscode/ diff --git a/tools/benchmark/catalog/charts/LoadBalancer/Gateway/Envoy/Chart.yaml b/tools/benchmark/catalog/charts/LoadBalancer/Gateway/Envoy/Chart.yaml new file mode 100644 index 000000000..4a46d6435 --- /dev/null +++ b/tools/benchmark/catalog/charts/LoadBalancer/Gateway/Envoy/Chart.yaml @@ -0,0 +1,24 @@ +apiVersion: v2 +name: Envoy +description: A Helm chart for Kubernetes + +# A chart can be either an 'application' or a 'library' chart. +# +# Application charts are a collection of templates that can be packaged into versioned archives +# to be deployed. 
+# +# Library charts provide useful utilities or functions for the chart developer. They're included as +# a dependency of application charts to inject those utilities and functions into the rendering +# pipeline. Library charts do not define any templates and therefore cannot be deployed. +type: application + +# This is the chart version. This version number should be incremented each time you make changes +# to the chart and its templates, including the app version. +# Versions are expected to follow Semantic Versioning (https://semver.org/) +version: 0.1.0 + +# This is the version number of the application being deployed. This version number should be +# incremented each time you make changes to the application. Versions are not expected to +# follow Semantic Versioning. They should reflect the version the application is using. +# It is recommended to use it with quotes. +appVersion: "1.16.0" diff --git a/tools/benchmark/catalog/charts/LoadBalancer/Gateway/Envoy/templates/_helpers.tpl b/tools/benchmark/catalog/charts/LoadBalancer/Gateway/Envoy/templates/_helpers.tpl new file mode 100644 index 000000000..2d391062f --- /dev/null +++ b/tools/benchmark/catalog/charts/LoadBalancer/Gateway/Envoy/templates/_helpers.tpl @@ -0,0 +1,33 @@ +{{/* +Common labels +*/}} +{{- define "benchmark.labels" -}} +helm.sh/chart: {{ include "benchmark.chart" . }} +{{ include "benchmark.selectorLabels" . }} +{{- if .Chart.AppVersion }} +app.kubernetes.io/version: {{ .Chart.AppVersion | quote }} +{{- end }} +app.kubernetes.io/managed-by: {{ .Release.Service }} +{{- end }} + +{{/* +Selector labels +*/}} +{{- define "benchmark.selectorLabels" -}} +app.kubernetes.io/name: {{ include "benchmark.name" . }} +app.kubernetes.io/instance: {{ .Release.Name }} +{{- end }} + +{{/* +Create chart name and version as used by the chart label. 
+*/}} +{{- define "benchmark.chart" -}} +{{- printf "%s-%s" .Chart.Name .Chart.Version | replace "+" "_" | trunc 63 | trimSuffix "-" }} +{{- end }} + +{{/* +Create a name for the chart +*/}} +{{- define "benchmark.name" -}} +{{- default .Chart.Name .Values.nameOverride | trunc 63 | trimSuffix "-" }} +{{- end }} diff --git a/tools/benchmark/catalog/charts/LoadBalancer/Gateway/Envoy/templates/epp-extension.yaml b/tools/benchmark/catalog/charts/LoadBalancer/Gateway/Envoy/templates/epp-extension.yaml new file mode 100644 index 000000000..13ac0fed1 --- /dev/null +++ b/tools/benchmark/catalog/charts/LoadBalancer/Gateway/Envoy/templates/epp-extension.yaml @@ -0,0 +1,33 @@ +apiVersion: gateway.envoyproxy.io/v1alpha1 +kind: EnvoyExtensionPolicy +metadata: + name: ext-proc-policy + namespace: {{ .Release.Namespace }} +spec: + extProc: + - backendRefs: + - group: "" + kind: Service + name: epp-service + namespace: {{ .Release.Namespace }} + port: 9002 + processingMode: + request: + body: Buffered + # response: + # body: Buffered + # The timeouts are likely not needed here. We can experiment with removing/tuning them slowly. + # The connection limits are more important and will cause the opaque: ext_proc_gRPC_error_14 error in Envoy GW if not configured correctly. 
+ messageTimeout: 1000s + backendSettings: + circuitBreaker: + maxConnections: 40000 + maxPendingRequests: 40000 + maxParallelRequests: 40000 + timeout: + tcp: + connectTimeout: 24h + targetRef: + group: gateway.networking.k8s.io + kind: HTTPRoute + name: llm-route \ No newline at end of file diff --git a/tools/benchmark/catalog/charts/LoadBalancer/Gateway/Envoy/templates/gateway.yaml b/tools/benchmark/catalog/charts/LoadBalancer/Gateway/Envoy/templates/gateway.yaml new file mode 100644 index 000000000..20b71bb18 --- /dev/null +++ b/tools/benchmark/catalog/charts/LoadBalancer/Gateway/Envoy/templates/gateway.yaml @@ -0,0 +1,55 @@ + +--- +apiVersion: gateway.networking.k8s.io/v1 +kind: Gateway +metadata: + name: inference-gateway + namespace: {{ .Release.Namespace }} +spec: + gatewayClassName: inference-gateway + listeners: + - name: http + protocol: HTTP + port: 8080 + - name: llm-gw + protocol: HTTP + port: 8081 +--- +apiVersion: gateway.networking.k8s.io/v1 +kind: GatewayClass +metadata: + name: inference-gateway +spec: + controllerName: gateway.envoyproxy.io/gatewayclass-controller +--- +apiVersion: gateway.envoyproxy.io/v1alpha1 +kind: Backend +metadata: + name: backend-dummy + namespace: {{ .Release.Namespace }} +spec: + endpoints: + - fqdn: + # Both these values are arbitrary and unused as the PatchPolicy redirects requests. 
hostname: 'foo.bar.com' + port: 8080 +--- +apiVersion: gateway.networking.k8s.io/v1 +kind: HTTPRoute +metadata: + name: llm-route + namespace: {{ .Release.Namespace }} +spec: + parentRefs: + - name: inference-gateway + namespace: {{ .Release.Namespace }} + sectionName: llm-gw + rules: + - backendRefs: + - group: gateway.envoyproxy.io + kind: Backend + name: backend-dummy + namespace: {{ .Release.Namespace }} + timeouts: + request: "24h" + backendRequest: "24h" diff --git a/tools/benchmark/catalog/charts/LoadBalancer/Gateway/Envoy/templates/patch-policy.yaml b/tools/benchmark/catalog/charts/LoadBalancer/Gateway/Envoy/templates/patch-policy.yaml new file mode 100644 index 000000000..df28ebb27 --- /dev/null +++ b/tools/benchmark/catalog/charts/LoadBalancer/Gateway/Envoy/templates/patch-policy.yaml @@ -0,0 +1,92 @@ +apiVersion: gateway.envoyproxy.io/v1alpha1 +kind: EnvoyPatchPolicy +metadata: + name: custom-response-patch-policy + namespace: {{ .Release.Namespace }} +spec: + targetRef: + group: gateway.networking.k8s.io + kind: Gateway + name: inference-gateway + type: JSONPatch + jsonPatches: + # Necessary to create a cluster of the type: ORIGINAL_DST to allow for + # direct pod scheduling, which is heavily utilized in our scheduling. + # Specifically, the field `original_dst_lb_config` allows us to enable + # `use_http_header` and `http_header_name`.
+    # Source: https://www.envoyproxy.io/docs/envoy/latest/api-v3/config/cluster/v3/cluster.proto +    - type: "type.googleapis.com/envoy.config.cluster.v3.Cluster" +      name: original_destination_cluster +      operation: +        op: add +        path: "" +        value: +          name: original_destination_cluster +          type: ORIGINAL_DST +          original_dst_lb_config: +            use_http_header: true +            http_header_name: "x-gateway-destination-endpoint" +          connect_timeout: 1000s +          lb_policy: CLUSTER_PROVIDED +          dns_lookup_family: V4_ONLY +          circuit_breakers: +            thresholds: +              - max_connections: 40000 +                max_pending_requests: 40000 +                max_requests: 40000 + +    # This ensures that envoy accepts untrusted certificates. We tried to explicitly +    # set TrustChainVerification to ACCEPT_UNTRUSTED, but that did not work; +    # what worked was setting the common_tls_context to empty. +    - type: "type.googleapis.com/envoy.config.cluster.v3.Cluster" +      name: {{ printf "envoyextensionpolicy/%s/ext-proc-policy/extproc/0" $.Release.Namespace }} +      operation: +        op: add +        path: "/transport_socket" +        value: +          name: "envoy.transport_sockets.tls" +          typed_config: +            "@type": "type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext" +            common_tls_context: {} +    - type: "type.googleapis.com/envoy.config.route.v3.RouteConfiguration" +      name: {{ printf "%s/inference-gateway/llm-gw" $.Release.Namespace }} +      operation: +        op: replace +        path: "/virtual_hosts/0/routes/0/route/cluster" +        value: original_destination_cluster +    # The patches below are applied only when full duplex streaming is enabled. +    {{- with .Values.global.config.load_balancer.gateway }} +    {{- if $.Values.global.config.load_balancer.gateway.full_duplex_streaming_enabled }} +    - type: "type.googleapis.com/envoy.config.listener.v3.Listener" +      name: {{ printf "%s/inference-gateway/llm-gw" $.Release.Namespace }} +      operation: +        op: add +        path: "/default_filter_chain/filters/0/typed_config/http_filters/0/typed_config/processing_mode/request_body_mode" +        value: FULL_DUPLEX_STREAMED +    - type:
"type.googleapis.com/envoy.config.listener.v3.Listener" + name: {{ printf "%s/inference-gateway/llm-gw" $.Release.Namespace }} + operation: + op: add + path: "/default_filter_chain/filters/0/typed_config/http_filters/0/typed_config/processing_mode/request_trailer_mode" + value: SEND + - type: "type.googleapis.com/envoy.config.listener.v3.Listener" + name: {{ printf "%s/inference-gateway/llm-gw" $.Release.Namespace }} + operation: + op: add + path: "/default_filter_chain/filters/0/typed_config/http_filters/0/typed_config/processing_mode/response_body_mode" + value: FULL_DUPLEX_STREAMED + - type: "type.googleapis.com/envoy.config.listener.v3.Listener" + name: {{ printf "%s/inference-gateway/llm-gw" $.Release.Namespace }} + operation: + op: replace + path: "/default_filter_chain/filters/0/typed_config/http_filters/0/typed_config/processing_mode/response_trailer_mode" + value: SEND + - type: "type.googleapis.com/envoy.config.listener.v3.Listener" + name: {{ printf "%s/inference-gateway/llm-gw" $.Release.Namespace }} + operation: + op: replace + path: "/default_filter_chain/filters/0/typed_config/http_filters/0/typed_config/processing_mode/response_header_mode" + value: SEND + {{- end }} + {{- end }} + \ No newline at end of file diff --git a/tools/benchmark/catalog/charts/LoadBalancer/Gateway/Envoy/templates/traffic-policy.yaml b/tools/benchmark/catalog/charts/LoadBalancer/Gateway/Envoy/templates/traffic-policy.yaml new file mode 100644 index 000000000..e9896a47d --- /dev/null +++ b/tools/benchmark/catalog/charts/LoadBalancer/Gateway/Envoy/templates/traffic-policy.yaml @@ -0,0 +1,17 @@ +apiVersion: gateway.envoyproxy.io/v1alpha1 +kind: BackendTrafficPolicy +metadata: + name: high-connection-route-policy + namespace: {{ .Release.Namespace }} +spec: + targetRefs: + - group: gateway.networking.k8s.io + kind: HTTPRoute + name: llm-route + circuitBreaker: + maxConnections: 40000 + maxPendingRequests: 40000 + maxParallelRequests: 40000 + timeout: + tcp: + connectTimeout: 24h 
\ No newline at end of file diff --git a/tools/benchmark/catalog/charts/LoadBalancer/Gateway/GKE/.helmignore b/tools/benchmark/catalog/charts/LoadBalancer/Gateway/GKE/.helmignore new file mode 100644 index 000000000..0e8a0eb36 --- /dev/null +++ b/tools/benchmark/catalog/charts/LoadBalancer/Gateway/GKE/.helmignore @@ -0,0 +1,23 @@ +# Patterns to ignore when building packages. +# This supports shell glob matching, relative path matching, and +# negation (prefixed with !). Only one pattern per line. +.DS_Store +# Common VCS dirs +.git/ +.gitignore +.bzr/ +.bzrignore +.hg/ +.hgignore +.svn/ +# Common backup files +*.swp +*.bak +*.tmp +*.orig +*~ +# Various IDEs +.project +.idea/ +*.tmproj +.vscode/ diff --git a/tools/benchmark/catalog/charts/LoadBalancer/Gateway/GKE/Chart.yaml b/tools/benchmark/catalog/charts/LoadBalancer/Gateway/GKE/Chart.yaml new file mode 100644 index 000000000..fb2b2132f --- /dev/null +++ b/tools/benchmark/catalog/charts/LoadBalancer/Gateway/GKE/Chart.yaml @@ -0,0 +1,24 @@ +apiVersion: v2 +name: GKE +description: A Helm chart for Kubernetes + +# A chart can be either an 'application' or a 'library' chart. +# +# Application charts are a collection of templates that can be packaged into versioned archives +# to be deployed. +# +# Library charts provide useful utilities or functions for the chart developer. They're included as +# a dependency of application charts to inject those utilities and functions into the rendering +# pipeline. Library charts do not define any templates and therefore cannot be deployed. +type: application + +# This is the chart version. This version number should be incremented each time you make changes +# to the chart and its templates, including the app version. +# Versions are expected to follow Semantic Versioning (https://semver.org/) +version: 0.1.0 + +# This is the version number of the application being deployed. This version number should be +# incremented each time you make changes to the application. 
Versions are not expected to +# follow Semantic Versioning. They should reflect the version the application is using. +# It is recommended to use it with quotes. +appVersion: "1.16.0" diff --git a/tools/benchmark/catalog/charts/LoadBalancer/Gateway/GKE/templates/_helpers.tpl b/tools/benchmark/catalog/charts/LoadBalancer/Gateway/GKE/templates/_helpers.tpl new file mode 100644 index 000000000..2d391062f --- /dev/null +++ b/tools/benchmark/catalog/charts/LoadBalancer/Gateway/GKE/templates/_helpers.tpl @@ -0,0 +1,33 @@ +{{/* +Common labels +*/}} +{{- define "benchmark.labels" -}} +helm.sh/chart: {{ include "benchmark.chart" . }} +{{ include "benchmark.selectorLabels" . }} +{{- if .Chart.AppVersion }} +app.kubernetes.io/version: {{ .Chart.AppVersion | quote }} +{{- end }} +app.kubernetes.io/managed-by: {{ .Release.Service }} +{{- end }} + +{{/* +Selector labels +*/}} +{{- define "benchmark.selectorLabels" -}} +app.kubernetes.io/name: {{ include "benchmark.name" . }} +app.kubernetes.io/instance: {{ .Release.Name }} +{{- end }} + +{{/* +Create chart name and version as used by the chart label. +*/}} +{{- define "benchmark.chart" -}} +{{- printf "%s-%s" .Chart.Name .Chart.Version | replace "+" "_" | trunc 63 | trimSuffix "-" }} +{{- end }} + +{{/* +Create a name for the chart +*/}} +{{- define "benchmark.name" -}} +{{- default .Chart.Name .Values.nameOverride | trunc 63 | trimSuffix "-" }} +{{- end }} diff --git a/tools/benchmark/catalog/charts/LoadBalancer/Gateway/templates/_helpers.tpl b/tools/benchmark/catalog/charts/LoadBalancer/Gateway/templates/_helpers.tpl new file mode 100644 index 000000000..2d391062f --- /dev/null +++ b/tools/benchmark/catalog/charts/LoadBalancer/Gateway/templates/_helpers.tpl @@ -0,0 +1,33 @@ +{{/* +Common labels +*/}} +{{- define "benchmark.labels" -}} +helm.sh/chart: {{ include "benchmark.chart" . }} +{{ include "benchmark.selectorLabels" . 
}} +{{- if .Chart.AppVersion }} +app.kubernetes.io/version: {{ .Chart.AppVersion | quote }} +{{- end }} +app.kubernetes.io/managed-by: {{ .Release.Service }} +{{- end }} + +{{/* +Selector labels +*/}} +{{- define "benchmark.selectorLabels" -}} +app.kubernetes.io/name: {{ include "benchmark.name" . }} +app.kubernetes.io/instance: {{ .Release.Name }} +{{- end }} + +{{/* +Create chart name and version as used by the chart label. +*/}} +{{- define "benchmark.chart" -}} +{{- printf "%s-%s" .Chart.Name .Chart.Version | replace "+" "_" | trunc 63 | trimSuffix "-" }} +{{- end }} + +{{/* +Create a name for the chart +*/}} +{{- define "benchmark.name" -}} +{{- default .Chart.Name .Values.nameOverride | trunc 63 | trimSuffix "-" }} +{{- end }} diff --git a/tools/benchmark/catalog/charts/LoadBalancer/Gateway/templates/epp-deployment.yaml b/tools/benchmark/catalog/charts/LoadBalancer/Gateway/templates/epp-deployment.yaml new file mode 100644 index 000000000..f347e017b --- /dev/null +++ b/tools/benchmark/catalog/charts/LoadBalancer/Gateway/templates/epp-deployment.yaml @@ -0,0 +1,60 @@ + +--- +apiVersion: apps/v1 +kind: Deployment +metadata: +  name: epp-deployment +  namespace: {{ .Release.Namespace }} +  labels: +    app: epp-deployment +spec: +  replicas: 1 +  selector: +    matchLabels: +      app: epp-deployment +  template: +    metadata: +      labels: +        app: epp-deployment +    spec: +      containers: +      - name: epp-deployment +        image: {{ .Values.global.config.load_balancer.gateway.envoy.epp.image }} +        imagePullPolicy: Always +        args: +        - -poolName +        - "model-server-pool" +        - -poolNamespace # This will be overridden if a different namespace is provided.
+ - {{ .Release.Namespace }} + - -refreshMetricsInterval + - {{ .Values.global.config.load_balancer.gateway.envoy.epp.refresh_metrics_interval }} + - -v + - "3" + env: + - name: USE_STREAMING + value: {{ .Values.global.config.load_balancer.gateway.full_duplex_streaming_enabled | quote }} + ports: + - containerPort: 9002 + - containerPort: 9003 + - name: metrics + containerPort: 9090 + livenessProbe: + grpc: + port: 9003 + service: inference-extension + initialDelaySeconds: 5 + periodSeconds: 10 + readinessProbe: + grpc: + port: 9003 + service: inference-extension + initialDelaySeconds: 5 + periodSeconds: 10 + resources: + requests: + cpu: "10" + memory: "20Gi" + limits: + cpu: "10" + memory: "20Gi" + diff --git a/tools/benchmark/catalog/charts/LoadBalancer/Gateway/templates/epp-monitoring.yaml b/tools/benchmark/catalog/charts/LoadBalancer/Gateway/templates/epp-monitoring.yaml new file mode 100644 index 000000000..6bd83314a --- /dev/null +++ b/tools/benchmark/catalog/charts/LoadBalancer/Gateway/templates/epp-monitoring.yaml @@ -0,0 +1,89 @@ +# apiVersion: monitoring.googleapis.com/v1 +# kind: PodMonitoring +# metadata: +# name: epp-monitoring +# labels: +# app.kubernetes.io/name: epp-monitoring +# app.kubernetes.io/part-of: google-cloud-managed-prometheus +# spec: +# endpoints: +# - port: 9090 +# scheme: http +# interval: 5s +# selector: +# matchExpressions: +# - key: app +# operator: In +# values: +# - epp-deployment + +--- +apiVersion: monitoring.googleapis.com/v1 +kind: ClusterPodMonitoring +metadata: + labels: + app.kubernetes.io/name: epp-monitoring + app.kubernetes.io/part-of: google-cloud-managed-prometheus + generated-by: benchmark-catalog-cli + name: epp-monitoring +spec: + endpoints: + - interval: 2s + port: 9090 + path: /metrics + authorization: + type: Bearer + credentials: + secret: + name: inference-gateway-sa-metrics-reader-secret + key: token + namespace: {{ .Release.Namespace }} + selector: + matchExpressions: + - key: app + operator: In + values: + 
- epp-deployment +--- +apiVersion: v1 +kind: ServiceAccount +metadata: + name: inference-gateway-sa-metrics-reader + namespace: {{ .Release.Namespace }} +--- +apiVersion: rbac.authorization.k8s.io/v1 +kind: ClusterRoleBinding +metadata: + name: inference-gateway-sa-metrics-reader-role-binding + namespace: {{ .Release.Namespace }} +subjects: +- kind: ServiceAccount + name: inference-gateway-sa-metrics-reader + namespace: {{ .Release.Namespace }} +roleRef: + kind: ClusterRole + name: inference-gateway-metrics-reader + apiGroup: rbac.authorization.k8s.io +--- +apiVersion: v1 +kind: Secret +metadata: + name: inference-gateway-sa-metrics-reader-secret + namespace: {{ .Release.Namespace }} + annotations: + kubernetes.io/service-account.name: inference-gateway-sa-metrics-reader +type: kubernetes.io/service-account-token +--- +apiVersion: rbac.authorization.k8s.io/v1 +kind: ClusterRoleBinding +metadata: + name: gmp-system:collector:inference-gateway-sa-metrics-reader-secret-read + namespace: {{ .Release.Namespace }} +roleRef: + name: inference-gateway-sa-metrics-reader-secret-read + kind: ClusterRole + apiGroup: rbac.authorization.k8s.io +subjects: +- name: collector + namespace: gmp-system + kind: ServiceAccount \ No newline at end of file diff --git a/tools/benchmark/catalog/charts/LoadBalancer/Gateway/templates/epp-service.yaml b/tools/benchmark/catalog/charts/LoadBalancer/Gateway/templates/epp-service.yaml new file mode 100644 index 000000000..2df7d2508 --- /dev/null +++ b/tools/benchmark/catalog/charts/LoadBalancer/Gateway/templates/epp-service.yaml @@ -0,0 +1,13 @@ +apiVersion: v1 +kind: Service +metadata: + name: epp-service + namespace: {{ .Release.Namespace }} +spec: + selector: + app: epp-deployment + ports: + - protocol: TCP + port: 9002 + targetPort: 9002 + type: ClusterIP diff --git a/tools/benchmark/catalog/charts/LoadBalancer/Gateway/templates/inference-models.yaml b/tools/benchmark/catalog/charts/LoadBalancer/Gateway/templates/inference-models.yaml new file 
mode 100644 index 000000000..e7eb803d9 --- /dev/null +++ b/tools/benchmark/catalog/charts/LoadBalancer/Gateway/templates/inference-models.yaml @@ -0,0 +1,38 @@ +--- +apiVersion: inference.networking.x-k8s.io/v1alpha1 +kind: InferenceModel +metadata: + name: gemma-2-27b + namespace: {{ .Release.Namespace }} +spec: + criticality: Critical + poolRef: + # this is the default val: + group: inference.networking.x-k8s.io + # this is the default val: + kind: InferencePool + name: model-server-pool + modelName: google/gemma-2-27b + targetModels: + - name: google/gemma-2-27b + weight: 100 + +--- +apiVersion: inference.networking.x-k8s.io/v1alpha1 +kind: InferenceModel +metadata: + name: llama-2-7b-hf + namespace: {{ .Release.Namespace }} +spec: + criticality: Critical + poolRef: + # this is the default val: + group: inference.networking.x-k8s.io + # this is the default val: + kind: InferencePool + name: model-server-pool + modelName: meta-llama/Llama-2-7b-hf + targetModels: + - name: meta-llama/Llama-2-7b-hf + weight: 100 + diff --git a/tools/benchmark/catalog/charts/LoadBalancer/Gateway/templates/inference-pool.yaml b/tools/benchmark/catalog/charts/LoadBalancer/Gateway/templates/inference-pool.yaml new file mode 100644 index 000000000..e92ce592e --- /dev/null +++ b/tools/benchmark/catalog/charts/LoadBalancer/Gateway/templates/inference-pool.yaml @@ -0,0 +1,11 @@ +apiVersion: inference.networking.x-k8s.io/v1alpha1 +kind: InferencePool +metadata: + name: model-server-pool + namespace: {{ .Release.Namespace }} +spec: + targetPortNumber: 8000 + selector: + "app": "model-server-deployment" + extensionRef: + name: epp-deployment \ No newline at end of file diff --git a/tools/benchmark/catalog/charts/LoadBalancer/K8SService/.helmignore b/tools/benchmark/catalog/charts/LoadBalancer/K8SService/.helmignore new file mode 100644 index 000000000..0e8a0eb36 --- /dev/null +++ b/tools/benchmark/catalog/charts/LoadBalancer/K8SService/.helmignore @@ -0,0 +1,23 @@ +# Patterns to ignore when 
building packages. +# This supports shell glob matching, relative path matching, and +# negation (prefixed with !). Only one pattern per line. +.DS_Store +# Common VCS dirs +.git/ +.gitignore +.bzr/ +.bzrignore +.hg/ +.hgignore +.svn/ +# Common backup files +*.swp +*.bak +*.tmp +*.orig +*~ +# Various IDEs +.project +.idea/ +*.tmproj +.vscode/ diff --git a/tools/benchmark/catalog/charts/LoadBalancer/K8SService/Chart.yaml b/tools/benchmark/catalog/charts/LoadBalancer/K8SService/Chart.yaml new file mode 100644 index 000000000..7f112a670 --- /dev/null +++ b/tools/benchmark/catalog/charts/LoadBalancer/K8SService/Chart.yaml @@ -0,0 +1,24 @@ +apiVersion: v2 +name: K8SService +description: A Helm chart for Kubernetes + +# A chart can be either an 'application' or a 'library' chart. +# +# Application charts are a collection of templates that can be packaged into versioned archives +# to be deployed. +# +# Library charts provide useful utilities or functions for the chart developer. They're included as +# a dependency of application charts to inject those utilities and functions into the rendering +# pipeline. Library charts do not define any templates and therefore cannot be deployed. +type: application + +# This is the chart version. This version number should be incremented each time you make changes +# to the chart and its templates, including the app version. +# Versions are expected to follow Semantic Versioning (https://semver.org/) +version: 0.1.0 + +# This is the version number of the application being deployed. This version number should be +# incremented each time you make changes to the application. Versions are not expected to +# follow Semantic Versioning. They should reflect the version the application is using. +# It is recommended to use it with quotes. 
+appVersion: "1.16.0" diff --git a/tools/benchmark/catalog/charts/LoadBalancer/K8SService/templates/_helpers.tpl b/tools/benchmark/catalog/charts/LoadBalancer/K8SService/templates/_helpers.tpl new file mode 100644 index 000000000..2d391062f --- /dev/null +++ b/tools/benchmark/catalog/charts/LoadBalancer/K8SService/templates/_helpers.tpl @@ -0,0 +1,33 @@ +{{/* +Common labels +*/}} +{{- define "benchmark.labels" -}} +helm.sh/chart: {{ include "benchmark.chart" . }} +{{ include "benchmark.selectorLabels" . }} +{{- if .Chart.AppVersion }} +app.kubernetes.io/version: {{ .Chart.AppVersion | quote }} +{{- end }} +app.kubernetes.io/managed-by: {{ .Release.Service }} +{{- end }} + +{{/* +Selector labels +*/}} +{{- define "benchmark.selectorLabels" -}} +app.kubernetes.io/name: {{ include "benchmark.name" . }} +app.kubernetes.io/instance: {{ .Release.Name }} +{{- end }} + +{{/* +Create chart name and version as used by the chart label. +*/}} +{{- define "benchmark.chart" -}} +{{- printf "%s-%s" .Chart.Name .Chart.Version | replace "+" "_" | trunc 63 | trimSuffix "-" }} +{{- end }} + +{{/* +Create a name for the chart +*/}} +{{- define "benchmark.name" -}} +{{- default .Chart.Name .Values.nameOverride | trunc 63 | trimSuffix "-" }} +{{- end }} diff --git a/tools/benchmark/catalog/charts/LoadBalancer/K8SService/templates/service.yaml b/tools/benchmark/catalog/charts/LoadBalancer/K8SService/templates/service.yaml new file mode 100644 index 000000000..d7a33a911 --- /dev/null +++ b/tools/benchmark/catalog/charts/LoadBalancer/K8SService/templates/service.yaml @@ -0,0 +1,15 @@ +apiVersion: v1 +kind: Service +metadata: + labels: + generated-by: benchmark-catalog-cli + name: model-server-service + namespace: {{ .Release.Namespace }} +spec: + ports: + - port: 8081 + protocol: TCP + targetPort: 8000 + selector: + app: model-server-deployment + type: LoadBalancer diff --git a/tools/benchmark/catalog/charts/LoadBalancer/charts/Envoy-0.1.0.tgz 
b/tools/benchmark/catalog/charts/LoadBalancer/charts/Envoy-0.1.0.tgz new file mode 100644 index 000000000..0755839c4 Binary files /dev/null and b/tools/benchmark/catalog/charts/LoadBalancer/charts/Envoy-0.1.0.tgz differ diff --git a/tools/benchmark/catalog/charts/LoadBalancer/charts/GKE-0.1.0.tgz b/tools/benchmark/catalog/charts/LoadBalancer/charts/GKE-0.1.0.tgz new file mode 100644 index 000000000..0ebe4de3b Binary files /dev/null and b/tools/benchmark/catalog/charts/LoadBalancer/charts/GKE-0.1.0.tgz differ diff --git a/tools/benchmark/catalog/charts/LoadBalancer/charts/Gateway-0.1.0.tgz b/tools/benchmark/catalog/charts/LoadBalancer/charts/Gateway-0.1.0.tgz new file mode 100644 index 000000000..044570984 Binary files /dev/null and b/tools/benchmark/catalog/charts/LoadBalancer/charts/Gateway-0.1.0.tgz differ diff --git a/tools/benchmark/catalog/charts/LoadBalancer/charts/K8SService-0.1.0.tgz b/tools/benchmark/catalog/charts/LoadBalancer/charts/K8SService-0.1.0.tgz new file mode 100644 index 000000000..05c59b084 Binary files /dev/null and b/tools/benchmark/catalog/charts/LoadBalancer/charts/K8SService-0.1.0.tgz differ diff --git a/tools/benchmark/catalog/charts/LoadBalancer/templates/_helpers.tpl b/tools/benchmark/catalog/charts/LoadBalancer/templates/_helpers.tpl new file mode 100644 index 000000000..2d391062f --- /dev/null +++ b/tools/benchmark/catalog/charts/LoadBalancer/templates/_helpers.tpl @@ -0,0 +1,33 @@ +{{/* +Common labels +*/}} +{{- define "benchmark.labels" -}} +helm.sh/chart: {{ include "benchmark.chart" . }} +{{ include "benchmark.selectorLabels" . }} +{{- if .Chart.AppVersion }} +app.kubernetes.io/version: {{ .Chart.AppVersion | quote }} +{{- end }} +app.kubernetes.io/managed-by: {{ .Release.Service }} +{{- end }} + +{{/* +Selector labels +*/}} +{{- define "benchmark.selectorLabels" -}} +app.kubernetes.io/name: {{ include "benchmark.name" . 
}} +app.kubernetes.io/instance: {{ .Release.Name }} +{{- end }} + +{{/* +Create chart name and version as used by the chart label. +*/}} +{{- define "benchmark.chart" -}} +{{- printf "%s-%s" .Chart.Name .Chart.Version | replace "+" "_" | trunc 63 | trimSuffix "-" }} +{{- end }} + +{{/* +Create a name for the chart +*/}} +{{- define "benchmark.name" -}} +{{- default .Chart.Name .Values.nameOverride | trunc 63 | trimSuffix "-" }} +{{- end }} diff --git a/tools/benchmark/catalog/charts/ModelServer/.helmignore b/tools/benchmark/catalog/charts/ModelServer/.helmignore new file mode 100644 index 000000000..0e8a0eb36 --- /dev/null +++ b/tools/benchmark/catalog/charts/ModelServer/.helmignore @@ -0,0 +1,23 @@ +# Patterns to ignore when building packages. +# This supports shell glob matching, relative path matching, and +# negation (prefixed with !). Only one pattern per line. +.DS_Store +# Common VCS dirs +.git/ +.gitignore +.bzr/ +.bzrignore +.hg/ +.hgignore +.svn/ +# Common backup files +*.swp +*.bak +*.tmp +*.orig +*~ +# Various IDEs +.project +.idea/ +*.tmproj +.vscode/ diff --git a/tools/benchmark/catalog/charts/ModelServer/Chart.yaml b/tools/benchmark/catalog/charts/ModelServer/Chart.yaml new file mode 100644 index 000000000..931061016 --- /dev/null +++ b/tools/benchmark/catalog/charts/ModelServer/Chart.yaml @@ -0,0 +1,24 @@ +apiVersion: v2 +name: ModelServer +description: A Helm chart for Kubernetes + +# A chart can be either an 'application' or a 'library' chart. +# +# Application charts are a collection of templates that can be packaged into versioned archives +# to be deployed. +# +# Library charts provide useful utilities or functions for the chart developer. They're included as +# a dependency of application charts to inject those utilities and functions into the rendering +# pipeline. Library charts do not define any templates and therefore cannot be deployed. +type: application + +# This is the chart version. 
This version number should be incremented each time you make changes +# to the chart and its templates, including the app version. +# Versions are expected to follow Semantic Versioning (https://semver.org/) +version: 0.1.0 + +# This is the version number of the application being deployed. This version number should be +# incremented each time you make changes to the application. Versions are not expected to +# follow Semantic Versioning. They should reflect the version the application is using. +# It is recommended to use it with quotes. +appVersion: "1.16.0" diff --git a/tools/benchmark/catalog/charts/ModelServer/templates/_helpers.tpl b/tools/benchmark/catalog/charts/ModelServer/templates/_helpers.tpl new file mode 100644 index 000000000..2d391062f --- /dev/null +++ b/tools/benchmark/catalog/charts/ModelServer/templates/_helpers.tpl @@ -0,0 +1,33 @@ +{{/* +Common labels +*/}} +{{- define "benchmark.labels" -}} +helm.sh/chart: {{ include "benchmark.chart" . }} +{{ include "benchmark.selectorLabels" . }} +{{- if .Chart.AppVersion }} +app.kubernetes.io/version: {{ .Chart.AppVersion | quote }} +{{- end }} +app.kubernetes.io/managed-by: {{ .Release.Service }} +{{- end }} + +{{/* +Selector labels +*/}} +{{- define "benchmark.selectorLabels" -}} +app.kubernetes.io/name: {{ include "benchmark.name" . }} +app.kubernetes.io/instance: {{ .Release.Name }} +{{- end }} + +{{/* +Create chart name and version as used by the chart label. 
+*/}} +{{- define "benchmark.chart" -}} +{{- printf "%s-%s" .Chart.Name .Chart.Version | replace "+" "_" | trunc 63 | trimSuffix "-" }} +{{- end }} + +{{/* +Create a name for the chart +*/}} +{{- define "benchmark.name" -}} +{{- default .Chart.Name .Values.nameOverride | trunc 63 | trimSuffix "-" }} +{{- end }} diff --git a/tools/benchmark/catalog/charts/ModelServer/templates/deployment.yaml b/tools/benchmark/catalog/charts/ModelServer/templates/deployment.yaml new file mode 100644 index 000000000..2aa7ee903 --- /dev/null +++ b/tools/benchmark/catalog/charts/ModelServer/templates/deployment.yaml @@ -0,0 +1,100 @@ +apiVersion: apps/v1 +kind: Deployment +metadata: + labels: + app: model-server-deployment + name: model-server-deployment + namespace: {{ .Release.Namespace }} +spec: + replicas: {{ .Values.global.config.model_server.replicas }} + selector: + matchLabels: + app: model-server-deployment + template: + metadata: + labels: + app: model-server-deployment + spec: + containers: + - args: + - --port + - "8000" + - --max-num-seqs + - "2048" + - --max_model_len + - "4096" + - --compilation-config + - "3" + - --tensor-parallel-size + - {{ .Values.global.config.model_server.vllm.tensor_parallelism | quote }} + - --model + - {{ .Values.global.config.model_server.vllm.model | quote }} + command: + - python3 + - -m + - vllm.entrypoints.openai.api_server + env: + - name: PORT + value: "8000" + - name: HUGGING_FACE_HUB_TOKEN + valueFrom: + secretKeyRef: + key: token + name: hf-token + - name: VLLM_ALLOW_RUNTIME_LORA_UPDATING + value: "true" + - name: VLLM_USE_V1 + value: {{ .Values.global.config.model_server.vllm.v1 | quote }} + image: {{ .Values.global.config.model_server.image }} + imagePullPolicy: Always + livenessProbe: + failureThreshold: 240 + httpGet: + path: /health + port: http + scheme: HTTP + initialDelaySeconds: 5 + periodSeconds: 5 + successThreshold: 1 + timeoutSeconds: 1 + name: inference-server + ports: + - containerPort: 8000 + name: http + protocol: TCP 
+ readinessProbe: + failureThreshold: 600 + httpGet: + path: /health + port: http + scheme: HTTP + initialDelaySeconds: 5 + periodSeconds: 5 + successThreshold: 1 + timeoutSeconds: 1 + resources: + limits: + nvidia.com/gpu: {{ .Values.global.config.model_server.vllm.tensor_parallelism }} + requests: + nvidia.com/gpu: {{ .Values.global.config.model_server.vllm.tensor_parallelism }} + volumeMounts: + - mountPath: /data + name: data + - mountPath: /dev/shm + name: shm + - mountPath: /adapters + name: adapters + nodeSelector: + cloud.google.com/gke-accelerator: {{ .Values.global.config.model_server.accelerator | quote }} + restartPolicy: Always + schedulerName: default-scheduler + terminationGracePeriodSeconds: 30 + volumes: + - emptyDir: {} + name: data + - emptyDir: + medium: Memory + name: shm + - emptyDir: {} + name: adapters + diff --git a/tools/benchmark/catalog/charts/ModelServer/templates/pod-monitoring.yaml b/tools/benchmark/catalog/charts/ModelServer/templates/pod-monitoring.yaml new file mode 100644 index 000000000..8eb7543e6 --- /dev/null +++ b/tools/benchmark/catalog/charts/ModelServer/templates/pod-monitoring.yaml @@ -0,0 +1,21 @@ +apiVersion: monitoring.googleapis.com/v1 +kind: PodMonitoring +metadata: + labels: + app.kubernetes.io/name: model-server-monitoring + app.kubernetes.io/part-of: google-cloud-managed-prometheus + generated-by: benchmark-catalog-cli + name: model-server-monitoring + namespace: {{ .Release.Namespace }} +spec: + endpoints: + - interval: 5s + path: /metrics + port: 8000 + scheme: http + selector: + matchExpressions: + - key: app + operator: In + values: + - model-server-deployment \ No newline at end of file diff --git a/tools/benchmark/dev.md b/tools/benchmark/dev.md new file mode 100644 index 000000000..3a2d9f574 --- /dev/null +++ b/tools/benchmark/dev.md @@ -0,0 +1,31 @@ +## The manifest generator + +[code](./manifestgenerator) + +The manifestgenerator takes in the `Benchmark` proto as input, and generates benchmark manifests 
for each part of the config (ModelServer, LoadBalancer, BenchmarkTool), using Helm. + +### Benchmark inheritance + +Each benchmark MUST have a name, and optionally the name of its base benchmark. The scope of the inheritance is limited to the benchmarks in the same pbtxt file. + +### Determine the LoadBalancer address + +The address of the LoadBalancer is usually only known at runtime, after the load balancer is deployed. In the case of EPP, we wait for the corresponding Envoy service to be ready. If we benchmark against a public gateway IP, we may wait for the gateway IP to be available. Therefore, the manifestgenerator can be instructed to generate the `ModelServer` and `LoadBalancer` manifest types first, then called again to generate the `BenchmarkTool` manifest once the `LoadBalancer` is ready. + +### Auto-set the request rate + +If the request rates are not specified by the user, the tool can automatically choose a curated list of request rates based on the number of accelerators, the accelerator type, and the model. + +## FAQs + +### How to add a new config field + +1. Update proto/benchmark.proto. +1. Regenerate the go file: `protoc --go_out=. --go_opt=paths=source_relative benchmark.proto` +1. For the Helm generator, edit the corresponding Helm templates to parse the new field. + +### How to add a new accelerator type or model + +Update `applyRequestRatesDefaults` in `pkg/utils/benchmark_config.go` for the new accelerator type and/or model.
+ + diff --git a/tools/benchmark/go.mod b/tools/benchmark/go.mod new file mode 100644 index 000000000..004216237 --- /dev/null +++ b/tools/benchmark/go.mod @@ -0,0 +1,36 @@ +module benchmark-catalog + +go 1.23.0 + +toolchain go1.23.4 + +require ( + github.com/golang/protobuf v1.5.4 + google.golang.org/protobuf v1.36.0 + k8s.io/klog/v2 v2.130.1 + sigs.k8s.io/kustomize/api v0.18.0 + sigs.k8s.io/kustomize/kyaml v0.18.1 + sigs.k8s.io/yaml v1.4.0 +) + +require ( + github.com/blang/semver/v4 v4.0.0 // indirect + github.com/davecgh/go-spew v1.1.2-0.20180830191138-d8f796af33cc // indirect + github.com/go-errors/errors v1.4.2 // indirect + github.com/go-logr/logr v1.4.2 // indirect + github.com/go-openapi/jsonpointer v0.21.0 // indirect + github.com/go-openapi/jsonreference v0.20.2 // indirect + github.com/go-openapi/swag v0.23.0 // indirect + github.com/google/gnostic-models v0.6.9 // indirect + github.com/google/shlex v0.0.0-20191202100458-e7afc7fbc510 // indirect + github.com/josharian/intern v1.0.0 // indirect + github.com/mailru/easyjson v0.7.7 // indirect + github.com/monochromegane/go-gitignore v0.0.0-20200626010858-205db1a8cc00 // indirect + github.com/pkg/errors v0.9.1 // indirect + github.com/xlab/treeprint v1.2.0 // indirect + golang.org/x/sys v0.26.0 // indirect + gopkg.in/evanphx/json-patch.v4 v4.12.0 // indirect + gopkg.in/yaml.v3 v3.0.1 // indirect + k8s.io/kube-openapi v0.0.0-20241212222426-2c72e554b1e7 // indirect + sigs.k8s.io/json v0.0.0-20241010143419-9aa6b5e7a4b3 // indirect +) diff --git a/tools/benchmark/go.sum b/tools/benchmark/go.sum new file mode 100644 index 000000000..c947e7f42 --- /dev/null +++ b/tools/benchmark/go.sum @@ -0,0 +1,91 @@ +github.com/blang/semver/v4 v4.0.0 h1:1PFHFE6yCCTv8C1TeyNNarDzntLi7wMI5i/pzqYIsAM= +github.com/blang/semver/v4 v4.0.0/go.mod 
h1:IbckMUScFkM3pff0VJDNKRiT6TG/YpiHIM2yvyW5YoQ= +github.com/creack/pty v1.1.9/go.mod h1:oKZEueFk5CKHvIhNR5MUki03XCEU+Q6VDXinZuGJ33E= +github.com/davecgh/go-spew v1.1.0/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38= +github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38= +github.com/davecgh/go-spew v1.1.2-0.20180830191138-d8f796af33cc h1:U9qPSI2PIWSS1VwoXQT9A3Wy9MM3WgvqSxFWenqJduM= +github.com/davecgh/go-spew v1.1.2-0.20180830191138-d8f796af33cc/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38= +github.com/go-errors/errors v1.4.2 h1:J6MZopCL4uSllY1OfXM374weqZFFItUbrImctkmUxIA= +github.com/go-errors/errors v1.4.2/go.mod h1:sIVyrIiJhuEF+Pj9Ebtd6P/rEYROXFi3BopGUQ5a5Og= +github.com/go-logr/logr v1.4.2 h1:6pFjapn8bFcIbiKo3XT4j/BhANplGihG6tvd+8rYgrY= +github.com/go-logr/logr v1.4.2/go.mod h1:9T104GzyrTigFIr8wt5mBrctHMim0Nb2HLGrmQ40KvY= +github.com/go-openapi/jsonpointer v0.19.6/go.mod h1:osyAmYz/mB/C3I+WsTTSgw1ONzaLJoLCyoi6/zppojs= +github.com/go-openapi/jsonpointer v0.21.0 h1:YgdVicSA9vH5RiHs9TZW5oyafXZFc6+2Vc1rr/O9oNQ= +github.com/go-openapi/jsonpointer v0.21.0/go.mod h1:IUyH9l/+uyhIYQ/PXVA41Rexl+kOkAPDdXEYns6fzUY= +github.com/go-openapi/jsonreference v0.20.2 h1:3sVjiK66+uXK/6oQ8xgcRKcFgQ5KXa2KvnJRumpMGbE= +github.com/go-openapi/jsonreference v0.20.2/go.mod h1:Bl1zwGIM8/wsvqjsOQLJ/SH+En5Ap4rVB5KVcIDZG2k= +github.com/go-openapi/swag v0.22.3/go.mod h1:UzaqsxGiab7freDnrUUra0MwWfN/q7tE4j+VcZ0yl14= +github.com/go-openapi/swag v0.23.0 h1:vsEVJDUo2hPJ2tu0/Xc+4noaxyEffXNIs3cOULZ+GrE= +github.com/go-openapi/swag v0.23.0/go.mod h1:esZ8ITTYEsH1V2trKHjAN8Ai7xHb8RV+YSZ577vPjgQ= +github.com/golang/protobuf v1.5.4 h1:i7eJL8qZTpSEXOPTxNKhASYpMn+8e5Q6AdndVa1dWek= +github.com/golang/protobuf v1.5.4/go.mod h1:lnTiLA8Wa4RWRcIUkrtSVa5nRhsEGBg48fD6rSs7xps= 
+github.com/google/gnostic-models v0.6.9 h1:MU/8wDLif2qCXZmzncUQ/BOfxWfthHi63KqpoNbWqVw= +github.com/google/gnostic-models v0.6.9/go.mod h1:CiWsm0s6BSQd1hRn8/QmxqB6BesYcbSZxsz9b0KuDBw= +github.com/google/go-cmp v0.5.9/go.mod h1:17dUlkBOakJ0+DkrSSNjCkIjxS6bF9zb3elmeNGIjoY= +github.com/google/go-cmp v0.6.0 h1:ofyhxvXcZhMsU5ulbFiLKl/XBFqE1GSq7atu8tAmTRI= +github.com/google/go-cmp v0.6.0/go.mod h1:17dUlkBOakJ0+DkrSSNjCkIjxS6bF9zb3elmeNGIjoY= +github.com/google/gofuzz v1.2.0 h1:xRy4A+RhZaiKjJ1bPfwQ8sedCA+YS2YcCHW6ec7JMi0= +github.com/google/gofuzz v1.2.0/go.mod h1:dBl0BpW6vV/+mYPU4Po3pmUjxk6FQPldtuIdl/M65Eg= +github.com/google/shlex v0.0.0-20191202100458-e7afc7fbc510 h1:El6M4kTTCOh6aBiKaUGG7oYTSPP8MxqL4YI3kZKwcP4= +github.com/google/shlex v0.0.0-20191202100458-e7afc7fbc510/go.mod h1:pupxD2MaaD3pAXIBCelhxNneeOaAeabZDe5s4K6zSpQ= +github.com/josharian/intern v1.0.0 h1:vlS4z54oSdjm0bgjRigI+G1HpF+tI+9rE5LLzOg8HmY= +github.com/josharian/intern v1.0.0/go.mod h1:5DoeVV0s6jJacbCEi61lwdGj/aVlrQvzHFFd8Hwg//Y= +github.com/kr/pretty v0.2.1/go.mod h1:ipq/a2n7PKx3OHsz4KJII5eveXtPO4qwEXGdVfWzfnI= +github.com/kr/pretty v0.3.1 h1:flRD4NNwYAUpkphVc1HcthR4KEIFJ65n8Mw5qdRn3LE= +github.com/kr/pretty v0.3.1/go.mod h1:hoEshYVHaxMs3cyo3Yncou5ZscifuDolrwPKZanG3xk= +github.com/kr/pty v1.1.1/go.mod h1:pFQYn66WHrOpPYNljwOMqo10TkYh1fy3cYio2l3bCsQ= +github.com/kr/text v0.1.0/go.mod h1:4Jbv+DJW3UT/LiOwJeYQe1efqtUx/iVham/4vfdArNI= +github.com/kr/text v0.2.0 h1:5Nx0Ya0ZqY2ygV366QzturHI13Jq95ApcVaJBhpS+AY= +github.com/kr/text v0.2.0/go.mod h1:eLer722TekiGuMkidMxC/pM04lWEeraHUUmBw8l2grE= +github.com/mailru/easyjson v0.7.7 h1:UGYAvKxe3sBsEDzO8ZeWOSlIQfWFlxbzLZe7hwFURr0= +github.com/mailru/easyjson v0.7.7/go.mod h1:xzfreul335JAWq5oZzymOObrkdz5UnU4kGfJJLY9Nlc= +github.com/monochromegane/go-gitignore 
v0.0.0-20200626010858-205db1a8cc00 h1:n6/2gBQ3RWajuToeY6ZtZTIKv2v7ThUy5KKusIT0yc0= +github.com/monochromegane/go-gitignore v0.0.0-20200626010858-205db1a8cc00/go.mod h1:Pm3mSP3c5uWn86xMLZ5Sa7JB9GsEZySvHYXCTK4E9q4= +github.com/pkg/errors v0.9.1 h1:FEBLx1zS214owpjy7qsBeixbURkuhQAwrK5UwLGTwt4= +github.com/pkg/errors v0.9.1/go.mod h1:bwawxfHBFNV+L2hUp1rHADufV3IMtnDRdf1r5NINEl0= +github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4= +github.com/pmezard/go-difflib v1.0.1-0.20181226105442-5d4384ee4fb2 h1:Jamvg5psRIccs7FGNTlIRMkT8wgtp5eCXdBlqhYGL6U= +github.com/pmezard/go-difflib v1.0.1-0.20181226105442-5d4384ee4fb2/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4= +github.com/rogpeppe/go-internal v1.12.0 h1:exVL4IDcn6na9z1rAb56Vxr+CgyK3nn3O+epU5NdKM8= +github.com/rogpeppe/go-internal v1.12.0/go.mod h1:E+RYuTGaKKdloAfM02xzb0FW3Paa99yedzYV+kq4uf4= +github.com/sergi/go-diff v1.2.0 h1:XU+rvMAioB0UC3q1MFrIQy4Vo5/4VsRDQQXHsEya6xQ= +github.com/sergi/go-diff v1.2.0/go.mod h1:STckp+ISIX8hZLjrqAeVduY0gWCT9IjLuqbuNXdaHfM= +github.com/stretchr/objx v0.1.0/go.mod h1:HFkY916IF+rwdDfMAkV7OtwuqBVzrE8GR6GFx+wExME= +github.com/stretchr/objx v0.4.0/go.mod h1:YvHI0jy2hoMjB+UWwv71VJQ9isScKT/TqJzVSSt89Yw= +github.com/stretchr/objx v0.5.0/go.mod h1:Yh+to48EsGEfYuaHDzXPcE3xhTkx73EhmCGUpEOglKo= +github.com/stretchr/objx v0.5.2 h1:xuMeJ0Sdp5ZMRXx/aWO6RZxdr3beISkG5/G/aIRr3pY= +github.com/stretchr/objx v0.5.2/go.mod h1:FRsXN1f5AsAjCGJKqEizvkpNtU+EGNCLh3NxZ/8L+MA= +github.com/stretchr/testify v1.7.0/go.mod h1:6Fq8oRcR53rry900zMqJjRRixrwX3KX962/h/Wwjteg= +github.com/stretchr/testify v1.7.1/go.mod h1:6Fq8oRcR53rry900zMqJjRRixrwX3KX962/h/Wwjteg= +github.com/stretchr/testify v1.8.0/go.mod h1:yNjHg4UonilssWZ8iaSj1OCr/vHnekPRkoO+kdMU+MU= +github.com/stretchr/testify 
v1.8.1/go.mod h1:w2LPCIKwWwSfY2zedu0+kehJoqGctiVI29o6fzry7u4= +github.com/stretchr/testify v1.9.0 h1:HtqpIVDClZ4nwg75+f6Lvsy/wHu+3BoSGCbBAcpTsTg= +github.com/stretchr/testify v1.9.0/go.mod h1:r2ic/lqez/lEtzL7wO/rwa5dbSLXVDPFyf8C91i36aY= +github.com/xlab/treeprint v1.2.0 h1:HzHnuAF1plUN2zGlAFHbSQP2qJ0ZAD3XF5XD7OesXRQ= +github.com/xlab/treeprint v1.2.0/go.mod h1:gj5Gd3gPdKtR1ikdDK6fnFLdmIS0X30kTTuNd/WEJu0= +go.uber.org/goleak v1.3.0 h1:2K3zAYmnTNqV73imy9J1T3WC+gmCePx2hEGkimedGto= +go.uber.org/goleak v1.3.0/go.mod h1:CoHD4mav9JJNrW/WLlf7HGZPjdw8EucARQHekz1X6bE= +golang.org/x/sys v0.26.0 h1:KHjCJyddX0LoSTb3J+vWpupP9p0oznkqVk/IfjymZbo= +golang.org/x/sys v0.26.0/go.mod h1:/VUhepiaJMQUp4+oa/7Zr1D23ma6VTLIYjOOTFZPUcA= +google.golang.org/protobuf v1.36.0 h1:mjIs9gYtt56AzC4ZaffQuh88TZurBGhIJMBZGSxNerQ= +google.golang.org/protobuf v1.36.0/go.mod h1:9fA7Ob0pmnwhb644+1+CVWFRbNajQ6iRojtC/QF5bRE= +gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0= +gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c h1:Hei/4ADfdWqJk1ZMxUNpqntNwaWcugrBjAiHlqqRiVk= +gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c/go.mod h1:JHkPIbrfpd72SG/EVd6muEfDQjcINNoR0C8j2r3qZ4Q= +gopkg.in/evanphx/json-patch.v4 v4.12.0 h1:n6jtcsulIzXPJaxegRbvFNNrZDjbij7ny3gmSPG+6V4= +gopkg.in/evanphx/json-patch.v4 v4.12.0/go.mod h1:p8EYWUEYMpynmqDbY58zCKCFZw8pRWMG4EsWvDvM72M= +gopkg.in/yaml.v3 v3.0.0-20200313102051-9f266ea9e77c/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM= +gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA= +gopkg.in/yaml.v3 v3.0.1/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM= +k8s.io/klog/v2 v2.130.1 h1:n9Xl7H1Xvksem4KFG4PYbdQCQxqc/tTUyrgXaOhHSzk= +k8s.io/klog/v2 v2.130.1/go.mod h1:3Jpz1GvMt720eyJH1ckRHK1EDfpxISzJ7I9OYgaDtPE= +k8s.io/kube-openapi v0.0.0-20241212222426-2c72e554b1e7 h1:hcha5B1kVACrLujCKLbr8XWMxCxzQx42DY8QKYJrDLg= +k8s.io/kube-openapi 
v0.0.0-20241212222426-2c72e554b1e7/go.mod h1:GewRfANuJ70iYzvn+i4lezLDAFzvjxZYK1gn1lWcfas= +sigs.k8s.io/json v0.0.0-20241010143419-9aa6b5e7a4b3 h1:/Rv+M11QRah1itp8VhT6HoVx1Ray9eB4DBr+K+/sCJ8= +sigs.k8s.io/json v0.0.0-20241010143419-9aa6b5e7a4b3/go.mod h1:18nIHnGi6636UCz6m8i4DhaJ65T6EruyzmoQqI2BVDo= +sigs.k8s.io/kustomize/api v0.18.0 h1:hTzp67k+3NEVInwz5BHyzc9rGxIauoXferXyjv5lWPo= +sigs.k8s.io/kustomize/api v0.18.0/go.mod h1:f8isXnX+8b+SGLHQ6yO4JG1rdkZlvhaCf/uZbLVMb0U= +sigs.k8s.io/kustomize/kyaml v0.18.1 h1:WvBo56Wzw3fjS+7vBjN6TeivvpbW9GmRaWZ9CIVmt4E= +sigs.k8s.io/kustomize/kyaml v0.18.1/go.mod h1:C3L2BFVU1jgcddNBE1TxuVLgS46TjObMwW5FT9FcjYo= +sigs.k8s.io/yaml v1.4.0 h1:Mk1wCc2gy/F0THH0TAp1QYyJNzRm2KCLy3o5ASXVI5E= +sigs.k8s.io/yaml v1.4.0/go.mod h1:Ejl7/uTz7PSA4eKMyQCUTnhZYNmLIl+5c2lQPGR2BPY= diff --git a/tools/benchmark/manifestgenerator/main.go b/tools/benchmark/manifestgenerator/main.go new file mode 100644 index 000000000..d84e67b6a --- /dev/null +++ b/tools/benchmark/manifestgenerator/main.go @@ -0,0 +1,158 @@ +package main + +import ( + "bytes" + "flag" + "fmt" + "os/exec" + "path/filepath" + "strings" + + "benchmark-catalog/manifestgenerator/utils" + benchmarkpb "benchmark-catalog/proto" + + klog "k8s.io/klog/v2" +) + +var ( + catalogDir = flag.String("catalogDir", "../catalog/", "catalog path containing all kustomize components") + benchmarkFilePath = flag.String("benchmarkFilePath", "benchmarks.pbtxt", "prototxt file of a SINGLE benchmark to run, ignored when benchmarks is provided") + benchmarks = flag.String("benchmarks", "", "prototxt file of the benchmarks to run under the catalogDir") + outputRootDir = flag.String("outputRootDir", "../output", "root directory to store output files") + manifestTypes = flag.String("manifestTypes", "ModelServer,LoadBalancer", "comma separated list of manifest types of {ModelServer, LoadBalancer, BenchmarkTool}. 
NOTE: Do not generate BenchmarkTool manifest until the LoadBalancer is deployed.") + runID = flag.String("runID", "default", "ID of the run, which can be shared across multiple benchmarks") + override = flag.Bool("override", false, "whether to override existing benchmark and manifest files") +) + +func main() { + klog.InitFlags(nil) + flag.Parse() + // Print all flag values + flags := "Flags: " + flag.VisitAll(func(f *flag.Flag) { + flags += fmt.Sprintf("%s=%v; ", f.Name, f.Value) + }) + klog.V(1).Info(flags) + + var bs []*benchmarkpb.Benchmark + var err error + // Apply defaulting and validation on input benchmark file, and save the output benchmark file. + if *benchmarks != "" { + benchmarksFile := filepath.Join(*catalogDir, *benchmarks) + klog.Infof("Reading source benchmarks file: %v", benchmarksFile) + bs, err = utils.ReadBenchmarks(benchmarksFile) + if err != nil { + klog.Fatalf("Failed to read benchmarks: %v", err) + } + klog.Infof("Read %v benchmarks", len(bs)) + } else { // Read single benchmark file instead + klog.Infof("Reading single benchmark: %v", *benchmarkFilePath) + b, err := utils.ReadBenchmark(*benchmarkFilePath) + if err != nil { + klog.Fatalf("Failed to read benchmark: %v", err) + } + bs = append(bs, b) + } + + for _, b := range bs { + processBenchmark(b) + } +} + +func processBenchmark(b *benchmarkpb.Benchmark) { + gen := &HelmGenerator{} + + klog.V(2).Infof("Processing benchmark %v: %+v", b.GetName(), b) + benchmarkNameWithRunID := b.GetName() + "-" + *runID + namespace := benchmarkNameWithRunID + outputDir := filepath.Join(*outputRootDir, *runID, benchmarkNameWithRunID) + outputBenchmarkFile := filepath.Join(outputDir, "benchmark") + // Configure namespaces for things (mostly object references) that cannot be directly overridden by Kustomize.
+ klog.V(2).Infof("Setting namespace to %v", namespace) + b.GetConfig().Namespace = namespace + if strings.Contains(*manifestTypes, "BenchmarkTool") { + // Configure the IP for benchmark, based on LoadBalancer configuration + if b.GetConfig().GetLoadBalancer().GetK8SService() != nil { + ip := fmt.Sprintf("model-server-service.%v.svc.cluster.local", namespace) + klog.V(2).Infof("Setting IP to %v", ip) + b.GetConfig().GetBenchmarkTool().GetLpg().Ip = ip + } + if b.GetConfig().GetLoadBalancer().GetGateway() != nil { + command := fmt.Sprintf("kubectl get service -n envoy-gateway-system -l gateway.envoyproxy.io/owning-gateway-name=inference-gateway,gateway.envoyproxy.io/owning-gateway-namespace=%s | grep envoy | awk '{print $1}'", namespace) + klog.V(2).Infof("Running command: %v", command) + gwSvc, err := runBashCommand(command) + if err != nil { + klog.Fatalf("Failed to run command to get the Gateway service name: %v", err) + } + klog.V(2).Infof("Gateway service name: %v", gwSvc) + ip := fmt.Sprintf("%v.envoy-gateway-system.svc.cluster.local", gwSvc) + klog.V(2).Infof("Setting IP to %v", ip) + b.GetConfig().GetBenchmarkTool().GetLpg().Ip = ip + } + } + utils.SaveBenchmark(b, outputBenchmarkFile, true) + + // Generate manifests. + bc := b.GetConfig() + klog.V(2).Infof("Benchmark config: %+v", bc) + if strings.Contains(*manifestTypes, "ModelServer") { + gen.GenerateOneManifestType(bc.GetModelServer(), namespace, *catalogDir, "ModelServer", outputDir, *override) + } + if strings.Contains(*manifestTypes, "LoadBalancer") { + gen.GenerateOneManifestType(bc.GetLoadBalancer(), namespace, *catalogDir, "LoadBalancer", outputDir, *override) + } + if strings.Contains(*manifestTypes, "BenchmarkTool") { + gen.GenerateOneManifestType(bc.GetBenchmarkTool(), namespace, *catalogDir, "BenchmarkTool", outputDir, *override) + } +} + +type Generator interface { + // GenerateOneManifestType generates the manifest yaml for a particular manifest type in the + // benchmark config, such as the ModelServer.
The output will be saved to /outputDir/manifests/manifestType.yaml. + // the catalogManifestFolder is the workspace dir for kustomize. + GenerateOneManifestType(msg any, namespace, catalogManifestFolder, manifestType, outputDir string, override bool) +} + +type HelmGenerator struct{} + +func (h *HelmGenerator) GenerateOneManifestType(msg any, namespace, catalogManifestFolder, manifestType, outputDir string, override bool) { + chartPath := filepath.Join(catalogManifestFolder, "charts", manifestType) + _, err := runBashCommand(fmt.Sprintf("helm dependency update %s", chartPath)) + if err != nil { + klog.Fatalf("Failed to update helm dependencies: %v", err) + } + valuesFile := filepath.Join(outputDir, "benchmark.yaml") + // Example: helm template BenchmarkTool {catalogManifestFolder}/charts/BenchmarkTool -n default -f BenchmarkTool/values.yaml + helmCommand := fmt.Sprintf("helm template %s %s -n %s -f %s", strings.ToLower(manifestType), chartPath, namespace, valuesFile) + + // Run the helm command + output, err := runBashCommand(helmCommand) + if err != nil { + klog.Fatalf("Failed to run helm command: %v", err) + } + + // Save the output to a file + outputFile := filepath.Join(outputDir, "manifests", manifestType+".yaml") + if err := utils.SaveFile(outputFile, []byte(output), override); err != nil { + klog.Fatalf("Failed to save helm output: %v", err) + } +} + +func runBashCommand(command string) (string, error) { + klog.V(1).Infof("Running command: %s", command) + // Create a new command + cmd := exec.Command("bash", "-c", command) + + // Create a buffer to capture the output + var out bytes.Buffer + cmd.Stdout = &out + + // Run the command + err := cmd.Run() + if err != nil { + return "", err + } + + // Trim whitespace from the output and return it + return strings.TrimSpace(out.String()), nil +} diff --git a/tools/benchmark/manifestgenerator/utils/benchmark_config.go b/tools/benchmark/manifestgenerator/utils/benchmark_config.go new file mode 100644 index 
000000000..779ed42dd --- /dev/null +++ b/tools/benchmark/manifestgenerator/utils/benchmark_config.go @@ -0,0 +1,263 @@ +package utils + +import ( + "fmt" + "os" + "strconv" + "strings" + + "google.golang.org/protobuf/encoding/prototext" + "google.golang.org/protobuf/proto" + klog "k8s.io/klog/v2" + + benchmarkpb "benchmark-catalog/proto" +) + +var ( + // The benchmark request rates a single accelerator can handle for the llama-2-7b model. + acceleratorQPSLlama2_7b = map[string][]float32{ + // Latency starts to grow at 5, throughput peaks at 6 + "nvidia-l4": {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, + // Latency starts to grow around 32, throughput peaks at 34 + "nvidia-tesla-a100": {2, 4, 6, 8, 10, 12, 16, 24, 28, 30, 32, 34, 36, 40}, + "nvidia-h100-80gb": {10, 20, 30, 40, 60, 70, 80, 90, 100}, + } + acceleratorQPSLlama3_8b = map[string][]float32{ + // Latency starts to grow at 5, throughput peaks at 6 + "nvidia-l4": {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, + // Latency starts to grow around 32, throughput peaks at 34 + "nvidia-tesla-a100": {2, 4, 6, 8, 10, 12, 16, 24, 28, 30, 32, 34, 36, 40}, + // Latency starts to grow at 100, throughput peaks at 110 + "nvidia-h100-80gb": {10, 20, 30, 40, 60, 80, 90, 100, 120, 140, 180}, + } + acceleratorQPSGemma2_27b = map[string][]float32{ + // Latency starts to grow at 8, throughput peaks at 16 + "nvidia-h100-80gb": {2, 4, 6, 10, 12, 16, 20, 24, 28, 32}, + } +) + +func ReadBenchmarks(file string) ([]*benchmarkpb.Benchmark, error) { + klog.V(1).Infof("Reading benchmark from file %s", file) + res := []*benchmarkpb.Benchmark{} + // Read the config file + data, err := os.ReadFile(file) + if err != nil { + return nil, err + } + + bs := &benchmarkpb.Benchmarks{} + if err := prototext.Unmarshal(data, bs); err != nil { + return nil, fmt.Errorf("failed to unmarshal %v: %v", file, err) + } + klog.V(1).Infof("Read %v raw benchmarks", len(bs.GetBenchmarks())) + klog.V(2).Infof("Raw benchmarks: %+v", bs.GetBenchmarks()) + // Build a map of benchmark names to benchmarks + raw :=
make(map[string]*benchmarkpb.Benchmark, len(bs.Benchmarks)) + for _, benchmark := range bs.Benchmarks { + if _, ok := raw[benchmark.Name]; ok { + return nil, fmt.Errorf("duplicate benchmark: %v", benchmark.Name) + } + raw[benchmark.Name] = benchmark + } + processed := make(map[string]*benchmarkpb.Benchmark, len(bs.Benchmarks)) + + for _, benchmark := range bs.Benchmarks { + klog.V(2).Infof("Before processing benchmark %v: %v: %+v", benchmark.Name, len(processed), processed) + updated, err := processBenchmark(benchmark, raw, processed) + klog.V(2).Infof("After processing benchmark %v: %v: %+v", benchmark.Name, len(processed), processed) + if err != nil { + return nil, err + } + res = append(res, updated) + } + + return res, nil +} + +func ReadBenchmark(file string) (*benchmarkpb.Benchmark, error) { + // Read the config file + data, err := os.ReadFile(file) + if err != nil { + return nil, err + } + + // Parse the config and override the defaults + b := &benchmarkpb.Benchmark{} + if err := prototext.Unmarshal(data, b); err != nil { + return nil, fmt.Errorf("failed to unmarshal %v: %v", file, err) + } + + if err := applyDefaults(b); err != nil { + return nil, fmt.Errorf("failed to apply defaults: %v", err) + } + + if err := validateBenchmark(b); err != nil { + return nil, fmt.Errorf("failed to validate %v: %v", file, err) + } + + return b, nil +} + +func processBenchmark(b *benchmarkpb.Benchmark, raw, processed map[string]*benchmarkpb.Benchmark) (*benchmarkpb.Benchmark, error) { + klog.V(2).Infof("Processing benchmark %v: %+v", b.GetName(), b) + updated := proto.Clone(b).(*benchmarkpb.Benchmark) + if b.GetBaseBenchmarkName() != "" { + klog.V(2).Infof("[Benchmark=%v, base=%v]", b.GetName(), b.GetBaseBenchmarkName()) + rawBase, ok := raw[b.GetBaseBenchmarkName()] + if !ok { + return nil, fmt.Errorf("couldn't find base benchmark %v", b.GetBaseBenchmarkName()) + } + if _, ok := processed[b.GetBaseBenchmarkName()]; !ok { + klog.V(2).Infof("[Benchmark=%v, base=%v], base hasn't been
processed", b.GetName(), b.GetBaseBenchmarkName()) + processedBase, err := processBenchmark(rawBase, raw, processed) + if err != nil { + return nil, err + } + klog.V(2).Infof("Updating processed benchmark map: %v", processedBase.GetName()) + processed[processedBase.GetName()] = processedBase + } else { + klog.V(2).Infof("[Benchmark=%v, base=%v] Base has already been processed", b.GetName(), b.GetBaseBenchmarkName()) + } + processedBase := processed[b.GetBaseBenchmarkName()] + updated = applyBaseBenchmark(b, processedBase) + } + + if err := applyDefaults(updated); err != nil { + return nil, fmt.Errorf("failed to apply defaults: %v", err) + } + + if err := validateBenchmark(updated); err != nil { + return nil, fmt.Errorf("failed to validate %v: %v", updated.GetName(), err) + } + klog.V(2).Infof("Updated benchmark %v: %+v", b.GetName(), b) + processed[updated.GetName()] = updated + return updated, nil +} + +func SaveBenchmark(b *benchmarkpb.Benchmark, file string, override bool) error { + out, err := prototext.Marshal(b) + if err != nil { + return fmt.Errorf("error marshalling to pbtxt: %v", err) + } + klog.V(1).Infof("Saving proto file %q", file) + + if err := SaveFile(file+".pbtxt", out, override); err != nil { + return fmt.Errorf("error saving file %q: %v", file, err) + } + + yaml, err := protobufToYAML(&benchmarkpb.Helm{Global: b}) + if err != nil { + return fmt.Errorf("error converting proto to yaml: %v", err) + } + return SaveFile(file+".yaml", yaml, override) +} + +func applyBaseBenchmark(benchmark, base *benchmarkpb.Benchmark) *benchmarkpb.Benchmark { + klog.V(2).Infof("Applying base benchmark %v to %v", base.GetName(), benchmark.GetName()) + updated := proto.Clone(base).(*benchmarkpb.Benchmark) + // Hack: Do not inherit request rates from base. Usually the request rates need to be updated.
+ updated.GetConfig().GetBenchmarkTool().GetLpg().RequestRates = "" + proto.Merge(updated, benchmark) + return updated +} + +func validateBenchmark(b *benchmarkpb.Benchmark) error { + return nil +} + +func applyDefaults(b *benchmarkpb.Benchmark) error { + b.GetConfig().GetLoadBalancer().GatewayEnabled = b.GetConfig().GetLoadBalancer().GetGateway() != nil + b.GetConfig().GetLoadBalancer().K8SServiceEnabled = b.GetConfig().GetLoadBalancer().GetK8SService() != nil + b.GetConfig().GetLoadBalancer().GatewayEnvoyEnabled = b.GetConfig().GetLoadBalancer().GetGateway().GetEnvoy() != nil + b.GetConfig().GetLoadBalancer().GatewayGkeGatewayEnabled = b.GetConfig().GetLoadBalancer().GetGateway().GetGkeGateway() != nil + b.GetConfig().GetLoadBalancer().GatewayEnvoyEppEnabled = b.GetConfig().GetLoadBalancer().GetGateway().GetEnvoy().GetEpp() != nil + b.GetConfig().GetLoadBalancer().GatewayEnvoyLbPolicyEnabled = b.GetConfig().GetLoadBalancer().GetGateway().GetEnvoy().GetLbPolicy() != "" + + applyVLLMDefaults(b.GetConfig().GetModelServer().GetVllm()) + if err := applyBenchmarkToolDefaults(b); err != nil { + return err + } + return nil +} + +func applyBenchmarkToolDefaults(b *benchmarkpb.Benchmark) error { + lpg := b.GetConfig().GetBenchmarkTool().GetLpg() + if lpg == nil { + return nil + } + + if err := applyRequestRatesDefaults(b); err != nil { + return err + } + + return nil +} + +func applyRequestRatesDefaults(b *benchmarkpb.Benchmark) error { + lpg := b.GetConfig().GetBenchmarkTool().GetLpg() + if lpg.GetRequestRates() != "" { + klog.V(2).Infof("Request rates specified, skipping defaults : %v", lpg.GetRequestRates()) + return nil + } + + klog.V(2).Infof("Applying default request rates to %v", b.GetName()) + // Apply default request rates + accelerator := b.GetConfig().GetModelServer().GetAccelerator() + qps, err := defaultRates(accelerator, b.GetConfig().GetModelServer().GetVllm().GetModel()) + if err != nil { + return err + } + tp, err := 
strconv.Atoi(b.GetConfig().GetModelServer().GetVllm().GetTensorParallelism()) + if err != nil { + return fmt.Errorf("failed to convert TensorParallelism to int: %v", err) + } + numAccelerators := b.GetConfig().GetModelServer().GetReplicas() * int32(tp) + numModels := len(strings.Split(lpg.GetModels(), ",")) + klog.V(2).Infof("[Benchmark=%v] num models=%v, num accelerators=%v", b.GetName(), numModels, numAccelerators) + + rates := make([]string, 0, len(qps)) + for _, baseRate := range qps { + newRate := baseRate * float32(numAccelerators) / float32(numModels) + rates = append(rates, fmt.Sprintf("%.1f", newRate)) + } + lpg.RequestRates = strings.Join(rates, ",") + klog.V(2).Infof("[Benchmark=%v] Set request rates to %v", b.GetName(), lpg.RequestRates) + return nil +} + +func defaultRates(accelerator, model string) ([]float32, error) { + switch model { + case "meta-llama/Llama-2-7b-hf": + qps, ok := acceleratorQPSLlama2_7b[accelerator] + if !ok { + return nil, fmt.Errorf("unknown accelerator type: %v", accelerator) + } + return qps, nil + case "meta-llama/Llama-3.1-8B-Instruct": + qps, ok := acceleratorQPSLlama3_8b[accelerator] + if !ok { + return nil, fmt.Errorf("unknown accelerator type: %v", accelerator) + } + return qps, nil + case "google/gemma-2-27b": + qps, ok := acceleratorQPSGemma2_27b[accelerator] + if !ok { + return nil, fmt.Errorf("unknown accelerator type: %v", accelerator) + } + return qps, nil + default: + return nil, fmt.Errorf("unsupported model: %v", model) + } +} + +func applyVLLMDefaults(v *benchmarkpb.VLLM) { + if v == nil { + return + } + if v.GetTensorParallelism() == "" { + v.TensorParallelism = "1" + } + if v.GetV1() == "" { + v.V1 = "0" + } +} diff --git a/tools/benchmark/manifestgenerator/utils/file.go b/tools/benchmark/manifestgenerator/utils/file.go new file mode 100644 index 000000000..37898c402 --- /dev/null +++ b/tools/benchmark/manifestgenerator/utils/file.go @@ -0,0 +1,32 @@ +package utils + +import ( + "fmt" + "os" + "path/filepath" + +
klog "k8s.io/klog/v2" +) + +func SaveFile(path string, data []byte, override bool) error { + // Check if the file already exists + if _, err := os.Stat(path); err == nil { + if !override { + klog.V(1).Infof("File %v already exists, skipping saving", path) + return nil // File already exists, skip saving + } + } else if !os.IsNotExist(err) { + return fmt.Errorf("error checking if file exists: %v", err) + } + + folder := filepath.Dir(path) + // Create the directory if it doesn't exist + if err := os.MkdirAll(folder, os.ModePerm); err != nil { + return fmt.Errorf("error creating directory: %v", err) + } + klog.V(2).Infof("Writing output to %v", path) + if err := os.WriteFile(path, data, 0644); err != nil { + return fmt.Errorf("failed to write path %v: %v", path, err) + } + return nil +} diff --git a/tools/benchmark/manifestgenerator/utils/proto_to_yaml.go b/tools/benchmark/manifestgenerator/utils/proto_to_yaml.go new file mode 100644 index 000000000..7d6f2cff6 --- /dev/null +++ b/tools/benchmark/manifestgenerator/utils/proto_to_yaml.go @@ -0,0 +1,32 @@ +package utils + +import ( + "fmt" + + "google.golang.org/protobuf/encoding/protojson" + "google.golang.org/protobuf/proto" + "gopkg.in/yaml.v3" +) + +// protobufToYAML converts a proto.Message to YAML bytes. +func protobufToYAML(message proto.Message) ([]byte, error) { + // 1. Marshal the Protobuf message to JSON bytes using protojson. + jsonBytes, err := protojson.MarshalOptions{ + Multiline: true, // Output multi-line JSON (easier to read before YAML) + Indent: " ", // Indent JSON for readability + UseProtoNames: true, // Use the names from the .proto file (snake_case) + EmitUnpopulated: true, // include fields with default values + }.Marshal(message) + if err != nil { + return nil, fmt.Errorf("failed to marshal protobuf to JSON: %w", err) + } + + // 2. Unmarshal the JSON bytes into a generic YAML structure (map[string]interface{}).
+ var yamlData interface{} + if err := yaml.Unmarshal(jsonBytes, &yamlData); err != nil { + return nil, fmt.Errorf("failed to unmarshal JSON to YAML data structure: %w", err) + } + + // 3. Marshal the YAML data structure to YAML bytes. + return yaml.Marshal(yamlData) +} diff --git a/tools/benchmark/proto/benchmark.pb b/tools/benchmark/proto/benchmark.pb new file mode 100644 index 000000000..bddf801a5 Binary files /dev/null and b/tools/benchmark/proto/benchmark.pb differ diff --git a/tools/benchmark/proto/benchmark.pb.go b/tools/benchmark/proto/benchmark.pb.go new file mode 100644 index 000000000..c5d8e9e1d --- /dev/null +++ b/tools/benchmark/proto/benchmark.pb.go @@ -0,0 +1,1974 @@ +// Generate the proto: +// protoc --go_out=. --go_opt=paths=source_relative benchmark.proto + +// Code generated by protoc-gen-go. DO NOT EDIT. +// versions: +// protoc-gen-go v1.30.0 +// protoc v5.29.1 +// source: benchmark.proto + +package proto + +import ( + protoreflect "google.golang.org/protobuf/reflect/protoreflect" + protoimpl "google.golang.org/protobuf/runtime/protoimpl" + timestamppb "google.golang.org/protobuf/types/known/timestamppb" + reflect "reflect" + sync "sync" +) + +const ( + // Verify that this generated code is sufficiently up-to-date. + _ = protoimpl.EnforceVersion(20 - protoimpl.MinVersion) + // Verify that runtime/protoimpl is sufficiently up-to-date. + _ = protoimpl.EnforceVersion(protoimpl.MaxVersion - 20) +) + +// A wrapper to hold the global benchmark configuration. 
This is used to generate the Helm chart values.yaml +type Helm struct { + state protoimpl.MessageState + sizeCache protoimpl.SizeCache + unknownFields protoimpl.UnknownFields + + Global *Benchmark `protobuf:"bytes,1,opt,name=global,proto3" json:"global,omitempty"` +} + +func (x *Helm) Reset() { + *x = Helm{} + if protoimpl.UnsafeEnabled { + mi := &file_benchmark_proto_msgTypes[0] + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + ms.StoreMessageInfo(mi) + } +} + +func (x *Helm) String() string { + return protoimpl.X.MessageStringOf(x) +} + +func (*Helm) ProtoMessage() {} + +func (x *Helm) ProtoReflect() protoreflect.Message { + mi := &file_benchmark_proto_msgTypes[0] + if protoimpl.UnsafeEnabled && x != nil { + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + if ms.LoadMessageInfo() == nil { + ms.StoreMessageInfo(mi) + } + return ms + } + return mi.MessageOf(x) +} + +// Deprecated: Use Helm.ProtoReflect.Descriptor instead. +func (*Helm) Descriptor() ([]byte, []int) { + return file_benchmark_proto_rawDescGZIP(), []int{0} +} + +func (x *Helm) GetGlobal() *Benchmark { + if x != nil { + return x.Global + } + return nil +} + +type Benchmarks struct { + state protoimpl.MessageState + sizeCache protoimpl.SizeCache + unknownFields protoimpl.UnknownFields + + Benchmarks []*Benchmark `protobuf:"bytes,1,rep,name=benchmarks,proto3" json:"benchmarks,omitempty"` +} + +func (x *Benchmarks) Reset() { + *x = Benchmarks{} + if protoimpl.UnsafeEnabled { + mi := &file_benchmark_proto_msgTypes[1] + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + ms.StoreMessageInfo(mi) + } +} + +func (x *Benchmarks) String() string { + return protoimpl.X.MessageStringOf(x) +} + +func (*Benchmarks) ProtoMessage() {} + +func (x *Benchmarks) ProtoReflect() protoreflect.Message { + mi := &file_benchmark_proto_msgTypes[1] + if protoimpl.UnsafeEnabled && x != nil { + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + if ms.LoadMessageInfo() == nil { + ms.StoreMessageInfo(mi) + 
} + return ms + } + return mi.MessageOf(x) +} + +// Deprecated: Use Benchmarks.ProtoReflect.Descriptor instead. +func (*Benchmarks) Descriptor() ([]byte, []int) { + return file_benchmark_proto_rawDescGZIP(), []int{1} +} + +func (x *Benchmarks) GetBenchmarks() []*Benchmark { + if x != nil { + return x.Benchmarks + } + return nil +} + +// Benchmark captures the information of a benchmark run, and will be persisted to the DB for data +// analysis. +type Benchmark struct { + state protoimpl.MessageState + sizeCache protoimpl.SizeCache + unknownFields protoimpl.UnknownFields + + // Optional. + BenchmarkCase *BenchmarkCase `protobuf:"bytes,1,opt,name=benchmark_case,json=benchmarkCase,proto3" json:"benchmark_case,omitempty"` + // Required. User facing configuration to configure the benchmark manifests. + Config *BenchmarkConfig `protobuf:"bytes,2,opt,name=config,proto3" json:"config,omitempty"` + // Optional. Result is automatically collected by the benchmark automation framework. + Result *BenchmarkResult `protobuf:"bytes,3,opt,name=result,proto3" json:"result,omitempty"` + // Optional. Autogenerated by the tool. + StartTime *timestamppb.Timestamp `protobuf:"bytes,4,opt,name=start_time,json=startTime,proto3" json:"start_time,omitempty"` + // Optional. Name is used for matching the base_benchmark_name in the same Benchmarks config file. + Name string `protobuf:"bytes,5,opt,name=name,proto3" json:"name,omitempty"` + // Optional. The name of the parent benchmark configuration to base on. 
+ BaseBenchmarkName string `protobuf:"bytes,6,opt,name=base_benchmark_name,json=baseBenchmarkName,proto3" json:"base_benchmark_name,omitempty"` +} + +func (x *Benchmark) Reset() { + *x = Benchmark{} + if protoimpl.UnsafeEnabled { + mi := &file_benchmark_proto_msgTypes[2] + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + ms.StoreMessageInfo(mi) + } +} + +func (x *Benchmark) String() string { + return protoimpl.X.MessageStringOf(x) +} + +func (*Benchmark) ProtoMessage() {} + +func (x *Benchmark) ProtoReflect() protoreflect.Message { + mi := &file_benchmark_proto_msgTypes[2] + if protoimpl.UnsafeEnabled && x != nil { + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + if ms.LoadMessageInfo() == nil { + ms.StoreMessageInfo(mi) + } + return ms + } + return mi.MessageOf(x) +} + +// Deprecated: Use Benchmark.ProtoReflect.Descriptor instead. +func (*Benchmark) Descriptor() ([]byte, []int) { + return file_benchmark_proto_rawDescGZIP(), []int{2} +} + +func (x *Benchmark) GetBenchmarkCase() *BenchmarkCase { + if x != nil { + return x.BenchmarkCase + } + return nil +} + +func (x *Benchmark) GetConfig() *BenchmarkConfig { + if x != nil { + return x.Config + } + return nil +} + +func (x *Benchmark) GetResult() *BenchmarkResult { + if x != nil { + return x.Result + } + return nil +} + +func (x *Benchmark) GetStartTime() *timestamppb.Timestamp { + if x != nil { + return x.StartTime + } + return nil +} + +func (x *Benchmark) GetName() string { + if x != nil { + return x.Name + } + return "" +} + +func (x *Benchmark) GetBaseBenchmarkName() string { + if x != nil { + return x.BaseBenchmarkName + } + return "" +} + +type BenchmarkCase struct { + state protoimpl.MessageState + sizeCache protoimpl.SizeCache + unknownFields protoimpl.UnknownFields + + // Required. + Name string `protobuf:"bytes,1,opt,name=name,proto3" json:"name,omitempty"` + // Optional. 
+ Description string `protobuf:"bytes,2,opt,name=description,proto3" json:"description,omitempty"` +} + +func (x *BenchmarkCase) Reset() { + *x = BenchmarkCase{} + if protoimpl.UnsafeEnabled { + mi := &file_benchmark_proto_msgTypes[3] + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + ms.StoreMessageInfo(mi) + } +} + +func (x *BenchmarkCase) String() string { + return protoimpl.X.MessageStringOf(x) +} + +func (*BenchmarkCase) ProtoMessage() {} + +func (x *BenchmarkCase) ProtoReflect() protoreflect.Message { + mi := &file_benchmark_proto_msgTypes[3] + if protoimpl.UnsafeEnabled && x != nil { + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + if ms.LoadMessageInfo() == nil { + ms.StoreMessageInfo(mi) + } + return ms + } + return mi.MessageOf(x) +} + +// Deprecated: Use BenchmarkCase.ProtoReflect.Descriptor instead. +func (*BenchmarkCase) Descriptor() ([]byte, []int) { + return file_benchmark_proto_rawDescGZIP(), []int{3} +} + +func (x *BenchmarkCase) GetName() string { + if x != nil { + return x.Name + } + return "" +} + +func (x *BenchmarkCase) GetDescription() string { + if x != nil { + return x.Description + } + return "" +} + +// BenchmarkConfig is the main user facing configuration for the benchmark run. It is used to +// generate benchmark manifests. +type BenchmarkConfig struct { + state protoimpl.MessageState + sizeCache protoimpl.SizeCache + unknownFields protoimpl.UnknownFields + + // Required. Configuration about the load balancer. + LoadBalancer *LoadBalancer `protobuf:"bytes,1,opt,name=load_balancer,json=loadBalancer,proto3" json:"load_balancer,omitempty"` + // Required. Configuration about the model server deployment. + ModelServer *ModelServer `protobuf:"bytes,2,opt,name=model_server,json=modelServer,proto3" json:"model_server,omitempty"` + // Required. Configuration about the benchmark tooling. 
+ BenchmarkTool *BenchmarkTool `protobuf:"bytes,3,opt,name=benchmark_tool,json=benchmarkTool,proto3" json:"benchmark_tool,omitempty"` + // Optional. + Namespace string `protobuf:"bytes,4,opt,name=namespace,proto3" json:"namespace,omitempty"` +} + +func (x *BenchmarkConfig) Reset() { + *x = BenchmarkConfig{} + if protoimpl.UnsafeEnabled { + mi := &file_benchmark_proto_msgTypes[4] + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + ms.StoreMessageInfo(mi) + } +} + +func (x *BenchmarkConfig) String() string { + return protoimpl.X.MessageStringOf(x) +} + +func (*BenchmarkConfig) ProtoMessage() {} + +func (x *BenchmarkConfig) ProtoReflect() protoreflect.Message { + mi := &file_benchmark_proto_msgTypes[4] + if protoimpl.UnsafeEnabled && x != nil { + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + if ms.LoadMessageInfo() == nil { + ms.StoreMessageInfo(mi) + } + return ms + } + return mi.MessageOf(x) +} + +// Deprecated: Use BenchmarkConfig.ProtoReflect.Descriptor instead. +func (*BenchmarkConfig) Descriptor() ([]byte, []int) { + return file_benchmark_proto_rawDescGZIP(), []int{4} +} + +func (x *BenchmarkConfig) GetLoadBalancer() *LoadBalancer { + if x != nil { + return x.LoadBalancer + } + return nil +} + +func (x *BenchmarkConfig) GetModelServer() *ModelServer { + if x != nil { + return x.ModelServer + } + return nil +} + +func (x *BenchmarkConfig) GetBenchmarkTool() *BenchmarkTool { + if x != nil { + return x.BenchmarkTool + } + return nil +} + +func (x *BenchmarkConfig) GetNamespace() string { + if x != nil { + return x.Namespace + } + return "" +} + +type ModelServer struct { + state protoimpl.MessageState + sizeCache protoimpl.SizeCache + unknownFields protoimpl.UnknownFields + + // Optional. Default: "vllm/vllm-openai:latest" + Image string `protobuf:"bytes,1,opt,name=image,proto3" json:"image,omitempty"` + // Optional. Type of the accelerator, e.g, nvidia-tesla-a100, nvidia-l4, etc. 
+ Accelerator string `protobuf:"bytes,2,opt,name=accelerator,proto3" json:"accelerator,omitempty"` + // Optional. Default: 3 + Replicas int32 `protobuf:"varint,3,opt,name=replicas,proto3" json:"replicas,omitempty"` + // Types that are assignable to Type: + // + // *ModelServer_Vllm + Type isModelServer_Type `protobuf_oneof:"type"` +} + +func (x *ModelServer) Reset() { + *x = ModelServer{} + if protoimpl.UnsafeEnabled { + mi := &file_benchmark_proto_msgTypes[5] + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + ms.StoreMessageInfo(mi) + } +} + +func (x *ModelServer) String() string { + return protoimpl.X.MessageStringOf(x) +} + +func (*ModelServer) ProtoMessage() {} + +func (x *ModelServer) ProtoReflect() protoreflect.Message { + mi := &file_benchmark_proto_msgTypes[5] + if protoimpl.UnsafeEnabled && x != nil { + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + if ms.LoadMessageInfo() == nil { + ms.StoreMessageInfo(mi) + } + return ms + } + return mi.MessageOf(x) +} + +// Deprecated: Use ModelServer.ProtoReflect.Descriptor instead. 
+func (*ModelServer) Descriptor() ([]byte, []int) { + return file_benchmark_proto_rawDescGZIP(), []int{5} +} + +func (x *ModelServer) GetImage() string { + if x != nil { + return x.Image + } + return "" +} + +func (x *ModelServer) GetAccelerator() string { + if x != nil { + return x.Accelerator + } + return "" +} + +func (x *ModelServer) GetReplicas() int32 { + if x != nil { + return x.Replicas + } + return 0 +} + +func (m *ModelServer) GetType() isModelServer_Type { + if m != nil { + return m.Type + } + return nil +} + +func (x *ModelServer) GetVllm() *VLLM { + if x, ok := x.GetType().(*ModelServer_Vllm); ok { + return x.Vllm + } + return nil +} + +type isModelServer_Type interface { + isModelServer_Type() +} + +type ModelServer_Vllm struct { + Vllm *VLLM `protobuf:"bytes,4,opt,name=vllm,proto3,oneof"` +} + +func (*ModelServer_Vllm) isModelServer_Type() {} + +type VLLM struct { + state protoimpl.MessageState + sizeCache protoimpl.SizeCache + unknownFields protoimpl.UnknownFields + + // Optional. Default: "1" + TensorParallelism string `protobuf:"bytes,1,opt,name=tensor_parallelism,json=tensorParallelism,proto3" json:"tensor_parallelism,omitempty"` + // Optional. Default: "3" + MaxLoras string `protobuf:"bytes,2,opt,name=max_loras,json=maxLoras,proto3" json:"max_loras,omitempty"` + // Optional. Default: "meta-llama/Llama-2-7b-hf" + Model string `protobuf:"bytes,3,opt,name=model,proto3" json:"model,omitempty"` + // Optional. Default: 16 + LoraRank string `protobuf:"bytes,4,opt,name=lora_rank,json=loraRank,proto3" json:"lora_rank,omitempty"` + // Optional. Default: "0". + // If set to "1", the V1 model is used. 
+ V1 string `protobuf:"bytes,5,opt,name=v1,proto3" json:"v1,omitempty"` +} + +func (x *VLLM) Reset() { + *x = VLLM{} + if protoimpl.UnsafeEnabled { + mi := &file_benchmark_proto_msgTypes[6] + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + ms.StoreMessageInfo(mi) + } +} + +func (x *VLLM) String() string { + return protoimpl.X.MessageStringOf(x) +} + +func (*VLLM) ProtoMessage() {} + +func (x *VLLM) ProtoReflect() protoreflect.Message { + mi := &file_benchmark_proto_msgTypes[6] + if protoimpl.UnsafeEnabled && x != nil { + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + if ms.LoadMessageInfo() == nil { + ms.StoreMessageInfo(mi) + } + return ms + } + return mi.MessageOf(x) +} + +// Deprecated: Use VLLM.ProtoReflect.Descriptor instead. +func (*VLLM) Descriptor() ([]byte, []int) { + return file_benchmark_proto_rawDescGZIP(), []int{6} +} + +func (x *VLLM) GetTensorParallelism() string { + if x != nil { + return x.TensorParallelism + } + return "" +} + +func (x *VLLM) GetMaxLoras() string { + if x != nil { + return x.MaxLoras + } + return "" +} + +func (x *VLLM) GetModel() string { + if x != nil { + return x.Model + } + return "" +} + +func (x *VLLM) GetLoraRank() string { + if x != nil { + return x.LoraRank + } + return "" +} + +func (x *VLLM) GetV1() string { + if x != nil { + return x.V1 + } + return "" +} + +type BenchmarkTool struct { + state protoimpl.MessageState + sizeCache protoimpl.SizeCache + unknownFields protoimpl.UnknownFields + + Image string `protobuf:"bytes,1,opt,name=image,proto3" json:"image,omitempty"` + // Types that are assignable to Type: + // + // *BenchmarkTool_Lpg + Type isBenchmarkTool_Type `protobuf_oneof:"Type"` +} + +func (x *BenchmarkTool) Reset() { + *x = BenchmarkTool{} + if protoimpl.UnsafeEnabled { + mi := &file_benchmark_proto_msgTypes[7] + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + ms.StoreMessageInfo(mi) + } +} + +func (x *BenchmarkTool) String() string { + return protoimpl.X.MessageStringOf(x) +} + 
+func (*BenchmarkTool) ProtoMessage() {} + +func (x *BenchmarkTool) ProtoReflect() protoreflect.Message { + mi := &file_benchmark_proto_msgTypes[7] + if protoimpl.UnsafeEnabled && x != nil { + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + if ms.LoadMessageInfo() == nil { + ms.StoreMessageInfo(mi) + } + return ms + } + return mi.MessageOf(x) +} + +// Deprecated: Use BenchmarkTool.ProtoReflect.Descriptor instead. +func (*BenchmarkTool) Descriptor() ([]byte, []int) { + return file_benchmark_proto_rawDescGZIP(), []int{7} +} + +func (x *BenchmarkTool) GetImage() string { + if x != nil { + return x.Image + } + return "" +} + +func (m *BenchmarkTool) GetType() isBenchmarkTool_Type { + if m != nil { + return m.Type + } + return nil +} + +func (x *BenchmarkTool) GetLpg() *LPG { + if x, ok := x.GetType().(*BenchmarkTool_Lpg); ok { + return x.Lpg + } + return nil +} + +type isBenchmarkTool_Type interface { + isBenchmarkTool_Type() +} + +type BenchmarkTool_Lpg struct { + Lpg *LPG `protobuf:"bytes,2,opt,name=lpg,proto3,oneof"` +} + +func (*BenchmarkTool_Lpg) isBenchmarkTool_Type() {} + +type LPG struct { + state protoimpl.MessageState + sizeCache protoimpl.SizeCache + unknownFields protoimpl.UnknownFields + + // Optional. Which input dataset to use, default to + // ShareGPT_V3_unfiltered_cleaned_split. + // Default: "sharegpt_v3_unfiltered_cleaned_split" + Dataset string `protobuf:"bytes,2,opt,name=dataset,proto3" json:"dataset,omitempty"` + // Optional. Which models to use, default to meta-llama/Llama-2-7b-hf + // Default: "meta-llama/Llama-2-7b-hf" + Models string `protobuf:"bytes,3,opt,name=models,proto3" json:"models,omitempty"` + // Optional. Default: "model-server-service.benchmark-catalog.svc.cluster.local" + Ip string `protobuf:"bytes,4,opt,name=ip,proto3" json:"ip,omitempty"` + // Optional. Default: "8081" + Port string `protobuf:"bytes,5,opt,name=port,proto3" json:"port,omitempty"` + // Required. 
+ RequestRates string `protobuf:"bytes,6,opt,name=request_rates,json=requestRates,proto3" json:"request_rates,omitempty"` + // Optional. Default: "60" + BenchmarkTimeSeconds string `protobuf:"bytes,7,opt,name=benchmark_time_seconds,json=benchmarkTimeSeconds,proto3" json:"benchmark_time_seconds,omitempty"` + // Optional. Default: "1024" + OutputLength string `protobuf:"bytes,8,opt,name=output_length,json=outputLength,proto3" json:"output_length,omitempty"` + Tokenizer string `protobuf:"bytes,9,opt,name=tokenizer,proto3" json:"tokenizer,omitempty"` + WarmupSeconds string `protobuf:"bytes,10,opt,name=warmup_seconds,json=warmupSeconds,proto3" json:"warmup_seconds,omitempty"` +} + +func (x *LPG) Reset() { + *x = LPG{} + if protoimpl.UnsafeEnabled { + mi := &file_benchmark_proto_msgTypes[8] + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + ms.StoreMessageInfo(mi) + } +} + +func (x *LPG) String() string { + return protoimpl.X.MessageStringOf(x) +} + +func (*LPG) ProtoMessage() {} + +func (x *LPG) ProtoReflect() protoreflect.Message { + mi := &file_benchmark_proto_msgTypes[8] + if protoimpl.UnsafeEnabled && x != nil { + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + if ms.LoadMessageInfo() == nil { + ms.StoreMessageInfo(mi) + } + return ms + } + return mi.MessageOf(x) +} + +// Deprecated: Use LPG.ProtoReflect.Descriptor instead. 
+func (*LPG) Descriptor() ([]byte, []int) { + return file_benchmark_proto_rawDescGZIP(), []int{8} +} + +func (x *LPG) GetDataset() string { + if x != nil { + return x.Dataset + } + return "" +} + +func (x *LPG) GetModels() string { + if x != nil { + return x.Models + } + return "" +} + +func (x *LPG) GetIp() string { + if x != nil { + return x.Ip + } + return "" +} + +func (x *LPG) GetPort() string { + if x != nil { + return x.Port + } + return "" +} + +func (x *LPG) GetRequestRates() string { + if x != nil { + return x.RequestRates + } + return "" +} + +func (x *LPG) GetBenchmarkTimeSeconds() string { + if x != nil { + return x.BenchmarkTimeSeconds + } + return "" +} + +func (x *LPG) GetOutputLength() string { + if x != nil { + return x.OutputLength + } + return "" +} + +func (x *LPG) GetTokenizer() string { + if x != nil { + return x.Tokenizer + } + return "" +} + +func (x *LPG) GetWarmupSeconds() string { + if x != nil { + return x.WarmupSeconds + } + return "" +} + +type LoadBalancer struct { + state protoimpl.MessageState + sizeCache protoimpl.SizeCache + unknownFields protoimpl.UnknownFields + + // Types that are assignable to Type: + // + // *LoadBalancer_K8SService + // *LoadBalancer_Gateway + Type isLoadBalancer_Type `protobuf_oneof:"type"` + // Define boolean flags to tell which type is enabled. This is due to a bug in Helm that you must + // explicitly set whether a field is enabled or not. A non-existing field is considered enabled + // due to the bug https://github.com/helm/helm/issues/10296. 
+ K8SServiceEnabled bool `protobuf:"varint,4,opt,name=k8s_service_enabled,json=k8sServiceEnabled,proto3" json:"k8s_service_enabled,omitempty"` + GatewayEnabled bool `protobuf:"varint,5,opt,name=gateway_enabled,json=gatewayEnabled,proto3" json:"gateway_enabled,omitempty"` + GatewayEnvoyEnabled bool `protobuf:"varint,6,opt,name=gateway_envoy_enabled,json=gatewayEnvoyEnabled,proto3" json:"gateway_envoy_enabled,omitempty"` + GatewayGkeGatewayEnabled bool `protobuf:"varint,7,opt,name=gateway_gke_gateway_enabled,json=gatewayGkeGatewayEnabled,proto3" json:"gateway_gke_gateway_enabled,omitempty"` + GatewayEnvoyEppEnabled bool `protobuf:"varint,8,opt,name=gateway_envoy_epp_enabled,json=gatewayEnvoyEppEnabled,proto3" json:"gateway_envoy_epp_enabled,omitempty"` + GatewayEnvoyLbPolicyEnabled bool `protobuf:"varint,9,opt,name=gateway_envoy_lb_policy_enabled,json=gatewayEnvoyLbPolicyEnabled,proto3" json:"gateway_envoy_lb_policy_enabled,omitempty"` +} + +func (x *LoadBalancer) Reset() { + *x = LoadBalancer{} + if protoimpl.UnsafeEnabled { + mi := &file_benchmark_proto_msgTypes[9] + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + ms.StoreMessageInfo(mi) + } +} + +func (x *LoadBalancer) String() string { + return protoimpl.X.MessageStringOf(x) +} + +func (*LoadBalancer) ProtoMessage() {} + +func (x *LoadBalancer) ProtoReflect() protoreflect.Message { + mi := &file_benchmark_proto_msgTypes[9] + if protoimpl.UnsafeEnabled && x != nil { + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + if ms.LoadMessageInfo() == nil { + ms.StoreMessageInfo(mi) + } + return ms + } + return mi.MessageOf(x) +} + +// Deprecated: Use LoadBalancer.ProtoReflect.Descriptor instead. 
+func (*LoadBalancer) Descriptor() ([]byte, []int) { + return file_benchmark_proto_rawDescGZIP(), []int{9} +} + +func (m *LoadBalancer) GetType() isLoadBalancer_Type { + if m != nil { + return m.Type + } + return nil +} + +func (x *LoadBalancer) GetK8SService() *K8SService { + if x, ok := x.GetType().(*LoadBalancer_K8SService); ok { + return x.K8SService + } + return nil +} + +func (x *LoadBalancer) GetGateway() *Gateway { + if x, ok := x.GetType().(*LoadBalancer_Gateway); ok { + return x.Gateway + } + return nil +} + +func (x *LoadBalancer) GetK8SServiceEnabled() bool { + if x != nil { + return x.K8SServiceEnabled + } + return false +} + +func (x *LoadBalancer) GetGatewayEnabled() bool { + if x != nil { + return x.GatewayEnabled + } + return false +} + +func (x *LoadBalancer) GetGatewayEnvoyEnabled() bool { + if x != nil { + return x.GatewayEnvoyEnabled + } + return false +} + +func (x *LoadBalancer) GetGatewayGkeGatewayEnabled() bool { + if x != nil { + return x.GatewayGkeGatewayEnabled + } + return false +} + +func (x *LoadBalancer) GetGatewayEnvoyEppEnabled() bool { + if x != nil { + return x.GatewayEnvoyEppEnabled + } + return false +} + +func (x *LoadBalancer) GetGatewayEnvoyLbPolicyEnabled() bool { + if x != nil { + return x.GatewayEnvoyLbPolicyEnabled + } + return false +} + +type isLoadBalancer_Type interface { + isLoadBalancer_Type() +} + +type LoadBalancer_K8SService struct { + K8SService *K8SService `protobuf:"bytes,1,opt,name=k8s_service,json=k8sService,proto3,oneof"` +} + +type LoadBalancer_Gateway struct { + Gateway *Gateway `protobuf:"bytes,2,opt,name=gateway,proto3,oneof"` +} + +func (*LoadBalancer_K8SService) isLoadBalancer_Type() {} + +func (*LoadBalancer_Gateway) isLoadBalancer_Type() {} + +// By default the gateway name is `model-server-gateway` and port is `8081`. 
+type Gateway struct { + state protoimpl.MessageState + sizeCache protoimpl.SizeCache + unknownFields protoimpl.UnknownFields + + // Types that are assignable to Type: + // + // *Gateway_Envoy + // *Gateway_GkeGateway + Type isGateway_Type `protobuf_oneof:"type"` + FullDuplexStreamingEnabled bool `protobuf:"varint,3,opt,name=full_duplex_streaming_enabled,json=fullDuplexStreamingEnabled,proto3" json:"full_duplex_streaming_enabled,omitempty"` +} + +func (x *Gateway) Reset() { + *x = Gateway{} + if protoimpl.UnsafeEnabled { + mi := &file_benchmark_proto_msgTypes[10] + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + ms.StoreMessageInfo(mi) + } +} + +func (x *Gateway) String() string { + return protoimpl.X.MessageStringOf(x) +} + +func (*Gateway) ProtoMessage() {} + +func (x *Gateway) ProtoReflect() protoreflect.Message { + mi := &file_benchmark_proto_msgTypes[10] + if protoimpl.UnsafeEnabled && x != nil { + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + if ms.LoadMessageInfo() == nil { + ms.StoreMessageInfo(mi) + } + return ms + } + return mi.MessageOf(x) +} + +// Deprecated: Use Gateway.ProtoReflect.Descriptor instead. 
+func (*Gateway) Descriptor() ([]byte, []int) { + return file_benchmark_proto_rawDescGZIP(), []int{10} +} + +func (m *Gateway) GetType() isGateway_Type { + if m != nil { + return m.Type + } + return nil +} + +func (x *Gateway) GetEnvoy() *Envoy { + if x, ok := x.GetType().(*Gateway_Envoy); ok { + return x.Envoy + } + return nil +} + +func (x *Gateway) GetGkeGateway() *GKEGateway { + if x, ok := x.GetType().(*Gateway_GkeGateway); ok { + return x.GkeGateway + } + return nil +} + +func (x *Gateway) GetFullDuplexStreamingEnabled() bool { + if x != nil { + return x.FullDuplexStreamingEnabled + } + return false +} + +type isGateway_Type interface { + isGateway_Type() +} + +type Gateway_Envoy struct { + Envoy *Envoy `protobuf:"bytes,1,opt,name=envoy,proto3,oneof"` +} + +type Gateway_GkeGateway struct { + GkeGateway *GKEGateway `protobuf:"bytes,2,opt,name=gke_gateway,json=gkeGateway,proto3,oneof"` +} + +func (*Gateway_Envoy) isGateway_Type() {} + +func (*Gateway_GkeGateway) isGateway_Type() {} + +type Envoy struct { + state protoimpl.MessageState + sizeCache protoimpl.SizeCache + unknownFields protoimpl.UnknownFields + + // Types that are assignable to Type: + // + // *Envoy_Epp + // *Envoy_LbPolicy + Type isEnvoy_Type `protobuf_oneof:"type"` +} + +func (x *Envoy) Reset() { + *x = Envoy{} + if protoimpl.UnsafeEnabled { + mi := &file_benchmark_proto_msgTypes[11] + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + ms.StoreMessageInfo(mi) + } +} + +func (x *Envoy) String() string { + return protoimpl.X.MessageStringOf(x) +} + +func (*Envoy) ProtoMessage() {} + +func (x *Envoy) ProtoReflect() protoreflect.Message { + mi := &file_benchmark_proto_msgTypes[11] + if protoimpl.UnsafeEnabled && x != nil { + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + if ms.LoadMessageInfo() == nil { + ms.StoreMessageInfo(mi) + } + return ms + } + return mi.MessageOf(x) +} + +// Deprecated: Use Envoy.ProtoReflect.Descriptor instead. 
+func (*Envoy) Descriptor() ([]byte, []int) { + return file_benchmark_proto_rawDescGZIP(), []int{11} +} + +func (m *Envoy) GetType() isEnvoy_Type { + if m != nil { + return m.Type + } + return nil +} + +func (x *Envoy) GetEpp() *EPP { + if x, ok := x.GetType().(*Envoy_Epp); ok { + return x.Epp + } + return nil +} + +func (x *Envoy) GetLbPolicy() string { + if x, ok := x.GetType().(*Envoy_LbPolicy); ok { + return x.LbPolicy + } + return "" +} + +type isEnvoy_Type interface { + isEnvoy_Type() +} + +type Envoy_Epp struct { + Epp *EPP `protobuf:"bytes,1,opt,name=epp,proto3,oneof"` +} + +type Envoy_LbPolicy struct { + // Load balancing policies supported by Envoy: https://gateway.envoyproxy.io/docs/tasks/traffic/load-balancing/ + LbPolicy string `protobuf:"bytes,2,opt,name=lb_policy,json=lbPolicy,proto3,oneof"` +} + +func (*Envoy_Epp) isEnvoy_Type() {} + +func (*Envoy_LbPolicy) isEnvoy_Type() {} + +type GKEGateway struct { + state protoimpl.MessageState + sizeCache protoimpl.SizeCache + unknownFields protoimpl.UnknownFields + + // Required. + Epp *EPP `protobuf:"bytes,1,opt,name=epp,proto3" json:"epp,omitempty"` +} + +func (x *GKEGateway) Reset() { + *x = GKEGateway{} + if protoimpl.UnsafeEnabled { + mi := &file_benchmark_proto_msgTypes[12] + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + ms.StoreMessageInfo(mi) + } +} + +func (x *GKEGateway) String() string { + return protoimpl.X.MessageStringOf(x) +} + +func (*GKEGateway) ProtoMessage() {} + +func (x *GKEGateway) ProtoReflect() protoreflect.Message { + mi := &file_benchmark_proto_msgTypes[12] + if protoimpl.UnsafeEnabled && x != nil { + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + if ms.LoadMessageInfo() == nil { + ms.StoreMessageInfo(mi) + } + return ms + } + return mi.MessageOf(x) +} + +// Deprecated: Use GKEGateway.ProtoReflect.Descriptor instead. 
+func (*GKEGateway) Descriptor() ([]byte, []int) { + return file_benchmark_proto_rawDescGZIP(), []int{12} +} + +func (x *GKEGateway) GetEpp() *EPP { + if x != nil { + return x.Epp + } + return nil +} + +type EPP struct { + state protoimpl.MessageState + sizeCache protoimpl.SizeCache + unknownFields protoimpl.UnknownFields + + // Optional. Default: "us-central1-docker.pkg.dev/k8s-staging-images/llm-instance-gateway/epp:main" + Image string `protobuf:"bytes,1,opt,name=image,proto3" json:"image,omitempty"` + // Optional. Default 50ms + RefreshMetricsInterval string `protobuf:"bytes,2,opt,name=refresh_metrics_interval,json=refreshMetricsInterval,proto3" json:"refresh_metrics_interval,omitempty"` +} + +func (x *EPP) Reset() { + *x = EPP{} + if protoimpl.UnsafeEnabled { + mi := &file_benchmark_proto_msgTypes[13] + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + ms.StoreMessageInfo(mi) + } +} + +func (x *EPP) String() string { + return protoimpl.X.MessageStringOf(x) +} + +func (*EPP) ProtoMessage() {} + +func (x *EPP) ProtoReflect() protoreflect.Message { + mi := &file_benchmark_proto_msgTypes[13] + if protoimpl.UnsafeEnabled && x != nil { + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + if ms.LoadMessageInfo() == nil { + ms.StoreMessageInfo(mi) + } + return ms + } + return mi.MessageOf(x) +} + +// Deprecated: Use EPP.ProtoReflect.Descriptor instead. +func (*EPP) Descriptor() ([]byte, []int) { + return file_benchmark_proto_rawDescGZIP(), []int{13} +} + +func (x *EPP) GetImage() string { + if x != nil { + return x.Image + } + return "" +} + +func (x *EPP) GetRefreshMetricsInterval() string { + if x != nil { + return x.RefreshMetricsInterval + } + return "" +} + +// By default the service is in the same namespace as the model server and lpg. +// The service name is `model-server-service` and port is `8081`. 
+type K8SService struct { + state protoimpl.MessageState + sizeCache protoimpl.SizeCache + unknownFields protoimpl.UnknownFields +} + +func (x *K8SService) Reset() { + *x = K8SService{} + if protoimpl.UnsafeEnabled { + mi := &file_benchmark_proto_msgTypes[14] + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + ms.StoreMessageInfo(mi) + } +} + +func (x *K8SService) String() string { + return protoimpl.X.MessageStringOf(x) +} + +func (*K8SService) ProtoMessage() {} + +func (x *K8SService) ProtoReflect() protoreflect.Message { + mi := &file_benchmark_proto_msgTypes[14] + if protoimpl.UnsafeEnabled && x != nil { + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + if ms.LoadMessageInfo() == nil { + ms.StoreMessageInfo(mi) + } + return ms + } + return mi.MessageOf(x) +} + +// Deprecated: Use K8SService.ProtoReflect.Descriptor instead. +func (*K8SService) Descriptor() ([]byte, []int) { + return file_benchmark_proto_rawDescGZIP(), []int{14} +} + +type BenchmarkResult struct { + state protoimpl.MessageState + sizeCache protoimpl.SizeCache + unknownFields protoimpl.UnknownFields + + // Optional. + Stats []*Stat `protobuf:"bytes,1,rep,name=stats,proto3" json:"stats,omitempty"` +} + +func (x *BenchmarkResult) Reset() { + *x = BenchmarkResult{} + if protoimpl.UnsafeEnabled { + mi := &file_benchmark_proto_msgTypes[15] + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + ms.StoreMessageInfo(mi) + } +} + +func (x *BenchmarkResult) String() string { + return protoimpl.X.MessageStringOf(x) +} + +func (*BenchmarkResult) ProtoMessage() {} + +func (x *BenchmarkResult) ProtoReflect() protoreflect.Message { + mi := &file_benchmark_proto_msgTypes[15] + if protoimpl.UnsafeEnabled && x != nil { + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + if ms.LoadMessageInfo() == nil { + ms.StoreMessageInfo(mi) + } + return ms + } + return mi.MessageOf(x) +} + +// Deprecated: Use BenchmarkResult.ProtoReflect.Descriptor instead. 
+func (*BenchmarkResult) Descriptor() ([]byte, []int) { + return file_benchmark_proto_rawDescGZIP(), []int{15} +} + +func (x *BenchmarkResult) GetStats() []*Stat { + if x != nil { + return x.Stats + } + return nil +} + +type Stat struct { + state protoimpl.MessageState + sizeCache protoimpl.SizeCache + unknownFields protoimpl.UnknownFields + + // Optional. + RequestRate float32 `protobuf:"fixed32,1,opt,name=request_rate,json=requestRate,proto3" json:"request_rate,omitempty"` + // Optional. + RequestLatency *Metric `protobuf:"bytes,2,opt,name=request_latency,json=requestLatency,proto3" json:"request_latency,omitempty"` + // Optional. + Throughput *Metric `protobuf:"bytes,3,opt,name=throughput,proto3" json:"throughput,omitempty"` + // Optional. + InputLength *Metric `protobuf:"bytes,4,opt,name=input_length,json=inputLength,proto3" json:"input_length,omitempty"` + // Optional. + OutputLength *Metric `protobuf:"bytes,5,opt,name=output_length,json=outputLength,proto3" json:"output_length,omitempty"` + // Optional. + Ttft *Metric `protobuf:"bytes,6,opt,name=ttft,proto3" json:"ttft,omitempty"` + // Optional. + Tpot *Metric `protobuf:"bytes,7,opt,name=tpot,proto3" json:"tpot,omitempty"` + // Optional. 
+ ModelServerMetrics []*Metric `protobuf:"bytes,8,rep,name=model_server_metrics,json=modelServerMetrics,proto3" json:"model_server_metrics,omitempty"` +} + +func (x *Stat) Reset() { + *x = Stat{} + if protoimpl.UnsafeEnabled { + mi := &file_benchmark_proto_msgTypes[16] + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + ms.StoreMessageInfo(mi) + } +} + +func (x *Stat) String() string { + return protoimpl.X.MessageStringOf(x) +} + +func (*Stat) ProtoMessage() {} + +func (x *Stat) ProtoReflect() protoreflect.Message { + mi := &file_benchmark_proto_msgTypes[16] + if protoimpl.UnsafeEnabled && x != nil { + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + if ms.LoadMessageInfo() == nil { + ms.StoreMessageInfo(mi) + } + return ms + } + return mi.MessageOf(x) +} + +// Deprecated: Use Stat.ProtoReflect.Descriptor instead. +func (*Stat) Descriptor() ([]byte, []int) { + return file_benchmark_proto_rawDescGZIP(), []int{16} +} + +func (x *Stat) GetRequestRate() float32 { + if x != nil { + return x.RequestRate + } + return 0 +} + +func (x *Stat) GetRequestLatency() *Metric { + if x != nil { + return x.RequestLatency + } + return nil +} + +func (x *Stat) GetThroughput() *Metric { + if x != nil { + return x.Throughput + } + return nil +} + +func (x *Stat) GetInputLength() *Metric { + if x != nil { + return x.InputLength + } + return nil +} + +func (x *Stat) GetOutputLength() *Metric { + if x != nil { + return x.OutputLength + } + return nil +} + +func (x *Stat) GetTtft() *Metric { + if x != nil { + return x.Ttft + } + return nil +} + +func (x *Stat) GetTpot() *Metric { + if x != nil { + return x.Tpot + } + return nil +} + +func (x *Stat) GetModelServerMetrics() []*Metric { + if x != nil { + return x.ModelServerMetrics + } + return nil +} + +type Metric struct { + state protoimpl.MessageState + sizeCache protoimpl.SizeCache + unknownFields protoimpl.UnknownFields + + // Optional. 
+ Name string `protobuf:"bytes,1,opt,name=name,proto3" json:"name,omitempty"` + // Optional. + Mean float32 `protobuf:"fixed32,2,opt,name=mean,proto3" json:"mean,omitempty"` + // Optional. + Median float32 `protobuf:"fixed32,3,opt,name=median,proto3" json:"median,omitempty"` + // Optional. + Sd float32 `protobuf:"fixed32,4,opt,name=sd,proto3" json:"sd,omitempty"` + // Optional. + Min float32 `protobuf:"fixed32,5,opt,name=min,proto3" json:"min,omitempty"` + // Optional. + Max float32 `protobuf:"fixed32,6,opt,name=max,proto3" json:"max,omitempty"` + // Optional. + P90 float32 `protobuf:"fixed32,7,opt,name=p90,proto3" json:"p90,omitempty"` + // Optional. + P99 float32 `protobuf:"fixed32,8,opt,name=p99,proto3" json:"p99,omitempty"` +} + +func (x *Metric) Reset() { + *x = Metric{} + if protoimpl.UnsafeEnabled { + mi := &file_benchmark_proto_msgTypes[17] + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + ms.StoreMessageInfo(mi) + } +} + +func (x *Metric) String() string { + return protoimpl.X.MessageStringOf(x) +} + +func (*Metric) ProtoMessage() {} + +func (x *Metric) ProtoReflect() protoreflect.Message { + mi := &file_benchmark_proto_msgTypes[17] + if protoimpl.UnsafeEnabled && x != nil { + ms := protoimpl.X.MessageStateOf(protoimpl.Pointer(x)) + if ms.LoadMessageInfo() == nil { + ms.StoreMessageInfo(mi) + } + return ms + } + return mi.MessageOf(x) +} + +// Deprecated: Use Metric.ProtoReflect.Descriptor instead. 
+func (*Metric) Descriptor() ([]byte, []int) { + return file_benchmark_proto_rawDescGZIP(), []int{17} +} + +func (x *Metric) GetName() string { + if x != nil { + return x.Name + } + return "" +} + +func (x *Metric) GetMean() float32 { + if x != nil { + return x.Mean + } + return 0 +} + +func (x *Metric) GetMedian() float32 { + if x != nil { + return x.Median + } + return 0 +} + +func (x *Metric) GetSd() float32 { + if x != nil { + return x.Sd + } + return 0 +} + +func (x *Metric) GetMin() float32 { + if x != nil { + return x.Min + } + return 0 +} + +func (x *Metric) GetMax() float32 { + if x != nil { + return x.Max + } + return 0 +} + +func (x *Metric) GetP90() float32 { + if x != nil { + return x.P90 + } + return 0 +} + +func (x *Metric) GetP99() float32 { + if x != nil { + return x.P99 + } + return 0 +} + +var File_benchmark_proto protoreflect.FileDescriptor + +var file_benchmark_proto_rawDesc = []byte{ + 0x0a, 0x0f, 0x62, 0x65, 0x6e, 0x63, 0x68, 0x6d, 0x61, 0x72, 0x6b, 0x2e, 0x70, 0x72, 0x6f, 0x74, + 0x6f, 0x12, 0x0f, 0x62, 0x65, 0x6e, 0x63, 0x68, 0x6d, 0x61, 0x72, 0x6b, 0x2e, 0x70, 0x72, 0x6f, + 0x74, 0x6f, 0x1a, 0x1f, 0x67, 0x6f, 0x6f, 0x67, 0x6c, 0x65, 0x2f, 0x70, 0x72, 0x6f, 0x74, 0x6f, + 0x62, 0x75, 0x66, 0x2f, 0x74, 0x69, 0x6d, 0x65, 0x73, 0x74, 0x61, 0x6d, 0x70, 0x2e, 0x70, 0x72, + 0x6f, 0x74, 0x6f, 0x22, 0x3a, 0x0a, 0x04, 0x48, 0x65, 0x6c, 0x6d, 0x12, 0x32, 0x0a, 0x06, 0x67, + 0x6c, 0x6f, 0x62, 0x61, 0x6c, 0x18, 0x01, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x1a, 0x2e, 0x62, 0x65, + 0x6e, 0x63, 0x68, 0x6d, 0x61, 0x72, 0x6b, 0x2e, 0x70, 0x72, 0x6f, 0x74, 0x6f, 0x2e, 0x42, 0x65, + 0x6e, 0x63, 0x68, 0x6d, 0x61, 0x72, 0x6b, 0x52, 0x06, 0x67, 0x6c, 0x6f, 0x62, 0x61, 0x6c, 0x22, + 0x48, 0x0a, 0x0a, 0x42, 0x65, 0x6e, 0x63, 0x68, 0x6d, 0x61, 0x72, 0x6b, 0x73, 0x12, 0x3a, 0x0a, + 0x0a, 0x62, 0x65, 0x6e, 0x63, 0x68, 0x6d, 0x61, 0x72, 0x6b, 0x73, 0x18, 0x01, 0x20, 0x03, 0x28, + 0x0b, 0x32, 0x1a, 0x2e, 0x62, 0x65, 0x6e, 0x63, 0x68, 0x6d, 0x61, 0x72, 0x6b, 0x2e, 0x70, 0x72, + 
0x6f, 0x74, 0x6f, 0x2e, 0x42, 0x65, 0x6e, 0x63, 0x68, 0x6d, 0x61, 0x72, 0x6b, 0x52, 0x0a, 0x62, + 0x65, 0x6e, 0x63, 0x68, 0x6d, 0x61, 0x72, 0x6b, 0x73, 0x22, 0xc5, 0x02, 0x0a, 0x09, 0x42, 0x65, + 0x6e, 0x63, 0x68, 0x6d, 0x61, 0x72, 0x6b, 0x12, 0x45, 0x0a, 0x0e, 0x62, 0x65, 0x6e, 0x63, 0x68, + 0x6d, 0x61, 0x72, 0x6b, 0x5f, 0x63, 0x61, 0x73, 0x65, 0x18, 0x01, 0x20, 0x01, 0x28, 0x0b, 0x32, + 0x1e, 0x2e, 0x62, 0x65, 0x6e, 0x63, 0x68, 0x6d, 0x61, 0x72, 0x6b, 0x2e, 0x70, 0x72, 0x6f, 0x74, + 0x6f, 0x2e, 0x42, 0x65, 0x6e, 0x63, 0x68, 0x6d, 0x61, 0x72, 0x6b, 0x43, 0x61, 0x73, 0x65, 0x52, + 0x0d, 0x62, 0x65, 0x6e, 0x63, 0x68, 0x6d, 0x61, 0x72, 0x6b, 0x43, 0x61, 0x73, 0x65, 0x12, 0x38, + 0x0a, 0x06, 0x63, 0x6f, 0x6e, 0x66, 0x69, 0x67, 0x18, 0x02, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x20, + 0x2e, 0x62, 0x65, 0x6e, 0x63, 0x68, 0x6d, 0x61, 0x72, 0x6b, 0x2e, 0x70, 0x72, 0x6f, 0x74, 0x6f, + 0x2e, 0x42, 0x65, 0x6e, 0x63, 0x68, 0x6d, 0x61, 0x72, 0x6b, 0x43, 0x6f, 0x6e, 0x66, 0x69, 0x67, + 0x52, 0x06, 0x63, 0x6f, 0x6e, 0x66, 0x69, 0x67, 0x12, 0x38, 0x0a, 0x06, 0x72, 0x65, 0x73, 0x75, + 0x6c, 0x74, 0x18, 0x03, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x20, 0x2e, 0x62, 0x65, 0x6e, 0x63, 0x68, + 0x6d, 0x61, 0x72, 0x6b, 0x2e, 0x70, 0x72, 0x6f, 0x74, 0x6f, 0x2e, 0x42, 0x65, 0x6e, 0x63, 0x68, + 0x6d, 0x61, 0x72, 0x6b, 0x52, 0x65, 0x73, 0x75, 0x6c, 0x74, 0x52, 0x06, 0x72, 0x65, 0x73, 0x75, + 0x6c, 0x74, 0x12, 0x39, 0x0a, 0x0a, 0x73, 0x74, 0x61, 0x72, 0x74, 0x5f, 0x74, 0x69, 0x6d, 0x65, + 0x18, 0x04, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x1a, 0x2e, 0x67, 0x6f, 0x6f, 0x67, 0x6c, 0x65, 0x2e, + 0x70, 0x72, 0x6f, 0x74, 0x6f, 0x62, 0x75, 0x66, 0x2e, 0x54, 0x69, 0x6d, 0x65, 0x73, 0x74, 0x61, + 0x6d, 0x70, 0x52, 0x09, 0x73, 0x74, 0x61, 0x72, 0x74, 0x54, 0x69, 0x6d, 0x65, 0x12, 0x12, 0x0a, + 0x04, 0x6e, 0x61, 0x6d, 0x65, 0x18, 0x05, 0x20, 0x01, 0x28, 0x09, 0x52, 0x04, 0x6e, 0x61, 0x6d, + 0x65, 0x12, 0x2e, 0x0a, 0x13, 0x62, 0x61, 0x73, 0x65, 0x5f, 0x62, 0x65, 0x6e, 0x63, 0x68, 0x6d, + 0x61, 0x72, 0x6b, 0x5f, 0x6e, 0x61, 
0x6d, 0x65, 0x18, 0x06, 0x20, 0x01, 0x28, 0x09, 0x52, 0x11, + 0x62, 0x61, 0x73, 0x65, 0x42, 0x65, 0x6e, 0x63, 0x68, 0x6d, 0x61, 0x72, 0x6b, 0x4e, 0x61, 0x6d, + 0x65, 0x22, 0x45, 0x0a, 0x0d, 0x42, 0x65, 0x6e, 0x63, 0x68, 0x6d, 0x61, 0x72, 0x6b, 0x43, 0x61, + 0x73, 0x65, 0x12, 0x12, 0x0a, 0x04, 0x6e, 0x61, 0x6d, 0x65, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, + 0x52, 0x04, 0x6e, 0x61, 0x6d, 0x65, 0x12, 0x20, 0x0a, 0x0b, 0x64, 0x65, 0x73, 0x63, 0x72, 0x69, + 0x70, 0x74, 0x69, 0x6f, 0x6e, 0x18, 0x02, 0x20, 0x01, 0x28, 0x09, 0x52, 0x0b, 0x64, 0x65, 0x73, + 0x63, 0x72, 0x69, 0x70, 0x74, 0x69, 0x6f, 0x6e, 0x22, 0xfb, 0x01, 0x0a, 0x0f, 0x42, 0x65, 0x6e, + 0x63, 0x68, 0x6d, 0x61, 0x72, 0x6b, 0x43, 0x6f, 0x6e, 0x66, 0x69, 0x67, 0x12, 0x42, 0x0a, 0x0d, + 0x6c, 0x6f, 0x61, 0x64, 0x5f, 0x62, 0x61, 0x6c, 0x61, 0x6e, 0x63, 0x65, 0x72, 0x18, 0x01, 0x20, + 0x01, 0x28, 0x0b, 0x32, 0x1d, 0x2e, 0x62, 0x65, 0x6e, 0x63, 0x68, 0x6d, 0x61, 0x72, 0x6b, 0x2e, + 0x70, 0x72, 0x6f, 0x74, 0x6f, 0x2e, 0x4c, 0x6f, 0x61, 0x64, 0x42, 0x61, 0x6c, 0x61, 0x6e, 0x63, + 0x65, 0x72, 0x52, 0x0c, 0x6c, 0x6f, 0x61, 0x64, 0x42, 0x61, 0x6c, 0x61, 0x6e, 0x63, 0x65, 0x72, + 0x12, 0x3f, 0x0a, 0x0c, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x5f, 0x73, 0x65, 0x72, 0x76, 0x65, 0x72, + 0x18, 0x02, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x1c, 0x2e, 0x62, 0x65, 0x6e, 0x63, 0x68, 0x6d, 0x61, + 0x72, 0x6b, 0x2e, 0x70, 0x72, 0x6f, 0x74, 0x6f, 0x2e, 0x4d, 0x6f, 0x64, 0x65, 0x6c, 0x53, 0x65, + 0x72, 0x76, 0x65, 0x72, 0x52, 0x0b, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x53, 0x65, 0x72, 0x76, 0x65, + 0x72, 0x12, 0x45, 0x0a, 0x0e, 0x62, 0x65, 0x6e, 0x63, 0x68, 0x6d, 0x61, 0x72, 0x6b, 0x5f, 0x74, + 0x6f, 0x6f, 0x6c, 0x18, 0x03, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x1e, 0x2e, 0x62, 0x65, 0x6e, 0x63, + 0x68, 0x6d, 0x61, 0x72, 0x6b, 0x2e, 0x70, 0x72, 0x6f, 0x74, 0x6f, 0x2e, 0x42, 0x65, 0x6e, 0x63, + 0x68, 0x6d, 0x61, 0x72, 0x6b, 0x54, 0x6f, 0x6f, 0x6c, 0x52, 0x0d, 0x62, 0x65, 0x6e, 0x63, 0x68, + 0x6d, 0x61, 0x72, 0x6b, 0x54, 0x6f, 0x6f, 0x6c, 0x12, 0x1c, 0x0a, 0x09, 
0x6e, 0x61, 0x6d, 0x65, + 0x73, 0x70, 0x61, 0x63, 0x65, 0x18, 0x04, 0x20, 0x01, 0x28, 0x09, 0x52, 0x09, 0x6e, 0x61, 0x6d, + 0x65, 0x73, 0x70, 0x61, 0x63, 0x65, 0x22, 0x96, 0x01, 0x0a, 0x0b, 0x4d, 0x6f, 0x64, 0x65, 0x6c, + 0x53, 0x65, 0x72, 0x76, 0x65, 0x72, 0x12, 0x14, 0x0a, 0x05, 0x69, 0x6d, 0x61, 0x67, 0x65, 0x18, + 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x05, 0x69, 0x6d, 0x61, 0x67, 0x65, 0x12, 0x20, 0x0a, 0x0b, + 0x61, 0x63, 0x63, 0x65, 0x6c, 0x65, 0x72, 0x61, 0x74, 0x6f, 0x72, 0x18, 0x02, 0x20, 0x01, 0x28, + 0x09, 0x52, 0x0b, 0x61, 0x63, 0x63, 0x65, 0x6c, 0x65, 0x72, 0x61, 0x74, 0x6f, 0x72, 0x12, 0x1a, + 0x0a, 0x08, 0x72, 0x65, 0x70, 0x6c, 0x69, 0x63, 0x61, 0x73, 0x18, 0x03, 0x20, 0x01, 0x28, 0x05, + 0x52, 0x08, 0x72, 0x65, 0x70, 0x6c, 0x69, 0x63, 0x61, 0x73, 0x12, 0x2b, 0x0a, 0x04, 0x76, 0x6c, + 0x6c, 0x6d, 0x18, 0x04, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x15, 0x2e, 0x62, 0x65, 0x6e, 0x63, 0x68, + 0x6d, 0x61, 0x72, 0x6b, 0x2e, 0x70, 0x72, 0x6f, 0x74, 0x6f, 0x2e, 0x56, 0x4c, 0x4c, 0x4d, 0x48, + 0x00, 0x52, 0x04, 0x76, 0x6c, 0x6c, 0x6d, 0x42, 0x06, 0x0a, 0x04, 0x74, 0x79, 0x70, 0x65, 0x22, + 0x95, 0x01, 0x0a, 0x04, 0x56, 0x4c, 0x4c, 0x4d, 0x12, 0x2d, 0x0a, 0x12, 0x74, 0x65, 0x6e, 0x73, + 0x6f, 0x72, 0x5f, 0x70, 0x61, 0x72, 0x61, 0x6c, 0x6c, 0x65, 0x6c, 0x69, 0x73, 0x6d, 0x18, 0x01, + 0x20, 0x01, 0x28, 0x09, 0x52, 0x11, 0x74, 0x65, 0x6e, 0x73, 0x6f, 0x72, 0x50, 0x61, 0x72, 0x61, + 0x6c, 0x6c, 0x65, 0x6c, 0x69, 0x73, 0x6d, 0x12, 0x1b, 0x0a, 0x09, 0x6d, 0x61, 0x78, 0x5f, 0x6c, + 0x6f, 0x72, 0x61, 0x73, 0x18, 0x02, 0x20, 0x01, 0x28, 0x09, 0x52, 0x08, 0x6d, 0x61, 0x78, 0x4c, + 0x6f, 0x72, 0x61, 0x73, 0x12, 0x14, 0x0a, 0x05, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x18, 0x03, 0x20, + 0x01, 0x28, 0x09, 0x52, 0x05, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x12, 0x1b, 0x0a, 0x09, 0x6c, 0x6f, + 0x72, 0x61, 0x5f, 0x72, 0x61, 0x6e, 0x6b, 0x18, 0x04, 0x20, 0x01, 0x28, 0x09, 0x52, 0x08, 0x6c, + 0x6f, 0x72, 0x61, 0x52, 0x61, 0x6e, 0x6b, 0x12, 0x0e, 0x0a, 0x02, 0x76, 0x31, 0x18, 0x05, 0x20, + 0x01, 0x28, 
0x09, 0x52, 0x02, 0x76, 0x31, 0x22, 0x57, 0x0a, 0x0d, 0x42, 0x65, 0x6e, 0x63, 0x68, + 0x6d, 0x61, 0x72, 0x6b, 0x54, 0x6f, 0x6f, 0x6c, 0x12, 0x14, 0x0a, 0x05, 0x69, 0x6d, 0x61, 0x67, + 0x65, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x05, 0x69, 0x6d, 0x61, 0x67, 0x65, 0x12, 0x28, + 0x0a, 0x03, 0x6c, 0x70, 0x67, 0x18, 0x02, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x14, 0x2e, 0x62, 0x65, + 0x6e, 0x63, 0x68, 0x6d, 0x61, 0x72, 0x6b, 0x2e, 0x70, 0x72, 0x6f, 0x74, 0x6f, 0x2e, 0x4c, 0x50, + 0x47, 0x48, 0x00, 0x52, 0x03, 0x6c, 0x70, 0x67, 0x42, 0x06, 0x0a, 0x04, 0x54, 0x79, 0x70, 0x65, + 0x22, 0xa0, 0x02, 0x0a, 0x03, 0x4c, 0x50, 0x47, 0x12, 0x18, 0x0a, 0x07, 0x64, 0x61, 0x74, 0x61, + 0x73, 0x65, 0x74, 0x18, 0x02, 0x20, 0x01, 0x28, 0x09, 0x52, 0x07, 0x64, 0x61, 0x74, 0x61, 0x73, + 0x65, 0x74, 0x12, 0x16, 0x0a, 0x06, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x73, 0x18, 0x03, 0x20, 0x01, + 0x28, 0x09, 0x52, 0x06, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x73, 0x12, 0x0e, 0x0a, 0x02, 0x69, 0x70, + 0x18, 0x04, 0x20, 0x01, 0x28, 0x09, 0x52, 0x02, 0x69, 0x70, 0x12, 0x12, 0x0a, 0x04, 0x70, 0x6f, + 0x72, 0x74, 0x18, 0x05, 0x20, 0x01, 0x28, 0x09, 0x52, 0x04, 0x70, 0x6f, 0x72, 0x74, 0x12, 0x23, + 0x0a, 0x0d, 0x72, 0x65, 0x71, 0x75, 0x65, 0x73, 0x74, 0x5f, 0x72, 0x61, 0x74, 0x65, 0x73, 0x18, + 0x06, 0x20, 0x01, 0x28, 0x09, 0x52, 0x0c, 0x72, 0x65, 0x71, 0x75, 0x65, 0x73, 0x74, 0x52, 0x61, + 0x74, 0x65, 0x73, 0x12, 0x34, 0x0a, 0x16, 0x62, 0x65, 0x6e, 0x63, 0x68, 0x6d, 0x61, 0x72, 0x6b, + 0x5f, 0x74, 0x69, 0x6d, 0x65, 0x5f, 0x73, 0x65, 0x63, 0x6f, 0x6e, 0x64, 0x73, 0x18, 0x07, 0x20, + 0x01, 0x28, 0x09, 0x52, 0x14, 0x62, 0x65, 0x6e, 0x63, 0x68, 0x6d, 0x61, 0x72, 0x6b, 0x54, 0x69, + 0x6d, 0x65, 0x53, 0x65, 0x63, 0x6f, 0x6e, 0x64, 0x73, 0x12, 0x23, 0x0a, 0x0d, 0x6f, 0x75, 0x74, + 0x70, 0x75, 0x74, 0x5f, 0x6c, 0x65, 0x6e, 0x67, 0x74, 0x68, 0x18, 0x08, 0x20, 0x01, 0x28, 0x09, + 0x52, 0x0c, 0x6f, 0x75, 0x74, 0x70, 0x75, 0x74, 0x4c, 0x65, 0x6e, 0x67, 0x74, 0x68, 0x12, 0x1c, + 0x0a, 0x09, 0x74, 0x6f, 0x6b, 0x65, 0x6e, 0x69, 
0x7a, 0x65, 0x72, 0x18, 0x09, 0x20, 0x01, 0x28, + 0x09, 0x52, 0x09, 0x74, 0x6f, 0x6b, 0x65, 0x6e, 0x69, 0x7a, 0x65, 0x72, 0x12, 0x25, 0x0a, 0x0e, + 0x77, 0x61, 0x72, 0x6d, 0x75, 0x70, 0x5f, 0x73, 0x65, 0x63, 0x6f, 0x6e, 0x64, 0x73, 0x18, 0x0a, + 0x20, 0x01, 0x28, 0x09, 0x52, 0x0d, 0x77, 0x61, 0x72, 0x6d, 0x75, 0x70, 0x53, 0x65, 0x63, 0x6f, + 0x6e, 0x64, 0x73, 0x22, 0xd9, 0x03, 0x0a, 0x0c, 0x4c, 0x6f, 0x61, 0x64, 0x42, 0x61, 0x6c, 0x61, + 0x6e, 0x63, 0x65, 0x72, 0x12, 0x3e, 0x0a, 0x0b, 0x6b, 0x38, 0x73, 0x5f, 0x73, 0x65, 0x72, 0x76, + 0x69, 0x63, 0x65, 0x18, 0x01, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x1b, 0x2e, 0x62, 0x65, 0x6e, 0x63, + 0x68, 0x6d, 0x61, 0x72, 0x6b, 0x2e, 0x70, 0x72, 0x6f, 0x74, 0x6f, 0x2e, 0x4b, 0x38, 0x73, 0x53, + 0x65, 0x72, 0x76, 0x69, 0x63, 0x65, 0x48, 0x00, 0x52, 0x0a, 0x6b, 0x38, 0x73, 0x53, 0x65, 0x72, + 0x76, 0x69, 0x63, 0x65, 0x12, 0x34, 0x0a, 0x07, 0x67, 0x61, 0x74, 0x65, 0x77, 0x61, 0x79, 0x18, + 0x02, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x18, 0x2e, 0x62, 0x65, 0x6e, 0x63, 0x68, 0x6d, 0x61, 0x72, + 0x6b, 0x2e, 0x70, 0x72, 0x6f, 0x74, 0x6f, 0x2e, 0x47, 0x61, 0x74, 0x65, 0x77, 0x61, 0x79, 0x48, + 0x00, 0x52, 0x07, 0x67, 0x61, 0x74, 0x65, 0x77, 0x61, 0x79, 0x12, 0x2e, 0x0a, 0x13, 0x6b, 0x38, + 0x73, 0x5f, 0x73, 0x65, 0x72, 0x76, 0x69, 0x63, 0x65, 0x5f, 0x65, 0x6e, 0x61, 0x62, 0x6c, 0x65, + 0x64, 0x18, 0x04, 0x20, 0x01, 0x28, 0x08, 0x52, 0x11, 0x6b, 0x38, 0x73, 0x53, 0x65, 0x72, 0x76, + 0x69, 0x63, 0x65, 0x45, 0x6e, 0x61, 0x62, 0x6c, 0x65, 0x64, 0x12, 0x27, 0x0a, 0x0f, 0x67, 0x61, + 0x74, 0x65, 0x77, 0x61, 0x79, 0x5f, 0x65, 0x6e, 0x61, 0x62, 0x6c, 0x65, 0x64, 0x18, 0x05, 0x20, + 0x01, 0x28, 0x08, 0x52, 0x0e, 0x67, 0x61, 0x74, 0x65, 0x77, 0x61, 0x79, 0x45, 0x6e, 0x61, 0x62, + 0x6c, 0x65, 0x64, 0x12, 0x32, 0x0a, 0x15, 0x67, 0x61, 0x74, 0x65, 0x77, 0x61, 0x79, 0x5f, 0x65, + 0x6e, 0x76, 0x6f, 0x79, 0x5f, 0x65, 0x6e, 0x61, 0x62, 0x6c, 0x65, 0x64, 0x18, 0x06, 0x20, 0x01, + 0x28, 0x08, 0x52, 0x13, 0x67, 0x61, 0x74, 0x65, 0x77, 0x61, 0x79, 0x45, 0x6e, 0x76, 
0x6f, 0x79, + 0x45, 0x6e, 0x61, 0x62, 0x6c, 0x65, 0x64, 0x12, 0x3d, 0x0a, 0x1b, 0x67, 0x61, 0x74, 0x65, 0x77, + 0x61, 0x79, 0x5f, 0x67, 0x6b, 0x65, 0x5f, 0x67, 0x61, 0x74, 0x65, 0x77, 0x61, 0x79, 0x5f, 0x65, + 0x6e, 0x61, 0x62, 0x6c, 0x65, 0x64, 0x18, 0x07, 0x20, 0x01, 0x28, 0x08, 0x52, 0x18, 0x67, 0x61, + 0x74, 0x65, 0x77, 0x61, 0x79, 0x47, 0x6b, 0x65, 0x47, 0x61, 0x74, 0x65, 0x77, 0x61, 0x79, 0x45, + 0x6e, 0x61, 0x62, 0x6c, 0x65, 0x64, 0x12, 0x39, 0x0a, 0x19, 0x67, 0x61, 0x74, 0x65, 0x77, 0x61, + 0x79, 0x5f, 0x65, 0x6e, 0x76, 0x6f, 0x79, 0x5f, 0x65, 0x70, 0x70, 0x5f, 0x65, 0x6e, 0x61, 0x62, + 0x6c, 0x65, 0x64, 0x18, 0x08, 0x20, 0x01, 0x28, 0x08, 0x52, 0x16, 0x67, 0x61, 0x74, 0x65, 0x77, + 0x61, 0x79, 0x45, 0x6e, 0x76, 0x6f, 0x79, 0x45, 0x70, 0x70, 0x45, 0x6e, 0x61, 0x62, 0x6c, 0x65, + 0x64, 0x12, 0x44, 0x0a, 0x1f, 0x67, 0x61, 0x74, 0x65, 0x77, 0x61, 0x79, 0x5f, 0x65, 0x6e, 0x76, + 0x6f, 0x79, 0x5f, 0x6c, 0x62, 0x5f, 0x70, 0x6f, 0x6c, 0x69, 0x63, 0x79, 0x5f, 0x65, 0x6e, 0x61, + 0x62, 0x6c, 0x65, 0x64, 0x18, 0x09, 0x20, 0x01, 0x28, 0x08, 0x52, 0x1b, 0x67, 0x61, 0x74, 0x65, + 0x77, 0x61, 0x79, 0x45, 0x6e, 0x76, 0x6f, 0x79, 0x4c, 0x62, 0x50, 0x6f, 0x6c, 0x69, 0x63, 0x79, + 0x45, 0x6e, 0x61, 0x62, 0x6c, 0x65, 0x64, 0x42, 0x06, 0x0a, 0x04, 0x74, 0x79, 0x70, 0x65, 0x22, + 0xc4, 0x01, 0x0a, 0x07, 0x47, 0x61, 0x74, 0x65, 0x77, 0x61, 0x79, 0x12, 0x2e, 0x0a, 0x05, 0x65, + 0x6e, 0x76, 0x6f, 0x79, 0x18, 0x01, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x16, 0x2e, 0x62, 0x65, 0x6e, + 0x63, 0x68, 0x6d, 0x61, 0x72, 0x6b, 0x2e, 0x70, 0x72, 0x6f, 0x74, 0x6f, 0x2e, 0x45, 0x6e, 0x76, + 0x6f, 0x79, 0x48, 0x00, 0x52, 0x05, 0x65, 0x6e, 0x76, 0x6f, 0x79, 0x12, 0x3e, 0x0a, 0x0b, 0x67, + 0x6b, 0x65, 0x5f, 0x67, 0x61, 0x74, 0x65, 0x77, 0x61, 0x79, 0x18, 0x02, 0x20, 0x01, 0x28, 0x0b, + 0x32, 0x1b, 0x2e, 0x62, 0x65, 0x6e, 0x63, 0x68, 0x6d, 0x61, 0x72, 0x6b, 0x2e, 0x70, 0x72, 0x6f, + 0x74, 0x6f, 0x2e, 0x47, 0x4b, 0x45, 0x47, 0x61, 0x74, 0x65, 0x77, 0x61, 0x79, 0x48, 0x00, 0x52, + 0x0a, 0x67, 0x6b, 0x65, 
0x47, 0x61, 0x74, 0x65, 0x77, 0x61, 0x79, 0x12, 0x41, 0x0a, 0x1d, 0x66, + 0x75, 0x6c, 0x6c, 0x5f, 0x64, 0x75, 0x70, 0x6c, 0x65, 0x78, 0x5f, 0x73, 0x74, 0x72, 0x65, 0x61, + 0x6d, 0x69, 0x6e, 0x67, 0x5f, 0x65, 0x6e, 0x61, 0x62, 0x6c, 0x65, 0x64, 0x18, 0x03, 0x20, 0x01, + 0x28, 0x08, 0x52, 0x1a, 0x66, 0x75, 0x6c, 0x6c, 0x44, 0x75, 0x70, 0x6c, 0x65, 0x78, 0x53, 0x74, + 0x72, 0x65, 0x61, 0x6d, 0x69, 0x6e, 0x67, 0x45, 0x6e, 0x61, 0x62, 0x6c, 0x65, 0x64, 0x42, 0x06, + 0x0a, 0x04, 0x74, 0x79, 0x70, 0x65, 0x22, 0x58, 0x0a, 0x05, 0x45, 0x6e, 0x76, 0x6f, 0x79, 0x12, + 0x28, 0x0a, 0x03, 0x65, 0x70, 0x70, 0x18, 0x01, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x14, 0x2e, 0x62, + 0x65, 0x6e, 0x63, 0x68, 0x6d, 0x61, 0x72, 0x6b, 0x2e, 0x70, 0x72, 0x6f, 0x74, 0x6f, 0x2e, 0x45, + 0x50, 0x50, 0x48, 0x00, 0x52, 0x03, 0x65, 0x70, 0x70, 0x12, 0x1d, 0x0a, 0x09, 0x6c, 0x62, 0x5f, + 0x70, 0x6f, 0x6c, 0x69, 0x63, 0x79, 0x18, 0x02, 0x20, 0x01, 0x28, 0x09, 0x48, 0x00, 0x52, 0x08, + 0x6c, 0x62, 0x50, 0x6f, 0x6c, 0x69, 0x63, 0x79, 0x42, 0x06, 0x0a, 0x04, 0x74, 0x79, 0x70, 0x65, + 0x22, 0x34, 0x0a, 0x0a, 0x47, 0x4b, 0x45, 0x47, 0x61, 0x74, 0x65, 0x77, 0x61, 0x79, 0x12, 0x26, + 0x0a, 0x03, 0x65, 0x70, 0x70, 0x18, 0x01, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x14, 0x2e, 0x62, 0x65, + 0x6e, 0x63, 0x68, 0x6d, 0x61, 0x72, 0x6b, 0x2e, 0x70, 0x72, 0x6f, 0x74, 0x6f, 0x2e, 0x45, 0x50, + 0x50, 0x52, 0x03, 0x65, 0x70, 0x70, 0x22, 0x55, 0x0a, 0x03, 0x45, 0x50, 0x50, 0x12, 0x14, 0x0a, + 0x05, 0x69, 0x6d, 0x61, 0x67, 0x65, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x05, 0x69, 0x6d, + 0x61, 0x67, 0x65, 0x12, 0x38, 0x0a, 0x18, 0x72, 0x65, 0x66, 0x72, 0x65, 0x73, 0x68, 0x5f, 0x6d, + 0x65, 0x74, 0x72, 0x69, 0x63, 0x73, 0x5f, 0x69, 0x6e, 0x74, 0x65, 0x72, 0x76, 0x61, 0x6c, 0x18, + 0x02, 0x20, 0x01, 0x28, 0x09, 0x52, 0x16, 0x72, 0x65, 0x66, 0x72, 0x65, 0x73, 0x68, 0x4d, 0x65, + 0x74, 0x72, 0x69, 0x63, 0x73, 0x49, 0x6e, 0x74, 0x65, 0x72, 0x76, 0x61, 0x6c, 0x22, 0x0c, 0x0a, + 0x0a, 0x4b, 0x38, 0x73, 0x53, 0x65, 0x72, 0x76, 0x69, 0x63, 
0x65, 0x22, 0x3e, 0x0a, 0x0f, 0x42, + 0x65, 0x6e, 0x63, 0x68, 0x6d, 0x61, 0x72, 0x6b, 0x52, 0x65, 0x73, 0x75, 0x6c, 0x74, 0x12, 0x2b, + 0x0a, 0x05, 0x73, 0x74, 0x61, 0x74, 0x73, 0x18, 0x01, 0x20, 0x03, 0x28, 0x0b, 0x32, 0x15, 0x2e, + 0x62, 0x65, 0x6e, 0x63, 0x68, 0x6d, 0x61, 0x72, 0x6b, 0x2e, 0x70, 0x72, 0x6f, 0x74, 0x6f, 0x2e, + 0x53, 0x74, 0x61, 0x74, 0x52, 0x05, 0x73, 0x74, 0x61, 0x74, 0x73, 0x22, 0xc3, 0x03, 0x0a, 0x04, + 0x53, 0x74, 0x61, 0x74, 0x12, 0x21, 0x0a, 0x0c, 0x72, 0x65, 0x71, 0x75, 0x65, 0x73, 0x74, 0x5f, + 0x72, 0x61, 0x74, 0x65, 0x18, 0x01, 0x20, 0x01, 0x28, 0x02, 0x52, 0x0b, 0x72, 0x65, 0x71, 0x75, + 0x65, 0x73, 0x74, 0x52, 0x61, 0x74, 0x65, 0x12, 0x40, 0x0a, 0x0f, 0x72, 0x65, 0x71, 0x75, 0x65, + 0x73, 0x74, 0x5f, 0x6c, 0x61, 0x74, 0x65, 0x6e, 0x63, 0x79, 0x18, 0x02, 0x20, 0x01, 0x28, 0x0b, + 0x32, 0x17, 0x2e, 0x62, 0x65, 0x6e, 0x63, 0x68, 0x6d, 0x61, 0x72, 0x6b, 0x2e, 0x70, 0x72, 0x6f, + 0x74, 0x6f, 0x2e, 0x4d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x52, 0x0e, 0x72, 0x65, 0x71, 0x75, 0x65, + 0x73, 0x74, 0x4c, 0x61, 0x74, 0x65, 0x6e, 0x63, 0x79, 0x12, 0x37, 0x0a, 0x0a, 0x74, 0x68, 0x72, + 0x6f, 0x75, 0x67, 0x68, 0x70, 0x75, 0x74, 0x18, 0x03, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x17, 0x2e, + 0x62, 0x65, 0x6e, 0x63, 0x68, 0x6d, 0x61, 0x72, 0x6b, 0x2e, 0x70, 0x72, 0x6f, 0x74, 0x6f, 0x2e, + 0x4d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x52, 0x0a, 0x74, 0x68, 0x72, 0x6f, 0x75, 0x67, 0x68, 0x70, + 0x75, 0x74, 0x12, 0x3a, 0x0a, 0x0c, 0x69, 0x6e, 0x70, 0x75, 0x74, 0x5f, 0x6c, 0x65, 0x6e, 0x67, + 0x74, 0x68, 0x18, 0x04, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x17, 0x2e, 0x62, 0x65, 0x6e, 0x63, 0x68, + 0x6d, 0x61, 0x72, 0x6b, 0x2e, 0x70, 0x72, 0x6f, 0x74, 0x6f, 0x2e, 0x4d, 0x65, 0x74, 0x72, 0x69, + 0x63, 0x52, 0x0b, 0x69, 0x6e, 0x70, 0x75, 0x74, 0x4c, 0x65, 0x6e, 0x67, 0x74, 0x68, 0x12, 0x3c, + 0x0a, 0x0d, 0x6f, 0x75, 0x74, 0x70, 0x75, 0x74, 0x5f, 0x6c, 0x65, 0x6e, 0x67, 0x74, 0x68, 0x18, + 0x05, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x17, 0x2e, 0x62, 0x65, 0x6e, 0x63, 0x68, 0x6d, 0x61, 0x72, + 
0x6b, 0x2e, 0x70, 0x72, 0x6f, 0x74, 0x6f, 0x2e, 0x4d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x52, 0x0c, + 0x6f, 0x75, 0x74, 0x70, 0x75, 0x74, 0x4c, 0x65, 0x6e, 0x67, 0x74, 0x68, 0x12, 0x2b, 0x0a, 0x04, + 0x74, 0x74, 0x66, 0x74, 0x18, 0x06, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x17, 0x2e, 0x62, 0x65, 0x6e, + 0x63, 0x68, 0x6d, 0x61, 0x72, 0x6b, 0x2e, 0x70, 0x72, 0x6f, 0x74, 0x6f, 0x2e, 0x4d, 0x65, 0x74, + 0x72, 0x69, 0x63, 0x52, 0x04, 0x74, 0x74, 0x66, 0x74, 0x12, 0x2b, 0x0a, 0x04, 0x74, 0x70, 0x6f, + 0x74, 0x18, 0x07, 0x20, 0x01, 0x28, 0x0b, 0x32, 0x17, 0x2e, 0x62, 0x65, 0x6e, 0x63, 0x68, 0x6d, + 0x61, 0x72, 0x6b, 0x2e, 0x70, 0x72, 0x6f, 0x74, 0x6f, 0x2e, 0x4d, 0x65, 0x74, 0x72, 0x69, 0x63, + 0x52, 0x04, 0x74, 0x70, 0x6f, 0x74, 0x12, 0x49, 0x0a, 0x14, 0x6d, 0x6f, 0x64, 0x65, 0x6c, 0x5f, + 0x73, 0x65, 0x72, 0x76, 0x65, 0x72, 0x5f, 0x6d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x73, 0x18, 0x08, + 0x20, 0x03, 0x28, 0x0b, 0x32, 0x17, 0x2e, 0x62, 0x65, 0x6e, 0x63, 0x68, 0x6d, 0x61, 0x72, 0x6b, + 0x2e, 0x70, 0x72, 0x6f, 0x74, 0x6f, 0x2e, 0x4d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x52, 0x12, 0x6d, + 0x6f, 0x64, 0x65, 0x6c, 0x53, 0x65, 0x72, 0x76, 0x65, 0x72, 0x4d, 0x65, 0x74, 0x72, 0x69, 0x63, + 0x73, 0x22, 0xa0, 0x01, 0x0a, 0x06, 0x4d, 0x65, 0x74, 0x72, 0x69, 0x63, 0x12, 0x12, 0x0a, 0x04, + 0x6e, 0x61, 0x6d, 0x65, 0x18, 0x01, 0x20, 0x01, 0x28, 0x09, 0x52, 0x04, 0x6e, 0x61, 0x6d, 0x65, + 0x12, 0x12, 0x0a, 0x04, 0x6d, 0x65, 0x61, 0x6e, 0x18, 0x02, 0x20, 0x01, 0x28, 0x02, 0x52, 0x04, + 0x6d, 0x65, 0x61, 0x6e, 0x12, 0x16, 0x0a, 0x06, 0x6d, 0x65, 0x64, 0x69, 0x61, 0x6e, 0x18, 0x03, + 0x20, 0x01, 0x28, 0x02, 0x52, 0x06, 0x6d, 0x65, 0x64, 0x69, 0x61, 0x6e, 0x12, 0x0e, 0x0a, 0x02, + 0x73, 0x64, 0x18, 0x04, 0x20, 0x01, 0x28, 0x02, 0x52, 0x02, 0x73, 0x64, 0x12, 0x10, 0x0a, 0x03, + 0x6d, 0x69, 0x6e, 0x18, 0x05, 0x20, 0x01, 0x28, 0x02, 0x52, 0x03, 0x6d, 0x69, 0x6e, 0x12, 0x10, + 0x0a, 0x03, 0x6d, 0x61, 0x78, 0x18, 0x06, 0x20, 0x01, 0x28, 0x02, 0x52, 0x03, 0x6d, 0x61, 0x78, + 0x12, 0x10, 0x0a, 0x03, 0x70, 0x39, 
0x30, 0x18, 0x07, 0x20, 0x01, 0x28, 0x02, 0x52, 0x03, 0x70, + 0x39, 0x30, 0x12, 0x10, 0x0a, 0x03, 0x70, 0x39, 0x39, 0x18, 0x08, 0x20, 0x01, 0x28, 0x02, 0x52, + 0x03, 0x70, 0x39, 0x39, 0x42, 0x11, 0x5a, 0x0f, 0x62, 0x65, 0x6e, 0x63, 0x68, 0x6d, 0x61, 0x72, + 0x6b, 0x2f, 0x70, 0x72, 0x6f, 0x74, 0x6f, 0x62, 0x06, 0x70, 0x72, 0x6f, 0x74, 0x6f, 0x33, +} + +var ( + file_benchmark_proto_rawDescOnce sync.Once + file_benchmark_proto_rawDescData = file_benchmark_proto_rawDesc +) + +func file_benchmark_proto_rawDescGZIP() []byte { + file_benchmark_proto_rawDescOnce.Do(func() { + file_benchmark_proto_rawDescData = protoimpl.X.CompressGZIP(file_benchmark_proto_rawDescData) + }) + return file_benchmark_proto_rawDescData +} + +var file_benchmark_proto_msgTypes = make([]protoimpl.MessageInfo, 18) +var file_benchmark_proto_goTypes = []interface{}{ + (*Helm)(nil), // 0: benchmark.proto.Helm + (*Benchmarks)(nil), // 1: benchmark.proto.Benchmarks + (*Benchmark)(nil), // 2: benchmark.proto.Benchmark + (*BenchmarkCase)(nil), // 3: benchmark.proto.BenchmarkCase + (*BenchmarkConfig)(nil), // 4: benchmark.proto.BenchmarkConfig + (*ModelServer)(nil), // 5: benchmark.proto.ModelServer + (*VLLM)(nil), // 6: benchmark.proto.VLLM + (*BenchmarkTool)(nil), // 7: benchmark.proto.BenchmarkTool + (*LPG)(nil), // 8: benchmark.proto.LPG + (*LoadBalancer)(nil), // 9: benchmark.proto.LoadBalancer + (*Gateway)(nil), // 10: benchmark.proto.Gateway + (*Envoy)(nil), // 11: benchmark.proto.Envoy + (*GKEGateway)(nil), // 12: benchmark.proto.GKEGateway + (*EPP)(nil), // 13: benchmark.proto.EPP + (*K8SService)(nil), // 14: benchmark.proto.K8sService + (*BenchmarkResult)(nil), // 15: benchmark.proto.BenchmarkResult + (*Stat)(nil), // 16: benchmark.proto.Stat + (*Metric)(nil), // 17: benchmark.proto.Metric + (*timestamppb.Timestamp)(nil), // 18: google.protobuf.Timestamp +} +var file_benchmark_proto_depIdxs = []int32{ + 2, // 0: benchmark.proto.Helm.global:type_name -> benchmark.proto.Benchmark + 2, // 1: 
benchmark.proto.Benchmarks.benchmarks:type_name -> benchmark.proto.Benchmark + 3, // 2: benchmark.proto.Benchmark.benchmark_case:type_name -> benchmark.proto.BenchmarkCase + 4, // 3: benchmark.proto.Benchmark.config:type_name -> benchmark.proto.BenchmarkConfig + 15, // 4: benchmark.proto.Benchmark.result:type_name -> benchmark.proto.BenchmarkResult + 18, // 5: benchmark.proto.Benchmark.start_time:type_name -> google.protobuf.Timestamp + 9, // 6: benchmark.proto.BenchmarkConfig.load_balancer:type_name -> benchmark.proto.LoadBalancer + 5, // 7: benchmark.proto.BenchmarkConfig.model_server:type_name -> benchmark.proto.ModelServer + 7, // 8: benchmark.proto.BenchmarkConfig.benchmark_tool:type_name -> benchmark.proto.BenchmarkTool + 6, // 9: benchmark.proto.ModelServer.vllm:type_name -> benchmark.proto.VLLM + 8, // 10: benchmark.proto.BenchmarkTool.lpg:type_name -> benchmark.proto.LPG + 14, // 11: benchmark.proto.LoadBalancer.k8s_service:type_name -> benchmark.proto.K8sService + 10, // 12: benchmark.proto.LoadBalancer.gateway:type_name -> benchmark.proto.Gateway + 11, // 13: benchmark.proto.Gateway.envoy:type_name -> benchmark.proto.Envoy + 12, // 14: benchmark.proto.Gateway.gke_gateway:type_name -> benchmark.proto.GKEGateway + 13, // 15: benchmark.proto.Envoy.epp:type_name -> benchmark.proto.EPP + 13, // 16: benchmark.proto.GKEGateway.epp:type_name -> benchmark.proto.EPP + 16, // 17: benchmark.proto.BenchmarkResult.stats:type_name -> benchmark.proto.Stat + 17, // 18: benchmark.proto.Stat.request_latency:type_name -> benchmark.proto.Metric + 17, // 19: benchmark.proto.Stat.throughput:type_name -> benchmark.proto.Metric + 17, // 20: benchmark.proto.Stat.input_length:type_name -> benchmark.proto.Metric + 17, // 21: benchmark.proto.Stat.output_length:type_name -> benchmark.proto.Metric + 17, // 22: benchmark.proto.Stat.ttft:type_name -> benchmark.proto.Metric + 17, // 23: benchmark.proto.Stat.tpot:type_name -> benchmark.proto.Metric + 17, // 24: 
benchmark.proto.Stat.model_server_metrics:type_name -> benchmark.proto.Metric + 25, // [25:25] is the sub-list for method output_type + 25, // [25:25] is the sub-list for method input_type + 25, // [25:25] is the sub-list for extension type_name + 25, // [25:25] is the sub-list for extension extendee + 0, // [0:25] is the sub-list for field type_name +} + +func init() { file_benchmark_proto_init() } +func file_benchmark_proto_init() { + if File_benchmark_proto != nil { + return + } + if !protoimpl.UnsafeEnabled { + file_benchmark_proto_msgTypes[0].Exporter = func(v interface{}, i int) interface{} { + switch v := v.(*Helm); i { + case 0: + return &v.state + case 1: + return &v.sizeCache + case 2: + return &v.unknownFields + default: + return nil + } + } + file_benchmark_proto_msgTypes[1].Exporter = func(v interface{}, i int) interface{} { + switch v := v.(*Benchmarks); i { + case 0: + return &v.state + case 1: + return &v.sizeCache + case 2: + return &v.unknownFields + default: + return nil + } + } + file_benchmark_proto_msgTypes[2].Exporter = func(v interface{}, i int) interface{} { + switch v := v.(*Benchmark); i { + case 0: + return &v.state + case 1: + return &v.sizeCache + case 2: + return &v.unknownFields + default: + return nil + } + } + file_benchmark_proto_msgTypes[3].Exporter = func(v interface{}, i int) interface{} { + switch v := v.(*BenchmarkCase); i { + case 0: + return &v.state + case 1: + return &v.sizeCache + case 2: + return &v.unknownFields + default: + return nil + } + } + file_benchmark_proto_msgTypes[4].Exporter = func(v interface{}, i int) interface{} { + switch v := v.(*BenchmarkConfig); i { + case 0: + return &v.state + case 1: + return &v.sizeCache + case 2: + return &v.unknownFields + default: + return nil + } + } + file_benchmark_proto_msgTypes[5].Exporter = func(v interface{}, i int) interface{} { + switch v := v.(*ModelServer); i { + case 0: + return &v.state + case 1: + return &v.sizeCache + case 2: + return &v.unknownFields + default: 
+ return nil + } + } + file_benchmark_proto_msgTypes[6].Exporter = func(v interface{}, i int) interface{} { + switch v := v.(*VLLM); i { + case 0: + return &v.state + case 1: + return &v.sizeCache + case 2: + return &v.unknownFields + default: + return nil + } + } + file_benchmark_proto_msgTypes[7].Exporter = func(v interface{}, i int) interface{} { + switch v := v.(*BenchmarkTool); i { + case 0: + return &v.state + case 1: + return &v.sizeCache + case 2: + return &v.unknownFields + default: + return nil + } + } + file_benchmark_proto_msgTypes[8].Exporter = func(v interface{}, i int) interface{} { + switch v := v.(*LPG); i { + case 0: + return &v.state + case 1: + return &v.sizeCache + case 2: + return &v.unknownFields + default: + return nil + } + } + file_benchmark_proto_msgTypes[9].Exporter = func(v interface{}, i int) interface{} { + switch v := v.(*LoadBalancer); i { + case 0: + return &v.state + case 1: + return &v.sizeCache + case 2: + return &v.unknownFields + default: + return nil + } + } + file_benchmark_proto_msgTypes[10].Exporter = func(v interface{}, i int) interface{} { + switch v := v.(*Gateway); i { + case 0: + return &v.state + case 1: + return &v.sizeCache + case 2: + return &v.unknownFields + default: + return nil + } + } + file_benchmark_proto_msgTypes[11].Exporter = func(v interface{}, i int) interface{} { + switch v := v.(*Envoy); i { + case 0: + return &v.state + case 1: + return &v.sizeCache + case 2: + return &v.unknownFields + default: + return nil + } + } + file_benchmark_proto_msgTypes[12].Exporter = func(v interface{}, i int) interface{} { + switch v := v.(*GKEGateway); i { + case 0: + return &v.state + case 1: + return &v.sizeCache + case 2: + return &v.unknownFields + default: + return nil + } + } + file_benchmark_proto_msgTypes[13].Exporter = func(v interface{}, i int) interface{} { + switch v := v.(*EPP); i { + case 0: + return &v.state + case 1: + return &v.sizeCache + case 2: + return &v.unknownFields + default: + return nil + } + 
} + file_benchmark_proto_msgTypes[14].Exporter = func(v interface{}, i int) interface{} { + switch v := v.(*K8SService); i { + case 0: + return &v.state + case 1: + return &v.sizeCache + case 2: + return &v.unknownFields + default: + return nil + } + } + file_benchmark_proto_msgTypes[15].Exporter = func(v interface{}, i int) interface{} { + switch v := v.(*BenchmarkResult); i { + case 0: + return &v.state + case 1: + return &v.sizeCache + case 2: + return &v.unknownFields + default: + return nil + } + } + file_benchmark_proto_msgTypes[16].Exporter = func(v interface{}, i int) interface{} { + switch v := v.(*Stat); i { + case 0: + return &v.state + case 1: + return &v.sizeCache + case 2: + return &v.unknownFields + default: + return nil + } + } + file_benchmark_proto_msgTypes[17].Exporter = func(v interface{}, i int) interface{} { + switch v := v.(*Metric); i { + case 0: + return &v.state + case 1: + return &v.sizeCache + case 2: + return &v.unknownFields + default: + return nil + } + } + } + file_benchmark_proto_msgTypes[5].OneofWrappers = []interface{}{ + (*ModelServer_Vllm)(nil), + } + file_benchmark_proto_msgTypes[7].OneofWrappers = []interface{}{ + (*BenchmarkTool_Lpg)(nil), + } + file_benchmark_proto_msgTypes[9].OneofWrappers = []interface{}{ + (*LoadBalancer_K8SService)(nil), + (*LoadBalancer_Gateway)(nil), + } + file_benchmark_proto_msgTypes[10].OneofWrappers = []interface{}{ + (*Gateway_Envoy)(nil), + (*Gateway_GkeGateway)(nil), + } + file_benchmark_proto_msgTypes[11].OneofWrappers = []interface{}{ + (*Envoy_Epp)(nil), + (*Envoy_LbPolicy)(nil), + } + type x struct{} + out := protoimpl.TypeBuilder{ + File: protoimpl.DescBuilder{ + GoPackagePath: reflect.TypeOf(x{}).PkgPath(), + RawDescriptor: file_benchmark_proto_rawDesc, + NumEnums: 0, + NumMessages: 18, + NumExtensions: 0, + NumServices: 0, + }, + GoTypes: file_benchmark_proto_goTypes, + DependencyIndexes: file_benchmark_proto_depIdxs, + MessageInfos: file_benchmark_proto_msgTypes, + }.Build() + 
File_benchmark_proto = out.File + file_benchmark_proto_rawDesc = nil + file_benchmark_proto_goTypes = nil + file_benchmark_proto_depIdxs = nil +} diff --git a/tools/benchmark/proto/benchmark.proto b/tools/benchmark/proto/benchmark.proto new file mode 100644 index 000000000..673d80e19 --- /dev/null +++ b/tools/benchmark/proto/benchmark.proto @@ -0,0 +1,204 @@ +// Generate the proto: +// protoc --go_out=. --go_opt=paths=source_relative benchmark.proto + +syntax = "proto3"; +package benchmark.proto; + +import "google/protobuf/timestamp.proto"; + +option go_package = "benchmark/proto"; + +// A wrapper to hold the global benchmark configuration. This is used to generate the Helm chart values.yaml +message Helm { + Benchmark global = 1; +} + +message Benchmarks { + repeated Benchmark benchmarks = 1; +} + +// Benchmark captures the information of a benchmark run, and will be persisted to the DB for data +// analysis. +message Benchmark { + // Optional. + BenchmarkCase benchmark_case = 1; + // Required. User facing configuration to configure the benchmark manifests. + BenchmarkConfig config = 2; + // Optional. Result is automatically collected by the benchmark automation framework. + BenchmarkResult result = 3; + // Optional. Autogenerated by the tool. + google.protobuf.Timestamp start_time = 4; + // Optional. Name is used for matching the base_benchmark_name in the same Benchmarks config file. + string name = 5; + // Optional. The name of the parent benchmark configuration to base on. + string base_benchmark_name = 6; +} + +message BenchmarkCase { + // Required. + string name = 1; + // Optional. + string description = 2; +} + +// BenchmarkConfig is the main user facing configuration for the benchmark run. It is used to +// generate benchmark manifests. +message BenchmarkConfig { + // Required. Configuration about the load balancer. + LoadBalancer load_balancer = 1; + // Required. Configuration about the model server deployment. 
+ ModelServer model_server = 2; + // Required. Configuration about the benchmark tooling. + BenchmarkTool benchmark_tool = 3; + // Optional. + string namespace = 4; +} + +message ModelServer { + // Optional. Default: "vllm/vllm-openai:latest" + string image = 1; + // Optional. Type of the accelerator, e.g., nvidia-tesla-a100, nvidia-l4, etc. + string accelerator = 2; + // Optional. Default: 3 + int32 replicas = 3; + oneof type { + VLLM vllm = 4; + } +} + +message VLLM { + // Optional. Default: "1" + string tensor_parallelism = 1; + // Optional. Default: "3" + string max_loras = 2; + // Optional. Default: "meta-llama/Llama-2-7b-hf" + string model = 3; + // Optional. Default: 16 + string lora_rank = 4; + // Optional. Default: "0". + // If set to "1", the V1 model is used. + string v1 = 5; +} + +message BenchmarkTool { + string image = 1; + oneof Type { + LPG lpg = 2; + } +} + +message LPG { + // Optional. Which input dataset to use. + // Default: "sharegpt_v3_unfiltered_cleaned_split" + string dataset = 2; + // Optional. Which models to use. + // Default: "meta-llama/Llama-2-7b-hf" + string models = 3; + // Optional. Default: "model-server-service.benchmark-catalog.svc.cluster.local" + string ip = 4; + // Optional. Default: "8081" + string port = 5; + // Required. + string request_rates = 6; + // Optional. Default: "60" + string benchmark_time_seconds = 7; + // Optional. Default: "1024" + string output_length = 8; + string tokenizer = 9; + string warmup_seconds = 10; +} + +message LoadBalancer { + oneof type { + K8sService k8s_service = 1; + Gateway gateway = 2; + } + + // Boolean flags that tell which type is enabled. These are needed to work around a Helm bug: + // a non-existing field is considered enabled, so you must explicitly set whether each field + // is enabled or not. See https://github.com/helm/helm/issues/10296.
+ bool k8s_service_enabled = 4; + bool gateway_enabled = 5; + bool gateway_envoy_enabled = 6; + bool gateway_gke_gateway_enabled = 7; + bool gateway_envoy_epp_enabled = 8; + bool gateway_envoy_lb_policy_enabled = 9; +} + +// By default the gateway name is `model-server-gateway` and the port is `8081`. +message Gateway { + oneof type { + Envoy envoy = 1; + GKEGateway gke_gateway = 2; + } + bool full_duplex_streaming_enabled = 3; +} + +message Envoy { + oneof type { + EPP epp = 1; + // Load balancing policies supported by Envoy: https://gateway.envoyproxy.io/docs/tasks/traffic/load-balancing/ + string lb_policy = 2; + } +} + +message GKEGateway { + // Required. + EPP epp = 1; +} + +message EPP { + // Optional. Default: "us-central1-docker.pkg.dev/k8s-staging-images/llm-instance-gateway/epp:main" + string image = 1; + // Optional. Default: "50ms" + string refresh_metrics_interval = 2; +} + +// By default the service is in the same namespace as the model server and lpg. +// The service name is `model-server-service` and the port is `8081`. +message K8sService { +} + +message BenchmarkResult { + // Optional. + repeated Stat stats = 1; +} + +message Stat { + // Optional. + float request_rate = 1; + // Optional. + Metric request_latency = 2; + // Optional. + Metric throughput = 3; + // Optional. + Metric input_length = 4; + // Optional. + Metric output_length = 5; + // Optional. + Metric ttft = 6; + // Optional. + Metric tpot = 7; + // Optional. + repeated Metric model_server_metrics = 8; +} + +message Metric { + // Optional. + string name = 1; + // Optional. + float mean = 2; + // Optional. + float median = 3; + // Optional. + float sd = 4; + // Optional. + float min = 5; + // Optional. + float max = 6; + // Optional. + float p90 = 7; + // Optional.
+  float p99 = 8;
+}
\ No newline at end of file
diff --git a/tools/benchmark/download-benchmark-results.bash b/tools/benchmark/scripts/download-benchmark-results.bash
similarity index 100%
rename from tools/benchmark/download-benchmark-results.bash
rename to tools/benchmark/scripts/download-benchmark-results.bash
diff --git a/tools/benchmark/scripts/env.sh b/tools/benchmark/scripts/env.sh
new file mode 100755
index 000000000..5149f2a65
--- /dev/null
+++ b/tools/benchmark/scripts/env.sh
@@ -0,0 +1,3 @@
+BENCHMARK_PROJECT="provide this if you run on GCP"
+CLUSTER_NAME="your cluster name"
+LOCATION="location of your cluster"
\ No newline at end of file
diff --git a/tools/benchmark/scripts/generate_manifests.bash b/tools/benchmark/scripts/generate_manifests.bash
new file mode 100755
index 000000000..0155dca7a
--- /dev/null
+++ b/tools/benchmark/scripts/generate_manifests.bash
@@ -0,0 +1,34 @@
+#!/bin/bash
+
+main() {
+  SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )"
+  source ${SCRIPT_DIR}/env.sh
+
+  if [[ -z ${run_id} ]]; then
+    run_id=$(uuidgen | tr '[:upper:]' '[:lower:]' | cut -d'-' -f1) # Get the first part (8 hex characters)
+  else
+    echo "Generating manifests using run id ${run_id}"
+  fi
+
+  # Generate benchmark manifests.
+  # First we generate the ModelServer and LoadBalancer manifests. We generate the BenchmarkTool
+  # manifest only after we deploy the ModelServer and LoadBalancer, because we need runtime information.
+  echo "Generating ModelServer and LoadBalancer manifests for benchmarks ${benchmarks}"
+  go run ${SCRIPT_DIR}/../manifestgenerator/main.go \
+    --catalogDir="${SCRIPT_DIR}/../catalog/" \
+    --outputRootDir="${SCRIPT_DIR}/../${output_dir}" \
+    --benchmarks="${benchmarks}.pbtxt" \
+    --manifestTypes="ModelServer,LoadBalancer" \
+    --runID="${run_id}" \
+    --override=${override} \
+    --v=1
+}
+
+# Env vars to be passed when calling this script.
+# Example usage: benchmarks="example" ./generate_manifests.bash
+# benchmarks is the file name (without the .pbtxt extension) of a benchmark file under catalog/benchmark.
+benchmarks=${benchmarks:-"example"}
+run_id=${run_id:-""}
+override=${override:-"false"}
+output_dir=${output_dir:-'output'}
+main
\ No newline at end of file
diff --git a/tools/benchmark/scripts/run_all_benchmarks.bash b/tools/benchmark/scripts/run_all_benchmarks.bash
new file mode 100755
index 000000000..929162d57
--- /dev/null
+++ b/tools/benchmark/scripts/run_all_benchmarks.bash
@@ -0,0 +1,13 @@
+#!/bin/bash
+
+SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )"
+source ${SCRIPT_DIR}/env.sh
+regex=${regex:-".*\.pbtxt"}
+
+while read -r file; do
+  filename=$(basename "$file")
+  # Extract the filename without the extension using parameter expansion.
+  filename_without_ext="${filename%.*}"
+  echo "Running benchmark for $filename_without_ext"
+  benchmarks=${filename_without_ext} ${SCRIPT_DIR}/run_benchmarks_file.bash
+done < <(find "${SCRIPT_DIR}/../catalog/" -regex "$regex" -print | sort )
\ No newline at end of file
diff --git a/tools/benchmark/scripts/run_benchmarks_file.bash b/tools/benchmark/scripts/run_benchmarks_file.bash
new file mode 100755
index 000000000..450639d18
--- /dev/null
+++ b/tools/benchmark/scripts/run_benchmarks_file.bash
@@ -0,0 +1,39 @@
+#!/bin/bash
+
+# Env vars to be passed when calling this script.
+# Example usage: benchmarks="example" benchmark_name_regex="c1.*" gcs_bucket="benchmark-inference-gateway" ./run_benchmarks_file.bash
+benchmarks=${benchmarks:-"example"}
+dry_run=${dry_run:-"false"}
+gcs_bucket=${gcs_bucket:-""}
+# Set skip_tear_down to true to preserve the environment after the benchmark.
+skip_tear_down=${skip_tear_down:-"false"}
+benchmark_name_regex=${benchmark_name_regex:-".*"}
+output_dir=${output_dir:-'output'}
+
+SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )"
+source ${SCRIPT_DIR}/env.sh
+
+if [[ -z ${run_id} ]]; then
+  run_id=$(uuidgen | tr '[:upper:]' '[:lower:]' | cut -d'-' -f1) # Get the first part (8 hex characters)
+else
+  echo "Using existing run id ${run_id}"
+fi
+
+# Generate benchmark manifests.
+# First we generate the ModelServer and LoadBalancer manifests. We generate the BenchmarkTool
+# manifest only after we deploy the ModelServer and LoadBalancer, because we need runtime information.
+echo "Generating ModelServer and LoadBalancer manifests for benchmarks ${benchmarks}"
+run_id=${run_id} benchmarks=${benchmarks} ${SCRIPT_DIR}/generate_manifests.bash
+benchmarks_output_dir=${SCRIPT_DIR}/../${output_dir}/${run_id}
+if [[ "${dry_run}" == "true" ]]; then
+  echo "Dry-run=${dry_run}. Skipping deployment. You can inspect the generated manifests at ${benchmarks_output_dir}"
+  # This script is executed, not sourced, so use exit rather than return at the top level.
+  exit 0
+fi
+
+echo "Running the generated benchmarks one by one"
+while read -r folder; do
+  benchmark_output_dir=$(basename "$folder")
+  echo "Running benchmark for $benchmark_output_dir"
+  run_id=${run_id} benchmark_id=${benchmark_output_dir} ${SCRIPT_DIR}/run_one_benchmark.bash
+done < <(find "${benchmarks_output_dir}/" -maxdepth 1 -mindepth 1 -type d -regex "$benchmark_name_regex" -print | sort )
diff --git a/tools/benchmark/scripts/run_one_benchmark.bash b/tools/benchmark/scripts/run_one_benchmark.bash
new file mode 100755
index 000000000..c5c821c7f
--- /dev/null
+++ b/tools/benchmark/scripts/run_one_benchmark.bash
@@ -0,0 +1,148 @@
+#!/bin/bash
+
+main() {
+  SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )"
+  source ${SCRIPT_DIR}/env.sh
+
+  if [[ -z ${run_id} ]]; then
+    run_id=$(uuidgen | tr '[:upper:]' '[:lower:]' | cut -d'-' -f1) # Get the first part (8 hex characters)
+  fi
+  echo "Using run id ${run_id}"
+
+  namespace=${benchmark_id}
+  benchmark_output_dir=${SCRIPT_DIR}/../${output_dir}/${run_id}/${benchmark_id}
+  benchmark_file_path="${benchmark_output_dir}/benchmark.pbtxt"
+
+  if [[ -d "${benchmark_output_dir}/results/json" ]]; then
+    echo "The JSON results directory ${benchmark_output_dir}/results/json already exists; skipping. If you want to re-run the benchmark, delete the results directory and try again."
+    echo "Attempting to tear down ${run_id}/${benchmark_id} anyway in case there were dangling resources"
+    if [[ "${skip_tear_down}" == "true" ]]; then
+      echo "Skipping tearing down benchmark"
+    else
+      run_id=${run_id} benchmark_id=${benchmark_id} ${SCRIPT_DIR}/teardown.bash
+    fi
+    return
+  fi
+
+  run_id=${run_id} benchmark_id=${benchmark_id} ${SCRIPT_DIR}/setup.bash
+
+  start_time=$(date -u +"%Y-%m-%dT%H:%M:%S.%3NZ")
+
+  echo "Deploying benchmark environment"
+  kubectl apply -f ${benchmark_output_dir}/manifests/ModelServer.yaml
+  kubectl apply -f ${benchmark_output_dir}/manifests/LoadBalancer.yaml
+  echo "Waiting for deployments to be ready before starting the benchmark tool"
+  wait_deployments_ready
+
+  echo "Generating the BenchmarkTool manifest after the LoadBalancer is deployed, because we need to derive the IP of the load balancer to send traffic to"
+  go run ${SCRIPT_DIR}/../manifestgenerator/main.go \
+    --catalogDir="${SCRIPT_DIR}/../catalog/" \
+    --outputRootDir="${SCRIPT_DIR}/../${output_dir}" \
+    --benchmarkFilePath="${benchmark_file_path}" \
+    --manifestTypes="BenchmarkTool" \
+    --runID="${run_id}" \
+    --v=1
+  echo "Deploying benchmark tool"
+  kubectl apply -f ${benchmark_output_dir}/manifests/BenchmarkTool.yaml
+  wait_deployments_ready
+
+  echo "Collecting benchmark results"
+  download_benchmark_results
+
+  if [[ -z ${gcs_bucket} ]]; then
+    echo "Skipping upload to GCS as no GCS bucket is provided"
+  else
+    echo "Uploading output ${benchmark_output_dir} to GCS bucket"
+    gcloud storage rsync -r ${benchmark_output_dir} gs://${gcs_bucket}/${output_dir}/staging/${benchmark_id}
+  fi
+
+  end_time=$(date -u +"%Y-%m-%dT%H:%M:%S.%3NZ")
+  dashboard="https://pantheon.corp.google.com/monitoring/dashboards/builder/32b5c92a-a12d-40ba-8bd6-b8c0a62a8d8e;startTime=${start_time};endTime=${end_time};filters=type:rlabel,key:namespace,val:${namespace}?&mods=logs_tg_prod&project=${BENCHMARK_PROJECT}"
+  metadata="start_time: $start_time\nend_time: $end_time\ncloud_monitoring_dashboard: ${dashboard}"
+  # Use -e so the \n escapes in ${metadata} are expanded.
+  echo -e "metadata:\n${metadata}"
+  echo -e "$metadata" > ${benchmark_output_dir}/metadata.txt
+
+  # Tear down benchmark
+  if [[ "${skip_tear_down}" == "true" ]]; then
+    echo "Skipping tearing down benchmark"
+  else
+    run_id=${run_id} benchmark_id=${benchmark_id} ${SCRIPT_DIR}/teardown.bash
+  fi
+}
+
+wait_deployments_ready() {
+  kubectl wait --for=condition=available --timeout=6000s $(kubectl get deployments -o name -n ${namespace}) -n ${namespace}
+}
+
+# Downloads the benchmark result files from the benchmark tool pod.
+download_benchmark_results() {
+  local benchmark_pod
+  local pod_finished=false
+
+  while true; do
+    # Check whether the benchmark tool has finished.
+    if kubectl logs deployment/benchmark-tool -n "${namespace}" | grep -q "LPG_FINISHED"; then
+      pod_finished=true
+      echo "Benchmark tool pod has finished."
+    fi
+
+    # Get the benchmark pod name.
+    benchmark_pod=$(kubectl get pods -l app=benchmark-tool -n "${namespace}" -o jsonpath="{.items[0].metadata.name}")
+    if [[ -z "${benchmark_pod}" ]]; then
+      echo "Benchmark pod not found yet. Retrying in 30 seconds..."
+      sleep 30
+      continue
+    fi
+
+    echo "Checking for new results from pod ${benchmark_pod}"
+
+    # Download JSON files.
+    local json_files=$(kubectl exec "${benchmark_pod}" -n "${namespace}" -- /bin/sh -c "ls -l | grep benchmark-catalog.*json | awk '{print \$9}'")
+    for f in $json_files; do
+      local local_json_path="${benchmark_output_dir}/results/json/${f}"
+      if [[ ! -f "${local_json_path}" ]]; then
+        echo "Downloading json file ${f}"
+        mkdir -p "$(dirname "${local_json_path}")"
+        kubectl cp -n "${namespace}" "${benchmark_pod}:${f}" "${local_json_path}"
+      else
+        echo "json file ${f} already exists locally, skipping download."
+      fi
+    done
+
+    # Download TXT files.
+    local txt_files=$(kubectl exec "${benchmark_pod}" -n "${namespace}" -- /bin/sh -c "ls -l | grep txt | awk '{print \$9}'")
+    for f in $txt_files; do
+      local local_txt_path="${benchmark_output_dir}/results/txt/${f}"
+      if [[ ! -f "${local_txt_path}" ]]; then
+        echo "Downloading txt file ${f}"
+        mkdir -p "$(dirname "${local_txt_path}")"
+        kubectl cp -n "${namespace}" "${benchmark_pod}:${f}" "${local_txt_path}"
+      else
+        echo "txt file ${f} already exists locally, skipping download."
+      fi
+    done
+
+    if [[ "${pod_finished}" == "true" ]]; then
+      # Download logs.
+      local local_log_path="${benchmark_output_dir}/results/benchmark-tool.log"
+      echo "Downloading logs from pod ${benchmark_pod}"
+      mkdir -p "$(dirname "${local_log_path}")"
+      kubectl logs deployment/benchmark-tool -n "${namespace}" > "${local_log_path}"
+      echo "All files downloaded and pod finished. Exiting."
+      break
+    else
+      echo "Waiting for new files or pod to finish. Retrying in 30 seconds..."
+      sleep 30
+    fi
+  done
+}
+
+# Env vars to be passed when calling this script.
+# Example usage: benchmark_id="example-benchmark-config" skip_tear_down="true" gcs_bucket="benchmark-inference-gateway" ./run_one_benchmark.bash
+gcs_bucket=${gcs_bucket:-""}
+# The id of the benchmark under the output/${run_id} folder. Make sure the manifests are already generated before calling this script.
+benchmark_id=${benchmark_id:-"please-provide-benchmark-id"}
+# Set skip_tear_down to true to preserve the environment after the benchmark.
+skip_tear_down=${skip_tear_down:-"false"}
+output_dir=${output_dir:-'output'}
+main
\ No newline at end of file
diff --git a/tools/benchmark/scripts/setup.bash b/tools/benchmark/scripts/setup.bash
new file mode 100755
index 000000000..67c7de05a
--- /dev/null
+++ b/tools/benchmark/scripts/setup.bash
@@ -0,0 +1,62 @@
+#!/bin/bash
+
+main() {
+  SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )"
+  source ${SCRIPT_DIR}/env.sh
+
+  if [[ -z ${run_id} ]]; then
+    run_id=$(uuidgen | tr '[:upper:]' '[:lower:]' | cut -d'-' -f1) # Get the first part (8 hex characters)
+  fi
+  echo "Using run id ${run_id}"
+
+  namespace=${benchmark_id}
+  benchmark_output_dir=${SCRIPT_DIR}/../${output_dir}/${run_id}/${benchmark_id}
+
+  if [[ -d "${benchmark_output_dir}/results/json" ]]; then
+    echo "The JSON results directory ${benchmark_output_dir}/results/json already exists; skipping. If you want to re-run the benchmark, delete the results directory and try again."
+    return
+  fi
+
+  if [[ ${provider} == "gke" ]]; then
+    echo "Configuring GKE cluster"
+    configure_gke
+  fi
+
+  # Create the namespace if it doesn't exist.
+  kubectl create namespace "${namespace}" --dry-run=client -o yaml | kubectl apply -f -
+  # Copy the existing HF secret to the new namespace. This assumes you created an HF token secret
+  # in the default namespace following the guide
+  # https://gateway-api-inference-extension.sigs.k8s.io/guides/#deploy-sample-model-server
+  kubectl get secret hf-token --namespace=default -oyaml | grep -v '^\s*namespace:\s' | kubectl apply --namespace=${namespace} -f -
+}
+
+configure_gke() {
+  gcloud config configurations create benchmark-catalog
+  gcloud config configurations activate benchmark-catalog
+  gcloud config set project ${BENCHMARK_PROJECT}
+  gcloud config set billing/quota_project ${BENCHMARK_PROJECT}
+  gcloud config set container/cluster ${CLUSTER_NAME}
+  gcloud config set compute/zone ${LOCATION}
+  # Configure kubectl.
+  gcloud container clusters get-credentials ${CLUSTER_NAME} --region ${LOCATION} --project ${BENCHMARK_PROJECT}
+
+  echo "Binding KSA to GSA for metrics scraping"
+  gcloud iam service-accounts add-iam-policy-binding \
+    --role roles/iam.workloadIdentityUser \
+    --member "serviceAccount:${BENCHMARK_PROJECT}.svc.id.goog[${namespace}/default]" \
+    gmp-test-sa@${BENCHMARK_PROJECT}.iam.gserviceaccount.com --project ${BENCHMARK_PROJECT}
+
+  kubectl annotate serviceaccount \
+    --namespace ${namespace} \
+    default \
+    iam.gke.io/gcp-service-account=gmp-test-sa@${BENCHMARK_PROJECT}.iam.gserviceaccount.com
+}
+
+# Cloud provider where the cluster runs. Optional.
+# If provided, the tool can automate additional features applicable to this provider. For example,
+# on GKE, it can configure permissions to query Cloud Monitoring to get model server metrics.
+provider=${provider:-""}
+benchmark_id=${benchmark_id:-"please-provide-benchmark-id"}
+output_dir=${output_dir:-'output'}
+main
\ No newline at end of file
diff --git a/tools/benchmark/scripts/teardown.bash b/tools/benchmark/scripts/teardown.bash
new file mode 100755
index 000000000..0b9c397ca
--- /dev/null
+++ b/tools/benchmark/scripts/teardown.bash
@@ -0,0 +1,35 @@
+#!/bin/bash
+
+main() {
+  SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )"
+  source ${SCRIPT_DIR}/env.sh
+
+  namespace=${benchmark_id}
+  benchmark_output_dir=${SCRIPT_DIR}/../${output_dir}/${run_id}/${benchmark_id}
+
+  echo "Tearing down benchmark ${run_id}/${benchmark_id}"
+
+  if [[ ${env} == "gke" ]]; then
+    echo "Removing GKE IAM bindings"
+    tear_down_gke
+  fi
+
+  kubectl delete -f ${benchmark_output_dir}/manifests/BenchmarkTool.yaml --grace-period=0 --force
+  kubectl delete -f ${benchmark_output_dir}/manifests/ModelServer.yaml --grace-period=0 --force
+  kubectl delete -f ${benchmark_output_dir}/manifests/LoadBalancer.yaml --grace-period=0 --force
+  kubectl delete namespace ${namespace} --grace-period=0 --force
+}
+
+# Removes the IAM policy binding created by setup.bash; it does not delete the cluster itself.
+tear_down_gke() {
+  gcloud iam service-accounts remove-iam-policy-binding \
+    --role roles/iam.workloadIdentityUser \
+    --member "serviceAccount:${BENCHMARK_PROJECT}.svc.id.goog[${namespace}/default]" \
+    gmp-test-sa@${BENCHMARK_PROJECT}.iam.gserviceaccount.com --project ${BENCHMARK_PROJECT}
+}
+
+env=${env:-"gke"}
+# The id of the benchmark under the output/${run_id} folder. Make sure the manifests are already generated before calling this script.
+benchmark_id=${benchmark_id:-"please-provide-benchmark-id"}
+output_dir=${output_dir:-'output'}
+main
\ No newline at end of file
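For reference, the naming conventions these scripts rely on can be exercised locally without a cluster. The sketch below derives a `run_id` (first 8-hex-character UUID segment, lowercased) and a `benchmarks` name (catalog pbtxt filename with the extension stripped); the fixed UUID string is a hypothetical stand-in for `uuidgen` output:

```shell
#!/bin/sh
# run_id: the scripts take the first segment of a UUID, lowercased.
uuid="1B9D6BCD-BBFD-4B2D-9B5D-AB8DFBBD4BED"   # stand-in for $(uuidgen)
run_id=$(echo "$uuid" | tr '[:upper:]' '[:lower:]' | cut -d'-' -f1)
echo "$run_id"        # 1b9d6bcd

# benchmarks: the catalog pbtxt filename with its extension stripped,
# as done in run_all_benchmarks.bash via parameter expansion.
file="catalog/benchmark/example.pbtxt"
filename=$(basename "$file")
benchmarks="${filename%.*}"
echo "$benchmarks"    # example
```

generate_manifests.bash then appends `.pbtxt` back when resolving the catalog file, which is why `benchmarks` is passed without the extension.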
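setup.bash copies the `hf-token` secret across namespaces by exporting it as YAML, stripping the `namespace:` line, and re-applying it. The filtering step can be sketched offline as follows; the inline manifest is a hypothetical stand-in for `kubectl get secret -oyaml` output, and `[[:space:]]` is used as the POSIX equivalent of GNU grep's `\s`:

```shell
#!/bin/sh
# Stand-in for `kubectl get secret hf-token --namespace=default -oyaml`.
# Dropping the namespace line lets the manifest be re-applied elsewhere.
printf 'metadata:\n  name: hf-token\n  namespace: default\n' \
  | grep -v '^[[:space:]]*namespace:'
# metadata:
#   name: hf-token
```

In setup.bash the filtered YAML is piped into `kubectl apply --namespace=${namespace} -f -`, which recreates the secret in the benchmark namespace.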