Add CoderdIneligiblePrebuilds alert #34

Merged: 1 commit merged on Apr 18, 2025
2 changes: 1 addition & 1 deletion README.md
@@ -228,7 +228,7 @@ values which are defined [here](https://github.com/grafana/helm-charts/tree/main

| Key | Type | Default | Description |
|-----|------|---------|-------------|
-| global.coder.alerts | object | `{"coderd":{"groups":{"CPU":{"delay":"10m","enabled":true,"period":"10m","thresholds":{"critical":0.9,"warning":0.8}},"Memory":{"delay":"10m","enabled":true,"thresholds":{"critical":0.9,"warning":0.8}},"Replicas":{"delay":"5m","enabled":true,"thresholds":{"critical":1,"notify":3,"warning":2}},"Restarts":{"delay":"1m","enabled":true,"period":"10m","thresholds":{"critical":3,"notify":1,"warning":2}},"WorkspaceBuildFailures":{"delay":"10m","enabled":true,"period":"10m","thresholds":{"critical":10,"notify":2,"warning":5}}}},"enterprise":{"groups":{"Licences":{"delay":"1m","enabled":true,"thresholds":{"critical":1,"warning":0.9}}}},"provisionerd":{"groups":{"Replicas":{"delay":"5m","enabled":true,"thresholds":{"critical":1,"notify":3,"warning":2}}}}}` | alerts for the various aspects of Coder |
+| global.coder.alerts | object | `{"coderd":{"groups":{"CPU":{"delay":"10m","enabled":true,"period":"10m","thresholds":{"critical":0.9,"warning":0.8}},"IneligiblePrebuilds":{"delay":"10m","enabled":true,"thresholds":{"notify":1}},"Memory":{"delay":"10m","enabled":true,"thresholds":{"critical":0.9,"warning":0.8}},"Replicas":{"delay":"5m","enabled":true,"thresholds":{"critical":1,"notify":3,"warning":2}},"Restarts":{"delay":"1m","enabled":true,"period":"10m","thresholds":{"critical":3,"notify":1,"warning":2}},"WorkspaceBuildFailures":{"delay":"10m","enabled":true,"period":"10m","thresholds":{"critical":10,"notify":2,"warning":5}}}},"enterprise":{"groups":{"Licences":{"delay":"1m","enabled":true,"thresholds":{"critical":1,"warning":0.9}}}},"provisionerd":{"groups":{"Replicas":{"delay":"5m","enabled":true,"thresholds":{"critical":1,"notify":3,"warning":2}}}}}` | alerts for the various aspects of Coder |
| global.coder.coderdSelector | string | `"pod=~`coder.*`, pod!~`.*provisioner.*`"` | series selector for Prometheus/Loki to locate coderd pods. ensure this uses backticks for quotes! |
| global.coder.controlPlaneNamespace | string | `"coder"` | the namespace into which the control plane has been deployed. |
| global.coder.externalProvisionersNamespace | string | `"coder"` | the namespace into which any external provisioners have been deployed. |
7 changes: 7 additions & 0 deletions coder-observability/runbooks/coderd.md
@@ -76,3 +76,10 @@ Terraform plugin.
Your Enterprise license is approaching or has exceeded the number of seats purchased.

Please contact your Coder sales contact, or visit https://coder.com/contact/sales.

## CoderdIneligiblePrebuilds

Prebuilds only become eligible to be claimed by users once the workspace's agent a) is running and b) has completed all of
its startup scripts.

If a prebuilt workspace is not eligible, view its agent logs to diagnose the problem.
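The eligibility rule in this runbook reduces to a simple predicate. The Go sketch below assumes a hypothetical `Prebuild` record (not a Coder API type) describing prebuilt workspaces that are already running, and counts the ineligible ones per template and preset, which is the same grouping the `CoderdIneligiblePrebuilds` alert uses.

```go
package main

import "fmt"

// Prebuild is a hypothetical, simplified record of one running prebuilt workspace.
// The field names are illustrative and are not Coder API fields.
type Prebuild struct {
	Template           string
	Preset             string
	AgentRunning       bool // condition a) from the runbook
	StartupScriptsDone bool // condition b) from the runbook
}

// Eligible applies the runbook rule: both conditions must hold.
func (p Prebuild) Eligible() bool {
	return p.AgentRunning && p.StartupScriptsDone
}

// PresetKey groups prebuilds by template and preset, like the alert's labels.
type PresetKey struct{ Template, Preset string }

// IneligibleByPreset counts ineligible prebuilds per template/preset pair, which equals
// "running minus eligible" when every input prebuild is running.
func IneligibleByPreset(prebuilds []Prebuild) map[PresetKey]int {
	counts := make(map[PresetKey]int)
	for _, p := range prebuilds {
		if !p.Eligible() {
			counts[PresetKey{p.Template, p.Preset}]++
		}
	}
	return counts
}

func main() {
	prebuilds := []Prebuild{
		{Template: "docker", Preset: "small", AgentRunning: true, StartupScriptsDone: true},
		{Template: "docker", Preset: "small", AgentRunning: true}, // startup scripts still running
		{Template: "docker", Preset: "large"},                    // agent never came up
	}
	for k, n := range IneligibleByPreset(prebuilds) {
		fmt.Printf("%s/%s: %d ineligible\n", k.Template, k.Preset, n)
	}
}
```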
23 changes: 22 additions & 1 deletion coder-observability/templates/configmap-prometheus-alerts.yaml
@@ -4,7 +4,7 @@ metadata:
  name: metrics-alerts
  namespace: {{ .Release.Namespace }}
data:
-  {{- $service := dict "service" "coder" -}}
+  {{- $service := dict "service" "coderd" -}}

  {{- with .Values.global.coder.alerts.coderd }} {{/* start-section */}}
  coderd.yaml: |-
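The `$service` dict changed in this hunk is later folded into the context passed to the `runbook-url` template (see the `runbook_url` label in the `CoderdIneligiblePrebuilds` rule). The sketch below approximates that `deepCopy $ | merge (dict "alert" $alert) $service` pipeline in Go, assuming Sprig's documented behavior that the destination map takes precedence; `mergeKeepDest` is a hypothetical, shallow stand-in for Sprig's deep `merge`. How `runbook-url` turns `service` and `alert` into a URL is not part of this diff.

```go
package main

import "fmt"

// mergeKeepDest is a shallow approximation of Sprig's `merge`: keys already present in
// dst are kept, and each source (in order) only fills in keys that are still missing.
// Sprig's real merge also recurses into nested maps; that is omitted here.
func mergeKeepDest(dst map[string]any, srcs ...map[string]any) map[string]any {
	for _, src := range srcs {
		for k, v := range src {
			if _, exists := dst[k]; !exists {
				dst[k] = v
			}
		}
	}
	return dst
}

func main() {
	// In a Go template pipeline the left-hand value is passed as the LAST argument, so
	//   deepCopy $ | merge (dict "alert" $alert) $service
	// is equivalent to merge (dict "alert" $alert) $service (deepCopy $).
	root := map[string]any{"Values": map[string]any{}} // hypothetical stand-in for $
	service := map[string]any{"service": "coderd"}
	ctx := mergeKeepDest(map[string]any{"alert": "CoderdIneligiblePrebuilds"}, service, root)
	fmt.Println(ctx["alert"], ctx["service"]) // CoderdIneligiblePrebuilds coderd
}
```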
@@ -104,6 +104,27 @@ data:
    {{- end }}
    {{- end }}

    {{- with .groups.IneligiblePrebuilds }}
    {{- $group := . }}
    {{- if .enabled }}
    - name: Coderd Ineligible Prebuilds
      rules:
      {{ $alert := "CoderdIneligiblePrebuilds" }}
      {{- range $severity, $threshold := .thresholds }}
      - alert: {{ $alert }}
        expr: max by (template_name, preset_name) (coderd_prebuilds_running - coderd_prebuilds_eligible) > 0
        for: {{ $group.delay }}
        annotations:
          summary: >
            {{ `{{ $value }}` }} prebuilt workspace(s) are currently ineligible for claiming for the "{{ `{{ $labels.template_name }}` }}" template and "{{ `{{ $labels.preset_name }}` }}" preset.
            This usually indicates that the agent has not started correctly, or is still running its startup scripts after an extended period of time.
        labels:
          severity: {{ $severity }}
          runbook_url: {{ template "runbook-url" (deepCopy $ | merge (dict "alert" $alert) $service) }}
      {{- end }}
    {{- end }}
    {{- end }}

  {{- end }} {{/* end-section */}}
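To make the alert expression concrete: one evaluation subtracts `coderd_prebuilds_eligible` from `coderd_prebuilds_running` for each matching label set, keeps the maximum difference per template/preset pair, and then drops pairs whose difference is not above zero. The Go model below is a sketch under simplifying assumptions (one-to-one matching on identical label sets, no `for:` delay, no staleness handling); `Sample`, `signature` and `IneligibleAlerts` are hypothetical names, not Prometheus APIs. If the real series carry labels beyond `template_name` and `preset_name`, `max by` keeps only the largest difference per pair.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Sample is a simplified instant-vector element: a label set (metric name already
// dropped, as PromQL does for arithmetic results) and a value.
type Sample struct {
	Labels map[string]string
	Value  float64
}

// signature renders a label set canonically so identical label sets match one-to-one.
func signature(labels map[string]string) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	var b strings.Builder
	for _, k := range keys {
		fmt.Fprintf(&b, "%s=%q,", k, labels[k])
	}
	return b.String()
}

// IneligibleAlerts models one evaluation of
//   max by (template_name, preset_name) (coderd_prebuilds_running - coderd_prebuilds_eligible) > 0
func IneligibleAlerts(running, eligible []Sample) map[[2]string]float64 {
	eligibleBySig := make(map[string]float64, len(eligible))
	for _, s := range eligible {
		eligibleBySig[signature(s.Labels)] = s.Value
	}
	maxByPreset := make(map[[2]string]float64)
	for _, r := range running {
		e, ok := eligibleBySig[signature(r.Labels)]
		if !ok {
			continue // unmatched series are dropped by vector subtraction
		}
		key := [2]string{r.Labels["template_name"], r.Labels["preset_name"]}
		if cur, seen := maxByPreset[key]; !seen || r.Value-e > cur {
			maxByPreset[key] = r.Value - e
		}
	}
	for key, v := range maxByPreset {
		if v <= 0 {
			delete(maxByPreset, key) // the `> 0` comparison filters these groups out
		}
	}
	return maxByPreset
}

func main() {
	lbl := func(t, p string) map[string]string {
		return map[string]string{"template_name": t, "preset_name": p}
	}
	running := []Sample{{lbl("docker", "small"), 3}, {lbl("docker", "large"), 1}}
	eligible := []Sample{{lbl("docker", "small"), 1}, {lbl("docker", "large"), 1}}
	for k, v := range IneligibleAlerts(running, eligible) {
		fmt.Printf("%s/%s: %v ineligible\n", k[0], k[1], v) // docker/small: 2 ineligible
	}
}
```

In Prometheus, a series returned by the expression only becomes a firing alert after it has been present continuously for the rule's `for:` duration (`delay`, 10m by default here); until then the alert is pending.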


5 changes: 5 additions & 0 deletions coder-observability/values.yaml
@@ -76,6 +76,11 @@ global:
              notify: 2
              warning: 5
              critical: 10
          IneligiblePrebuilds:
            enabled: true
            delay: 10m
            thresholds:
              notify: 1
      provisionerd:
        groups:
          Replicas: