Commit d998446

Merge pull request #34 from coder/dk/prebuilds-alerts
Add `CoderdIneligiblePrebuilds` alert
2 parents 22bcf04 + 413aabe commit d998446

5 files changed, +47 -7 lines changed

README.md

+1 -1
@@ -244,7 +244,7 @@ values which are defined [here](https://github.com/grafana/helm-charts/tree/main
 
 | Key | Type | Default | Description |
 |-----|------|---------|-------------|
-| global.coder.alerts | object | `{"coderd":{"groups":{"CPU":{"delay":"10m","enabled":true,"period":"10m","thresholds":{"critical":0.9,"warning":0.8}},"Memory":{"delay":"10m","enabled":true,"thresholds":{"critical":0.9,"warning":0.8}},"Replicas":{"delay":"5m","enabled":true,"thresholds":{"critical":1,"notify":3,"warning":2}},"Restarts":{"delay":"1m","enabled":true,"period":"10m","thresholds":{"critical":3,"notify":1,"warning":2}},"WorkspaceBuildFailures":{"delay":"10m","enabled":true,"period":"10m","thresholds":{"critical":10,"notify":2,"warning":5}}}},"enterprise":{"groups":{"Licences":{"delay":"1m","enabled":true,"thresholds":{"critical":1,"warning":0.9}}}},"provisionerd":{"groups":{"Replicas":{"delay":"5m","enabled":true,"thresholds":{"critical":1,"notify":3,"warning":2}}}}}` | alerts for the various aspects of Coder |
+| global.coder.alerts | object | `{"coderd":{"groups":{"CPU":{"delay":"10m","enabled":true,"period":"10m","thresholds":{"critical":0.9,"warning":0.8}},"IneligiblePrebuilds":{"delay":"10m","enabled":true,"thresholds":{"notify":1}},"Memory":{"delay":"10m","enabled":true,"thresholds":{"critical":0.9,"warning":0.8}},"Replicas":{"delay":"5m","enabled":true,"thresholds":{"critical":1,"notify":3,"warning":2}},"Restarts":{"delay":"1m","enabled":true,"period":"10m","thresholds":{"critical":3,"notify":1,"warning":2}},"WorkspaceBuildFailures":{"delay":"10m","enabled":true,"period":"10m","thresholds":{"critical":10,"notify":2,"warning":5}}}},"enterprise":{"groups":{"Licences":{"delay":"1m","enabled":true,"thresholds":{"critical":1,"warning":0.9}}}},"provisionerd":{"groups":{"Replicas":{"delay":"5m","enabled":true,"thresholds":{"critical":1,"notify":3,"warning":2}}}}}` | alerts for the various aspects of Coder |
 | global.coder.coderdSelector | string | `"pod=~`coder.*`, pod!~`.*provisioner.*`"` | series selector for Prometheus/Loki to locate provisioner pods. ensure this uses backticks for quotes! |
 | global.coder.controlPlaneNamespace | string | `"coder"` | the namespace into which the control plane has been deployed. |
 | global.coder.externalProvisionersNamespace | string | `"coder"` | the namespace into which any external provisioners have been deployed. |

coder-observability/runbooks/coderd.md

+7
@@ -76,3 +76,10 @@ Terraform plugin.
 Your Enterprise license is approaching or has exceeded the number of seats purchased.
 
 Please contact your Coder sales contact, or visit https://coder.com/contact/sales.
+
+## CoderdIneligiblePrebuilds
+
+Prebuilds only become eligible to be claimed by users once the workspace's agent is a) running and b) all of its startup
+scripts have completed.
+
+If a prebuilt workspace is not eligible, view its agent logs to diagnose the problem.

coder-observability/templates/configmap-prometheus-alerts.yaml

+22 -1
@@ -4,7 +4,7 @@ metadata:
   name: metrics-alerts
   namespace: {{ .Release.Namespace }}
 data:
-  {{- $service := dict "service" "coder" -}}
+  {{- $service := dict "service" "coderd" -}}
 
   {{- with .Values.global.coder.alerts.coderd }} {{/* start-section */}}
   coderd.yaml: |-
@@ -104,6 +104,27 @@ data:
     {{- end }}
     {{- end }}
 
+    {{- with .groups.IneligiblePrebuilds }}
+    {{- $group := . }}
+    {{- if .enabled }}
+    - name: Coderd Ineligible Prebuilds
+      rules:
+      {{ $alert := "CoderdIneligiblePrebuilds" }}
+      {{- range $severity, $threshold := .thresholds }}
+      - alert: {{ $alert }}
+        expr: max by (template_name, preset_name) (coderd_prebuilds_running - coderd_prebuilds_eligible) > 0
+        for: {{ $group.delay }}
+        annotations:
+          summary: >
+            {{ `{{ $value }}` }} prebuilt workspace(s) are currently ineligible for claiming for the "{{ `{{ $labels.template_name }}` }}" template and "{{ `{{ $labels.preset_name }}` }}" preset.
+            This usually indicates that the agent has not started correctly, or is still running its startup scripts after an extended period of time.
+        labels:
+          severity: {{ $severity }}
+          runbook_url: {{ template "runbook-url" (deepCopy $ | merge (dict "alert" $alert) $service) }}
+      {{- end }}
+    {{- end }}
+    {{- end }}
+
   {{- end }} {{/* end-section */}}
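For orientation, the following is a rough sketch of what the new template block would likely render to with the default values from this commit (`delay: 10m`, a single `notify` threshold). It is an illustration, not captured `helm template` output: the exact indentation is assumed, and the `runbook_url` value is a placeholder because the `runbook-url` helper is not part of this diff.

```yaml
# Approximate rendering of the IneligiblePrebuilds group (assumed, not captured output).
- name: Coderd Ineligible Prebuilds
  rules:
    - alert: CoderdIneligiblePrebuilds
      # Fires per template/preset pair when running prebuilds exceed eligible ones.
      expr: max by (template_name, preset_name) (coderd_prebuilds_running - coderd_prebuilds_eligible) > 0
      for: 10m   # from .delay
      annotations:
        summary: >
          {{ $value }} prebuilt workspace(s) are currently ineligible for claiming for the "{{ $labels.template_name }}" template and "{{ $labels.preset_name }}" preset.
          This usually indicates that the agent has not started correctly, or is still running its startup scripts after an extended period of time.
      labels:
        severity: notify   # the key of the single entry under .thresholds
        runbook_url: "<output of the runbook-url helper>"   # placeholder, helper not shown in this diff
```

The `{{ $value }}` and `{{ $labels.* }}` expressions survive Helm rendering because the template wraps them in backtick-quoted strings, leaving them for Prometheus to expand when the alert fires.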

coder-observability/values.yaml

+5
@@ -76,6 +76,11 @@ global:
               notify: 2
               warning: 5
               critical: 10
+          IneligiblePrebuilds:
+            enabled: true
+            delay: 10m
+            thresholds:
+              notify: 1
       provisionerd:
         groups:
           Replicas:
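As a minimal sketch of how these defaults might be tuned, an override values file could lengthen the delay or disable the alert. The file name `my-values.yaml` is hypothetical; the keys are the ones added in this hunk.

```yaml
# my-values.yaml (hypothetical override file)
global:
  coder:
    alerts:
      coderd:
        groups:
          IneligiblePrebuilds:
            enabled: true   # set to false to omit the rule group entirely
            delay: 30m      # becomes the rule's `for:` duration
            thresholds:
              notify: 1     # the key sets the severity label
```

As far as this diff shows, the numeric threshold value is not referenced in the alert expression, which compares against a fixed `> 0`. Changing `1` to another number would therefore not appear to change when the alert fires, while adding further keys under `thresholds` would emit one additional rule per key, each with that key as its severity.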
