Commit 0868200

Document Draino remedy system
1 parent 94bd5b0 commit 0868200

1 file changed: +18 -1 lines changed

README.md

Lines changed: 18 additions & 1 deletion
@@ -26,7 +26,7 @@ stack, so Kubernetes will continue scheduling pods to the bad nodes.
 To solve this problem, we introduced this new daemon **node-problem-detector** to
 collect node problems from various daemons and make them visible to the upstream
 layers. Once upstream layers have the visibility to those problems, we can discuss the
-remedy system.
+[remedy system](#remedy-systems).
 
 # Problem API
 node-problem-detector uses `Event` and `NodeCondition` to report problems to
@@ -138,6 +138,23 @@ For example, to test [KernelMonitor](https://github.com/kubernetes/node-problem-
 - You can see more rule examples under [test/kernel_log_generator/problems](https://github.com/kubernetes/node-problem-detector/tree/master/test/kernel_log_generator/problems).
 - For [KernelMonitor](https://github.com/kubernetes/node-problem-detector/blob/master/config/kernel-monitor.json) message injection, all messages should have ```kernel: ``` prefix (also note there is a space after ```:```).
 
+# Remedy Systems
+A _remedy system_ is a process or processes designed to attempt to remedy problems
+detected by the node-problem-detector. Remedy systems observe events and/or node
+conditions emitted by the node-problem-detector and take action to return the
+Kubernetes cluster to a healthy state. The following remedy systems exist:
+
+* [**Draino**](https://github.com/negz/draino) automatically drains Kubernetes
+nodes based on labels and node conditions. Nodes that match _all_ of the supplied
+labels and _any_ of the supplied node conditions will be prevented from accepting
+new pods (aka 'cordoned') immediately, and
+[drained](https://kubernetes.io/docs/tasks/administer-cluster/safely-drain-node/)
+after a configurable time. Draino can be used in conjunction with the
+[Cluster Autoscaler](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler)
+to automatically terminate drained nodes. Refer to
+[this issue](https://github.com/kubernetes/node-problem-detector/issues/199)
+for an example production use case for Draino.
+
 # Links
 * [Design Doc](https://docs.google.com/document/d/1cs1kqLziG-Ww145yN6vvlKguPbQQ0psrSBnEqpy0pzE/edit?usp=sharing)
 * [Slides](https://docs.google.com/presentation/d/1bkJibjwWXy8YnB5fna6p-Ltiy-N5p01zUsA22wCNkXA/edit?usp=sharing)
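
The selection rule in the added Draino paragraph (a node must match _all_ of the supplied labels and _any_ of the supplied node conditions) can be illustrated with a minimal Python sketch. This is not Draino's implementation and does not call the Kubernetes API: the `Node` class, the `should_drain` function, and the example label `role=worker` are hypothetical names introduced here for illustration, and the condition names are only sample values.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical, simplified stand-in for a Kubernetes node."""
    name: str
    labels: dict = field(default_factory=dict)
    # Types of the node conditions that are currently true on this node.
    true_conditions: set = field(default_factory=set)


def should_drain(node, required_labels, drain_conditions):
    """Return True when the node matches ALL labels and ANY condition."""
    # Every supplied label key must be present with the supplied value.
    has_all_labels = all(
        node.labels.get(key) == value for key, value in required_labels.items()
    )
    # At least one of the supplied conditions must be true on the node.
    has_any_condition = any(
        condition in node.true_conditions for condition in drain_conditions
    )
    return has_all_labels and has_any_condition


if __name__ == "__main__":
    required = {"role": "worker"}  # assumed example label
    conditions = ["KernelDeadlock", "ReadonlyFilesystem"]  # sample condition names

    bad_worker = Node("node-a", {"role": "worker"}, {"KernelDeadlock"})
    healthy_worker = Node("node-b", {"role": "worker"}, set())
    bad_other = Node("node-c", {"role": "infra"}, {"KernelDeadlock"})

    for node in (bad_worker, healthy_worker, bad_other):
        print(node.name, should_drain(node, required, conditions))
```

Under these assumptions only `node-a` qualifies: `node-b` has the label but no matching condition, and `node-c` has a matching condition but the wrong label. A real remedy system would then cordon the node and drain it after the configured delay, which this sketch leaves out.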
