keps/sig-apps/3329-retriable-and-non-retriable-failures/README.md (+15 -2)
@@ -1279,7 +1279,7 @@ condition makes it easier to determine if a failed pod should be restarted):
 - DeletionByTaintManager (Pod evicted by kube-controller-manager due to taints)
 - EvictionByEvictionAPI (Pod deleted by Eviction API)
 - DeletionByPodGC (an orphaned Pod deleted by pod GC)
-- TerminationByKubelet (Pod terminated due to graceful node shutdown or node resource pressure).
+- TerminationByKubelet (Pod terminated due to graceful node shutdown, node resource pressure, or Kubelet preemption for critical pods).

 The already existing `status.conditions` field in Pod will be used by kubernetes
 components to append a dedicated condition.
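Review note: the "append a dedicated condition" step in this hunk can be illustrated with a minimal Go sketch. It does not use the Kubernetes API packages; `PodCondition` and `setCondition` below are hypothetical, simplified stand-ins introduced only for illustration, and the message string is made up. The condition type and reason names are the ones listed in the hunk.

```go
package main

import "fmt"

// PodCondition is a simplified, hypothetical stand-in for an entry of a
// Pod's status.conditions list; only the fields needed here are kept.
type PodCondition struct {
	Type    string
	Status  string
	Reason  string
	Message string
}

// DisruptionTarget is the condition type named in the KEP text.
const DisruptionTarget = "DisruptionTarget"

// setCondition appends c to conds, or overwrites the existing entry with the
// same Type, so that at most one condition per Type is kept.
func setCondition(conds []PodCondition, c PodCondition) []PodCondition {
	for i := range conds {
		if conds[i].Type == c.Type {
			conds[i] = c
			return conds
		}
	}
	return append(conds, c)
}

func main() {
	// Start from a pod that already carries an unrelated condition.
	conds := []PodCondition{{Type: "Ready", Status: "False"}}

	// A component (here: the Kubelet case from the list) records the disruption.
	conds = setCondition(conds, PodCondition{
		Type:    DisruptionTarget,
		Status:  "True",
		Reason:  "TerminationByKubelet",
		Message: "illustrative message only",
	})

	for _, c := range conds {
		fmt.Printf("%s=%s reason=%q\n", c.Type, c.Status, c.Reason)
	}
}
```

The update-or-append behavior is a design assumption of this sketch; the hunk itself only says that a condition is appended.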
@@ -1713,6 +1713,10 @@ Second iteration:
 - Extend the feature documentation to explain transitioning of pending and
 terminating pods into `Failed` phase.

+Third iteration (1.28):
+- Add `DisruptionTarget` condition for pods which are preempted by Kubelet to make room for critical pods.
+Also, backport this fix to 1.26 and 1.27 release branches, and update the user-facing documentation to reflect this change.
+
 #### GA

 - Address reviews and bug reports from Beta users
@@ -2250,7 +2254,13 @@ No change from existing behavior of the Job controller.
 - Detection: Observe that the pods are not deleted when a node is tainted with `NoExecute`
 - Mitigations: disable `PodDisruptionConditions`
 - Testing: Discovered bugs are covered by unit and integration tests.
-
+- `DisruptionTarget` condition is not added to pods preempted by Kubelet when scheduling a critical pod. As a consequence,
+there is no way to handle such pod failures with pod failure policy.
+- Known bug in 1.26.0 through 1.26.5 and 1.27.0 through 1.27.2
+- Bugs: described in [Add DisruptionTarget condition when preempting for critical pod](https://github.com/kubernetes/kubernetes/pull/117586)
+- Detection: Observe failed pods with reason `Preempting`, and message `Preempted in order to admit critical pod`, but without `DisruptionTarget` condition.
+- Mitigations: upgrade to a fixed version (1.26.6+, 1.27.3+ or 1.28+). Alternatively, set a higher `backoffLimit` for Jobs.
+- Testing: Discovered bug is covered by an integration test.
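Review note: the detection rule in the added failure mode can be sketched as a small predicate. This is a minimal Go sketch under stated assumptions: `PodStatus` and `PodCondition` are hypothetical, simplified types rather than the Kubernetes API types, and it assumes the reason and message quoted above appear as top-level status fields of the failed pod.

```go
package main

import "fmt"

// PodCondition and PodStatus are hypothetical, simplified stand-ins for the
// corresponding parts of a Pod's status; they are not the Kubernetes types.
type PodCondition struct {
	Type   string
	Status string
}

type PodStatus struct {
	Phase      string
	Reason     string
	Message    string
	Conditions []PodCondition
}

// Values quoted in the "Detection" bullet of the failure mode.
const (
	preemptReason  = "Preempting"
	preemptMessage = "Preempted in order to admit critical pod"
)

// hasCondition reports whether a condition of the given type is present.
func hasCondition(s PodStatus, condType string) bool {
	for _, c := range s.Conditions {
		if c.Type == condType {
			return true
		}
	}
	return false
}

// missingDisruptionTarget reports whether a pod status matches the known bug:
// failed due to Kubelet preemption, but without a DisruptionTarget condition.
func missingDisruptionTarget(s PodStatus) bool {
	return s.Phase == "Failed" &&
		s.Reason == preemptReason &&
		s.Message == preemptMessage &&
		!hasCondition(s, "DisruptionTarget")
}

func main() {
	affected := PodStatus{Phase: "Failed", Reason: preemptReason, Message: preemptMessage}
	fixed := affected
	fixed.Conditions = []PodCondition{{Type: "DisruptionTarget", Status: "True"}}

	fmt.Println(missingDisruptionTarget(affected)) // true
	fmt.Println(missingDisruptionTarget(fixed))    // false
}
```

A check of this shape could be run over the failed pods of a Job to decide whether the cluster is on one of the affected versions; how the pod statuses are obtained is left out on purpose.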
 <!--
 For each of them, fill in the following information by copying the below template:
 - [Failure mode brief description]
@@ -2327,6 +2337,9 @@ technics apply):
 - 2023-01-03: PR "Fix clearing of rate-limiter for the queue of checks for cleaning stale pod disruption conditions" ([link](https://github.com/kubernetes/kubernetes/pull/114770))
 - 2023-01-09: PR "Adjust DisruptionTarget condition message to do not include preemptor pod metadata" ([link](https://github.com/kubernetes/kubernetes/pull/114914))
 - 2023-01-13: PR "PodGC should not add DisruptionTarget condition for pods which are in terminal phase" ([link](https://github.com/kubernetes/kubernetes/pull/115056))
+- 2023-03-17: PR "Give terminal phase correctly to all pods that will not be restarted" ([link](https://github.com/kubernetes/kubernetes/pull/115331))