keps/sig-apps/3329-retriable-and-non-retriable-failures/README.md
+17
@@ -1738,9 +1738,15 @@ Third iteration (1.28):
 the terminal phase. Update user-facing documentation.
 Might be considered for backport to 1.27.
 
+Fourth iteration (1.29):
+- Fix the [Pod Garbage collector fails to clean up PODs from nodes that are not running anymore](https://github.com/kubernetes/kubernetes/issues/118261)
+by withdrawing from SSA in the k8s controllers which were adding the `DisruptionTarget` condition.
+We will reconsider returning to SSA if the issue is fixed.
+
 #### GA
 
 - Address reviews and bug reports from Beta users
+- Reconsider returning to SSA if the issue [#113482](https://github.com/kubernetes/kubernetes/issues/113482) is fixed
 - Write a blog post about the feature
 - Graduate e2e tests as conformance tests
 - Lock the `PodDisruptionConditions` and `JobPodFailurePolicy` feature-gates
@@ -2282,6 +2288,16 @@ No change from existing behavior of the Job controller.
 - Detection: Observe failed pods with reason `Preempting`, and message `Preempted in order to admit critical pod`, but without `DisruptionTarget` condition.
 - Mitigations: upgrade to a fixed version (1.26.6+, 1.27.3+ or 1.28+). Alternatively, set higher `backoffLimit` for Jobs.
 - Testing: Discovered bug is covered by an integration test.
+- When `PodDisruptionConditions` and pods with duplicated env. names or container ports are used, then pods cannot be deleted by PodGC and other core k8s controllers.
+- Known bug in 1.26.0-10, 1.27.0-7, 1.28.0-3
+- Bugs: [Pod Garbage collector fails to clean up PODs from nodes that are not running anymore](https://github.com/kubernetes/kubernetes/issues/118261)
+- Detection: Pods expected to be deleted are stuck terminating. The logs show a message similar to the following: `'failed to create manager for existing fields: failed to convert new object (app-b/app-b-5894548cb-7tssd; /v1, Kind=Pod) to smd typed: .spec.containers[name="app-b"].ports: duplicate entries for key [containerPort=8082,protocol="TCP"]'`
+- Mitigations: upgrade to a fixed version (1.26.11+, 1.27.8+, or 1.28.4+). Alternatively, make sure pods with
+duplicated keys for env. variables or container ports are not created. Also, update the existing pods to clean up
+reproduced the issue before withdrawing from SSA in PodGC in the [PR #121103](https://github.com/kubernetes/kubernetes/pull/121103).
+
 <!--
 For each of them, fill in the following information by copying the below template:
 - [Failure mode brief description]
@@ -2361,6 +2377,7 @@ technics apply):
 - 2023-03-17: PR "Give terminal phase correctly to all pods that will not be restarted" ([link](https://github.com/kubernetes/kubernetes/pull/115331))
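The alternative mitigation in the second hunk (not creating pods with duplicated env. names or container ports) could be checked ahead of time with a small script. The following is a minimal sketch, not part of the KEP or of any Kubernetes tooling: the function name `find_duplicate_keys` and the input shape (a plain dict parsed from a Pod manifest) are assumptions made for illustration. The port key of `containerPort` plus `protocol` follows the log message quoted in the Detection bullet; treating a missing protocol as `TCP` is also an assumption here.

```python
from collections import Counter


def find_duplicate_keys(pod):
    """Describe duplicated env names and container ports in a Pod manifest.

    `pod` is a plain dict shaped like a Pod object (hypothetical input, for
    example the parsed output of `kubectl get pod <name> -o json`).
    """
    problems = []
    spec = pod.get("spec") or {}
    containers = (spec.get("containers") or []) + (spec.get("initContainers") or [])
    for container in containers:
        name = container.get("name", "<unnamed>")

        # Env entries are keyed by name.
        env_counts = Counter(e.get("name") for e in container.get("env") or [])
        for env_name, count in env_counts.items():
            if count > 1:
                problems.append(f"{name}: env {env_name!r} appears {count} times")

        # Ports are keyed by (containerPort, protocol); assume TCP when unset.
        port_counts = Counter(
            (p.get("containerPort"), p.get("protocol", "TCP"))
            for p in container.get("ports") or []
        )
        for (port, protocol), count in port_counts.items():
            if count > 1:
                problems.append(f"{name}: port {port}/{protocol} appears {count} times")
    return problems


# Example modeled on the pod named in the log message above.
example_pod = {
    "metadata": {"namespace": "app-b", "name": "app-b-5894548cb-7tssd"},
    "spec": {
        "containers": [
            {
                "name": "app-b",
                "ports": [
                    {"containerPort": 8082, "protocol": "TCP"},
                    {"containerPort": 8082},
                ],
            }
        ]
    },
}

for problem in find_duplicate_keys(example_pod):
    print(problem)  # prints "app-b: port 8082/TCP appears 2 times"
```

A check like this only flags candidate pods; upgrading to one of the fixed versions listed in the Mitigations bullet remains the primary remedy.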