Commit eb7f203

Merge pull request #4307 from mimowo/pod-failure-policy-ssa-update
Update PodFailurePolicy about the PodGC fix
2 parents 9e5516c + 6b6299f commit eb7f203

2 files changed: +18 -1 lines changed

keps/sig-apps/3329-retriable-and-non-retriable-failures/README.md

+17
@@ -1738,9 +1738,15 @@ Third iteration (1.28):
 the terminal phase. Update user-facing documentation.
 Might be considered for backport to 1.27.

+Fourth iteration (1.29):
+- Fix the [Pod Garbage collector fails to clean up PODs from nodes that are not running anymore](https://github.com/kubernetes/kubernetes/issues/118261)
+  by withdrawing from SSA in the k8s controllers which were adding the `DisruptionTarget` condition.
+  We will reconsider returning to SSA if the issue is fixed.
+
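To make the "withdrawing from SSA" step more concrete, here is a minimal sketch of a patch body that adds a `DisruptionTarget` condition to a pod's status without server-side apply. It assumes a strategic merge patch, in which pod conditions are merged by their `type` key. The helper name `disruption_target_patch` and the reason and message strings in the usage line are hypothetical, chosen only for illustration, and the sketch is not taken from the controllers' actual code.

```python
import json


def disruption_target_patch(reason, message):
    """Return a JSON patch body that adds a DisruptionTarget condition.

    Assumption: the body is sent as a strategic merge patch against the
    pod's status, so the entry is merged into the existing conditions by
    its "type" key instead of replacing the whole list.
    """
    body = {
        "status": {
            "conditions": [
                {
                    "type": "DisruptionTarget",
                    "status": "True",
                    "reason": reason,    # illustrative value supplied by the caller
                    "message": message,  # illustrative value supplied by the caller
                }
            ]
        }
    }
    return json.dumps(body, sort_keys=True)


# Hypothetical usage: the reason and message are placeholders.
print(disruption_target_patch("ExampleReason", "example message"))
```

A patch like this carries only the fields being changed, which is the practical difference from an SSA request that declares ownership of fields through a field manager.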
 #### GA

 - Address reviews and bug reports from Beta users
+- Reconsider returning to SSA if the issue [#113482](https://github.com/kubernetes/kubernetes/issues/113482) is fixed
 - Write a blog post about the feature
 - Graduate e2e tests as conformance tests
 - Lock the `PodDisruptionConditions` and `JobPodFailurePolicy` feature-gates
@@ -2282,6 +2288,16 @@ No change from existing behavior of the Job controller.
   - Detection: Observe failed pods with reason `Preempting`, and message `Preempted in order to admit critical pod`, but without `DisruptionTarget` condition.
   - Mitigations: upgrade to a fixed version (1.26.6+, 1.27.3+ or 1.28+). Alternatively, set higher `backoffLimit` for Jobs.
   - Testing: Discovered bug is covered by an integration test.
+- When `PodDisruptionConditions` is enabled and pods have duplicated env. variable names or container ports, those pods cannot be deleted by PodGC and other core k8s controllers.
+  - Known bug in 1.26.0-1.26.10, 1.27.0-1.27.7, 1.28.0-1.28.3
+  - Bugs: [Pod Garbage collector fails to clean up PODs from nodes that are not running anymore](https://github.com/kubernetes/kubernetes/issues/118261)
+  - Detection: Pods expected to be deleted are stuck terminating. The logs show a message similar to the following: `'failed to create manager for existing fields: failed to convert new object (app-b/app-b-5894548cb-7tssd; /v1, Kind=Pod) to smd typed: .spec.containers[name="app-b"].ports: duplicate entries for key [containerPort=8082,protocol="TCP"]'`
+  - Mitigations: upgrade to a fixed version (1.26.11+, 1.27.8+, or 1.28.4+). Alternatively, make sure pods with
+    duplicated keys for env. variables or container ports are not created. Also, update the existing pods to clean up
+    the problematic fields.
+  - Testing: [PodGC integration test](https://github.com/kubernetes/kubernetes/blob/7b9d244efd19f0d4cce4f46d1f34a6c7cff97b18/test/integration/podgc/podgc_test.go#L313)
+    reproduced the issue before withdrawing from SSA in PodGC in [PR #121103](https://github.com/kubernetes/kubernetes/pull/121103).
+
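As a companion to the mitigation above, the following is a rough sketch of how one might scan a pod manifest for the duplicated keys this failure mode describes. It assumes the manifest has already been loaded into a plain Python dict (for example from JSON output), that a missing `protocol` defaults to `TCP`, and that init containers are worth checking too. The function name `find_duplicate_keys` is hypothetical and not part of any Kubernetes tooling.

```python
from collections import Counter


def find_duplicate_keys(pod):
    """List duplicated env names and (containerPort, protocol) pairs per container.

    Assumption: `pod` is a pod manifest as a plain dict; this is an
    illustrative check, not the validation performed by the API server.
    """
    problems = []
    spec = pod.get("spec") or {}
    containers = (spec.get("containers") or []) + (spec.get("initContainers") or [])
    for container in containers:
        name = container.get("name", "<unnamed>")

        # Env variables are keyed by name.
        env_counts = Counter(e.get("name") for e in container.get("env") or [])
        for env_name, count in env_counts.items():
            if count > 1:
                problems.append(f"{name}: env {env_name!r} appears {count} times")

        # Ports are keyed by (containerPort, protocol); protocol assumed to default to TCP.
        port_counts = Counter(
            (p.get("containerPort"), p.get("protocol", "TCP"))
            for p in container.get("ports") or []
        )
        for (port, proto), count in port_counts.items():
            if count > 1:
                problems.append(f"{name}: port {port}/{proto} appears {count} times")
    return problems


# Example modelled on the log message quoted above.
pod = {
    "metadata": {"name": "app-b-5894548cb-7tssd", "namespace": "app-b"},
    "spec": {
        "containers": [
            {
                "name": "app-b",
                "ports": [
                    {"containerPort": 8082},
                    {"containerPort": 8082, "protocol": "TCP"},
                ],
            }
        ]
    },
}
print(find_duplicate_keys(pod))  # ['app-b: port 8082/TCP appears 2 times']
```

Running a check like this over existing pods before upgrading could help identify which workloads need their templates cleaned up.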
 <!--
 For each of them, fill in the following information by copying the below template:
 - [Failure mode brief description]
@@ -2361,6 +2377,7 @@ technics apply):
 - 2023-03-17: PR "Give terminal phase correctly to all pods that will not be restarted" ([link](https://github.com/kubernetes/kubernetes/pull/115331))
 - 2023-03-18: PR "API-initiated eviction: handle deleteOptions correctly" ([link](https://github.com/kubernetes/kubernetes/pull/116554))
 - 2023-05-23: PR "Add DisruptionTarget condition when preempting for critical pod" ([link](https://github.com/kubernetes/kubernetes/pull/117586))
+- 2023-10-19: PR "Use Patch instead of SSA for Pod Disruption condition" ([link](https://github.com/kubernetes/kubernetes/pull/121103))

 <!--
 Major milestones in the lifecycle of a KEP should be tracked in this section.

keps/sig-apps/3329-retriable-and-non-retriable-failures/kep.yaml

+1-1
@@ -33,7 +33,7 @@ latest-milestone: "v1.28"
 milestone:
   alpha: "v1.25"
   beta: "v1.26"
-  stable: "v1.29"
+  stable: "v1.30"

 # The following PRR answers are required at alpha release
 # List the feature gate name and the components for which it must be enabled
