keps/sig-apps/2879-ready-pods-job-status/README.md
+65 -31
@@ -38,17 +38,17 @@ Items marked with (R) are required *prior to targeting to a milestone / release*
38
38
-[x] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
39
39
-[x] (R) KEP approvers have approved the KEP status as `implementable`
40
40
-[x] (R) Design details are appropriately documented
41
-
-[] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
42
-
-[] e2e Tests for all Beta API Operations (endpoints)
43
-
-[ ] (R) Ensure GA e2e tests for meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
41
+
-[x] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
42
+
-[x] e2e Tests for all Beta API Operations (endpoints)
43
+
-[ ] (R) Ensure GA e2e tests for meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
44
44
-[ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free
45
-
-[] (R) Graduation criteria is in place
46
-
-[] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
45
+
-[x] (R) Graduation criteria is in place
46
+
-[x] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
47
47
-[x] (R) Production readiness review completed
48
48
-[x] (R) Production readiness review approved
49
-
-[] "Implementation History" section is up-to-date for milestone
49
+
-[x] "Implementation History" section is up-to-date for milestone
50
50
-[x] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
51
-
-[] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
51
+
-[x] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
@@ -125,6 +125,7 @@ pods that have the `Ready` condition.
125
125
- Count of ready pods.
126
126
- Feature gate disablement.
127
127
- Verify passing existing E2E and conformance tests for Job.
128
+
- Added e2e test for the count of ready pods.
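The "count of ready pods" that these tests exercise can be illustrated with a minimal sketch. It does not reproduce the Job controller's code: `Pod`, `PodCondition`, `isReady`, and `countReady` are hypothetical, trimmed-down stand-ins for the real API types, and returning `nil` when the gate is off is an assumption based on the field not being populated while the feature is disabled.

```go
package main

import "fmt"

// PodCondition and Pod are hypothetical, trimmed-down stand-ins for the
// Kubernetes API types; only the fields needed for the count are modeled.
type PodCondition struct {
	Type   string
	Status string
}

type Pod struct {
	Name       string
	Conditions []PodCondition
}

// isReady reports whether the pod has a Ready condition with status "True".
func isReady(p Pod) bool {
	for _, c := range p.Conditions {
		if c.Type == "Ready" {
			return c.Status == "True"
		}
	}
	return false
}

// countReady returns the number of ready pods, or nil when the feature gate
// is disabled (assumption: a disabled gate leaves the field unset).
func countReady(pods []Pod, gateEnabled bool) *int32 {
	if !gateEnabled {
		return nil
	}
	var n int32
	for _, p := range pods {
		if isReady(p) {
			n++
		}
	}
	return &n
}

func main() {
	pods := []Pod{
		{Name: "a", Conditions: []PodCondition{{Type: "Ready", Status: "True"}}},
		{Name: "b", Conditions: []PodCondition{{Type: "Ready", Status: "False"}}},
		{Name: "c"}, // no conditions reported yet
		{Name: "d", Conditions: []PodCondition{{Type: "Ready", Status: "True"}}},
	}
	if n := countReady(pods, true); n != nil {
		fmt.Println("ready pods:", *n)
	}
	fmt.Println("field set with gate off:", countReady(pods, false) != nil)
}
```

The pointer return keeps "field not reported" distinct from "zero ready pods", which is the distinction the enablement and disablement tests need to observe.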
128
129
129
130
### Graduation Criteria
130
131
@@ -157,14 +158,14 @@ pods that have the `Ready` condition.
157
158
#### GA
158
159
159
160
- Every bug report is fixed.
160
-
- Explore setting different batch periods for regular pod updates versus
161
-
finished pod updates, so we can do less pod readiness updates without
162
-
compromising how fast we can declare a job finished.
163
-
- The job controller ignores the feature gate.
161
+
- E2e test for the count of ready pods.
162
+
- Lock the feature-gate and document deprecation of the feature-gate
164
163
165
164
#### Deprecation
166
165
167
-
N/A
166
+
In GA+2 release:
167
+
- Remove the feature gate definition
168
+
- Job controller ignores the feature gate
168
169
169
170
### Upgrade / Downgrade Strategy
170
171
@@ -210,7 +211,16 @@ The Job controller will start populating the field again.
210
211
211
212
###### Are there any tests for feature enablement/disablement?
212
213
213
-
Yes, there are tests at unit and [integration] level.
214
+
We have unit tests (see [link](https://github.com/kubernetes/kubernetes/blob/e8abe1af8dcb36f65ef7aa7135d4664b3db90e89/pkg/controller/job/job_controller_test.go#L236)) for
215
+
the `status.ready` field when the feature is enabled or disabled.
216
+
Similarly, we have integration tests (see [link](https://github.com/kubernetes/kubernetes/blob/e8abe1af8dcb36f65ef7aa7135d4664b3db90e89/test/integration/job/job_test.go#L1364)
217
+
and [link](https://github.com/kubernetes/kubernetes/blob/e8abe1af8dcb36f65ef7aa7135d4664b3db90e89/test/integration/job/job_test.go#L1517))
218
+
for the feature being enabled or disabled.
219
+
220
+
However, due to omission we graduated to Beta without feature gate
221
+
transition (enablement or disablement) tests. With graduation to stable it's too
222
+
late to add these tests so we're sticking with just manual tests
223
+
(see [here](#were-upgrade-and-rollback-tested-was-the-upgrade-downgrade-upgrade-path-tested)).
214
224
215
225
### Rollout, Upgrade and Rollback Planning
216
226
@@ -221,20 +231,20 @@ The field is only informative, it doesn't affect running workloads.
221
231
###### What specific metrics should inform a rollback?
222
232
223
233
- An increase in `job_sync_duration_seconds`.
224
-
- A reduction in `job_sync_num`.
234
+
- A reduction in `job_syncs_total`.
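As a rough illustration of how the two signals above might be compared before and after a rollout, here is a minimal sketch. The `Snapshot` type, the `suggestsRollback` function, and the tolerance value are all hypothetical; the KEP does not define thresholds. Because `job_syncs_total` is a counter, the sketch assumes the "reduction" is observed as a drop in its per-second rate.

```go
package main

import "fmt"

// Snapshot is a hypothetical summary of the two metrics over a time window.
type Snapshot struct {
	MeanSyncDurationSeconds float64 // derived from job_sync_duration_seconds
	SyncsPerSecond          float64 // rate of job_syncs_total (assumption)
}

// suggestsRollback reports whether sync duration grew, or sync throughput
// shrank, by more than the given relative tolerance (e.g. 0.25 for 25%).
func suggestsRollback(before, after Snapshot, tolerance float64) bool {
	slower := after.MeanSyncDurationSeconds > before.MeanSyncDurationSeconds*(1+tolerance)
	fewer := after.SyncsPerSecond < before.SyncsPerSecond*(1-tolerance)
	return slower || fewer
}

func main() {
	before := Snapshot{MeanSyncDurationSeconds: 0.10, SyncsPerSecond: 50}
	after := Snapshot{MeanSyncDurationSeconds: 0.20, SyncsPerSecond: 48}
	fmt.Println("consider rollback:", suggestsRollback(before, after, 0.25))
}
```

In practice these comparisons would more likely be expressed as alerting rules over the scraped metrics; the Go form only makes the two conditions explicit.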
225
235
226
236
###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
227
237
228
-
A manual test will be performed, as follows:
238
+
A manual test on Beta was performed, as follows:
229
239
230
-
1. Create a cluster in 1.23.
231
-
1. Upgrade to 1.24.
232
-
1. Create long running Job A, ensure that the ready field is populated.
233
-
1. Downgrade to 1.23.
234
-
1. Verify that ready field in Job A is not lost, but also not updated.
235
-
1. Create long running Job B, ensure that ready field is not populated.
236
-
1. Upgrade to 1.24.
237
-
1. Verify that Job A and B ready field is tracked again.
240
+
1. Create a cluster in 1.28 with the `JobReadyPods` feature gate disabled (`=false`).
241
+
2. Simulate upgrade by modifying control-plane manifests to enable `JobReadyPods`.
242
+
3. Create long running Job A, ensure that the ready field is populated.
243
+
4. Simulate downgrade by modifying control-plane manifests to disable `JobReadyPods`.
244
+
5. Verify that ready field in Job A is cleaned up shortly after the startup of the Job controller completes.
245
+
6. Create long running Job B, ensure that ready field is not populated.
246
+
7. Simulate upgrade by modifying control-plane manifests to enable `JobReadyPods`.
247
+
8. Verify that Job A and B ready field is tracked again shortly after the startup of the Job controller completes.
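The expected field behavior across the steps above can be walked through with a small simulation. This is a sketch under assumptions, not the controller's logic: `JobStatus`, `syncReady`, and `describe` are hypothetical names, and the model simply assumes that a sync with the gate enabled populates the field and a sync with the gate disabled clears it, as steps 5 and 8 describe.

```go
package main

import "fmt"

// JobStatus is a hypothetical stand-in for the Job status. Ready is a
// pointer so that "unset" and "zero" remain distinguishable.
type JobStatus struct {
	Ready *int32
}

// syncReady models one controller sync (assumption drawn from the steps
// above): an enabled gate populates the field, a disabled gate clears it.
func syncReady(s *JobStatus, readyPods int32, gateEnabled bool) {
	if !gateEnabled {
		s.Ready = nil
		return
	}
	n := readyPods
	s.Ready = &n
}

// describe renders the ready field for printing.
func describe(name string, s JobStatus) string {
	if s.Ready == nil {
		return name + ": ready unset"
	}
	return fmt.Sprintf("%s: ready=%d", name, *s.Ready)
}

func main() {
	var jobA, jobB JobStatus

	// Steps 2-3: gate enabled, Job A is running with one ready pod.
	syncReady(&jobA, 1, true)
	fmt.Println(describe("A", jobA))

	// Steps 4-6: gate disabled; Job A is resynced and Job B is created.
	syncReady(&jobA, 1, false)
	syncReady(&jobB, 1, false)
	fmt.Println(describe("A", jobA))
	fmt.Println(describe("B", jobB))

	// Steps 7-8: gate enabled again; both Jobs are tracked.
	syncReady(&jobA, 1, true)
	syncReady(&jobB, 1, true)
	fmt.Println(describe("A", jobA))
	fmt.Println(describe("B", jobB))
}
```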
238
248
239
249
###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
240
250
@@ -259,7 +269,7 @@ the controller doesn't create new Pods or tracks finishing Pods.
259
269
###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?