Skip to content

Commit 76dcd4f

Browse files
authored
Merge pull request #4059 from andrewsykim/kep-3716
KEP-3716: add beta milestone for AdmissionWebhookMatchConditions
2 parents 68f741d + 487bc81 commit 76dcd4f

File tree

3 files changed

+67
-25
lines changed

3 files changed

+67
-25
lines changed

keps/prod-readiness/sig-api-machinery/3716.yaml

+2
Original file line numberDiff line numberDiff line change
@@ -4,3 +4,5 @@
44
kep-number: 3716
55
alpha:
66
approver: "@deads2k"
7+
beta:
8+
approver: "@deads2k"

keps/sig-api-machinery/3716-admission-webhook-match-conditions/README.md

+59-21
Original file line numberDiff line numberDiff line change
@@ -54,10 +54,10 @@ Items marked with (R) are required *prior to targeting to a milestone / release*
5454
- [ ] (R) Design details are appropriately documented
5555
- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
5656
- [ ] e2e Tests for all Beta API Operations (endpoints)
57-
- [ ] (R) Ensure GA e2e tests for meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
57+
- [ ] (R) Ensure GA e2e tests for meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
5858
- [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free
5959
- [ ] (R) Graduation criteria is in place
60-
- [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
60+
- [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
6161
- [ ] (R) Production readiness review completed
6262
- [ ] (R) Production readiness review approved
6363
- [ ] "Implementation History" section is up-to-date for milestone
@@ -374,10 +374,8 @@ cases outlined above will be added.
374374

375375
- Add E2E test coverage
376376
- Resolve resource constraints validation
377-
378-
<<[UNRESOLVED resource constraints ]>>
379-
Additional beta requirements TBD
380-
<<[/UNRESOLVED]>>
377+
- Smart reload/recompile of Webhook Accessors, see [issue](https://github.com/kubernetes/kubernetes/issues/116588)
378+
- ValidatingAdmissionPolicy is promoted to Beta.
381379

382380
#### GA
383381

@@ -491,12 +489,15 @@ rollout. Similarly, consider large clusters and how enablement/disablement
491489
will rollout across nodes.
492490
-->
493491

492+
In general, rollout / rollback should not fail since the feature is not enabled by default.
493+
However, there are risks on rollback if webhook preconditions was enabled and then unexpectedly
494+
disabled on rollback.
495+
494496
###### What specific metrics should inform a rollback?
495497

496-
<!--
497-
What signals should users be paying attention to when the feature is young
498-
that might indicate a serious problem?
499-
-->
498+
- `webhook_admission_match_condition_evaluation_errors_total` is high
499+
- `webhook_admission_match_condition_exclusions_total` is too high or too low
500+
- `webhook_admission_match_condition_evaluation_seconds` is high
500501

501502
###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
502503

@@ -506,12 +507,16 @@ Longer term, we may want to require automated upgrade/rollback tests, but we
506507
are missing a bunch of machinery and tooling and can't do that now.
507508
-->
508509

510+
Not yet, but manual testing should be completed and documented prior to beta.
511+
509512
###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
510513

511514
<!--
512515
Even if applying deprecation policies, they may still surprise some users.
513516
-->
514517

518+
No.
519+
515520
### Monitoring Requirements
516521

517522
A new per-webhook metric will measure the number of requests excluded by match conditions:
@@ -537,6 +542,9 @@ checking if there are objects with field X set) may be a last resort. Avoid
537542
logs or events for this purpose.
538543
-->
539544

545+
The metric `webhook_admission_match_condition_exclusions_total` should indicate if the precondition
546+
is used to exclude objects from invoking webhooks.
547+
540548
###### How can someone using this feature know that it is working for their instance?
541549

542550
<!--
@@ -548,13 +556,10 @@ and operation of this feature.
548556
Recall that end users cannot usually observe component logs or access metrics.
549557
-->
550558

551-
- [ ] Events
552-
- Event Reason:
553-
- [ ] API .status
554-
- Condition name:
555-
- Other field:
556-
- [ ] Other (treat as last resort)
559+
- [X] Other (treat as last resort)
557560
- Details:
561+
* Check the preconditions field in the webhook object and check the `webhook_admission_match_condition_exclusions_total` metric for exclusions
562+
* Check `webhook_admission_match_condition_evaluation_errors_total`
558563

559564
###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?
560565

@@ -573,18 +578,22 @@ These goals will help you determine what you need to measure (SLIs) in the next
573578
question.
574579
-->
575580

581+
* Only negligible impact to admission latency due to evaluation of CEL rules
582+
* CEL evaluation time (`webhook_admission_match_condition_evaluation_seconds`)
583+
* CEL evaluation errors (`webhook_admission_match_condition_evaluation_errors_total`)
584+
576585
###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
577586

578587
<!--
579588
Pick one more of these and delete the rest.
580589
-->
581590

582-
- [ ] Metrics
591+
- [X] Metrics
583592
- Metric name:
593+
- `webhook_admission_match_condition_evaluation_seconds`
594+
- `webhook_admission_match_condition_evaluation_errors_total`
584595
- [Optional] Aggregation method:
585-
- Components exposing the metric:
586-
- [ ] Other (treat as last resort)
587-
- Details:
596+
- Components exposing the metric: kube-apiserver
588597

589598
###### Are there any missing metrics that would be useful to have to improve observability of this feature?
590599

@@ -593,6 +602,10 @@ Describe the metrics themselves and the reasons why they weren't added (e.g., co
593602
implementation difficulties, etc.).
594603
-->
595604

605+
Yes, the following metrics will be considered for Beta which will improve observability of this feature:
606+
* `webhook_admission_match_condition_evaluation_seconds`
607+
* `webhook_admission_match_condition_exclusions_total`
608+
596609
### Dependencies
597610

598611
<!--
@@ -616,6 +629,8 @@ and creating new ones, as well as about cluster-level services (e.g. DNS):
616629
- Impact of its degraded performance or high-error rates on the feature:
617630
-->
618631

632+
No
633+
619634
### Scalability
620635

621636
<!--
@@ -643,6 +658,8 @@ Focusing mostly on:
643658
heartbeats, leader election, etc.)
644659
-->
645660

661+
No
662+
646663
###### Will enabling / using this feature result in introducing new API types?
647664

648665
<!--
@@ -652,6 +669,8 @@ Describe them, providing:
652669
- Supported number of objects per namespace (for namespace-scoped objects)
653670
-->
654671

672+
No, this feature only adds new fields to existing webhook APIs
673+
655674
###### Will enabling / using this feature result in any new calls to the cloud provider?
656675

657676
<!--
@@ -660,6 +679,8 @@ Describe them, providing:
660679
- Estimated increase:
661680
-->
662681

682+
No
683+
663684
###### Will enabling / using this feature result in increasing size or count of the existing API objects?
664685

665686
<!--
@@ -669,17 +690,27 @@ Describe them, providing:
669690
- Estimated amount of new objects: (e.g., new Object X for every existing Pod)
670691
-->
671692

693+
Yes, it can increase size of webhook configuration objects. There is a limit in place for the number of preconditions
694+
a webhook can have, however, webhook objects can still increase in size significantly if large expressions are used.
695+
696+
- API types(s): ValidatingWebhookConfiguration, MutatingWebhookConfiguration
697+
Estimated increase in size: depends on size of CEL expressions, but should be negligible in most cases
698+
Estimated amount of new objects: none
699+
672700
###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
673701

674702
<!--
675703
Look at the [existing SLIs/SLOs].
676704
677705
Think about adding additional work or introducing new steps in between
678706
(e.g. need to do X to start a container), etc. Please describe the details.
707+
###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
679708
680709
[existing SLIs/SLOs]: https://git.k8s.io/community/sig-scalability/slos/slos.md#kubernetes-slisslos
681710
-->
682711

712+
Yes, it can impact latency SLI/SLO if evaluating CEL expressions add significant latency.
713+
683714
###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?
684715

685716
<!--
@@ -688,10 +719,11 @@ non-trivial computations, excessive access to disks (including increased log
688719
volume), significant amount of data sent and/or received over network, etc.
689720
This through this both in small and large cases, again with respect to the
690721
[supported limits].
691-
692722
[supported limits]: https://git.k8s.io/community//sig-scalability/configs-and-limits/thresholds.md
693723
-->
694724

725+
Yes, it has potential to increase CPU usage in kube-apiserver if there is a webhook intercepting many requests with many precondition rules.
726+
695727
### Troubleshooting
696728

697729
See [Debuggability](#debuggability).
@@ -709,6 +741,8 @@ details). For now, we leave it here.
709741

710742
###### How does this feature react if the API server and/or etcd is unavailable?
711743

744+
N/A -- since the feature is part of kube-apiserver.
745+
712746
###### What are other known failure modes?
713747

714748
<!--
@@ -724,8 +758,12 @@ For each of them, fill in the following information by copying the below templat
724758
- Testing: Are there any tests for failure mode? If not, describe why.
725759
-->
726760

761+
N/A
762+
727763
###### What steps should be taken if SLOs are not being met to determine the problem?
728764

765+
Feature can be disabled per webhook by removing preconditions or for all webhooks by disabling the feature gate in kube-apiserver.
766+
729767
## Implementation History
730768

731769
<!--

keps/sig-api-machinery/3716-admission-webhook-match-conditions/kep.yaml

+6-4
Original file line numberDiff line numberDiff line change
@@ -30,17 +30,17 @@ see-also:
3030
- https://github.com/kubernetes/enhancements/pull/3694
3131

3232
# The target maturity stage in the current dev cycle for this KEP.
33-
stage: alpha
33+
stage: beta
3434

3535
# The most recent milestone for which work toward delivery of this KEP has been
3636
# done. This can be the current (upcoming) milestone, if it is being actively
3737
# worked on.
38-
latest-milestone: "v1.27"
38+
latest-milestone: "v1.28"
3939

4040
# The milestone at which this feature was, or is targeted to be, at each stage.
4141
milestone:
4242
alpha: "v1.27"
43-
beta: "TBD"
43+
beta: "v1.28"
4444
stable: "TBD"
4545

4646
# The following PRR answers are required at alpha release
@@ -53,4 +53,6 @@ disable-supported: true
5353

5454
# The following PRR answers are required at beta release
5555
metrics:
56-
- TBD
56+
- webhook_admission_match_condition_exclusions_total
57+
- webhook_admission_match_condition_evaluation_errors_total
58+
- webhook_admission_match_condition_evaluation_seconds

0 commit comments

Comments
 (0)