Commit 9e5516c

Merge pull request #4311 from aroradaman/kube-proxy-config-v1alpha2
KEP-784: update template

2 parents e6ac4bd + 762f52b

2 files changed: +272 / -67 lines

@@ -1,6 +1,4 @@
-# kube-proxy component config graduation proposal
-
-## Table of Contents
+# KEP-784: Kube Proxy component configuration graduation

 <!-- toc -->
 - [Release Signoff Checklist](#release-signoff-checklist)
@@ -9,40 +7,52 @@
 - [Goals](#goals)
 - [Non-Goals](#non-goals)
 - [Proposal](#proposal)
-- [Re-encapsulate mode specific options](#re-encapsulate-mode-specific-options)
-- [Example](#example)
 - [Risks and Mitigations](#risks-and-mitigations)
 - [Design Details](#design-details)
 - [Test Plan](#test-plan)
+- [Prerequisite testing updates](#prerequisite-testing-updates)
+- [Unit tests](#unit-tests)
+- [Integration tests](#integration-tests)
+- [e2e tests](#e2e-tests)
 - [Graduation Criteria](#graduation-criteria)
+- [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
+- [Version Skew Strategy](#version-skew-strategy)
+- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
+- [Feature Enablement and Rollback](#feature-enablement-and-rollback)
+- [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning)
+- [Monitoring Requirements](#monitoring-requirements)
+- [Dependencies](#dependencies)
+- [Scalability](#scalability)
+- [Troubleshooting](#troubleshooting)
+- [Implementation History](#implementation-history)
+- [Drawbacks](#drawbacks)
+- [Alternatives](#alternatives)
+- [Infrastructure Needed (Optional)](#infrastructure-needed-optional)
 <!-- /toc -->

 ## Release Signoff Checklist

-**ACTION REQUIRED:** In order to merge code into a release, there must be an issue in [kubernetes/enhancements] referencing this KEP and targeting a release milestone **before [Enhancement Freeze](https://github.com/kubernetes/sig-release/tree/master/releases) of the targeted release**.
-
-For enhancements that make changes to code or processes/procedures in core Kubernetes, i.e., [kubernetes/kubernetes], we require the following Release Signoff checklist to be completed.
-
-Check these off as they are completed for the Release Team to track. These checklist items _must_ be updated for the enhancement to be released.
-
-- [ ] kubernetes/enhancements issue in release milestone, which links to the KEP (this should be a link to the KEP location in kubernetes/enhancements, not the initial KEP PR)
-- [X] KEP approvers have set the KEP status to `implementable`
-- [ ] Design details are appropriately documented
-- [ ] Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
-- [ ] Graduation criteria is in place
+Items marked with (R) are required *prior to targeting to a milestone / release*.
+
+- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
+- [ ] (R) KEP approvers have approved the KEP status as `implementable`
+- [ ] (R) Design details are appropriately documented
+- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
+- [ ] e2e Tests for all Beta API Operations (endpoints)
+- [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
+- [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free
+- [ ] (R) Graduation criteria is in place
+- [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
+- [ ] (R) Production readiness review completed
+- [ ] (R) Production readiness review approved
 - [ ] "Implementation History" section is up-to-date for milestone
 - [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
-- [ ] Supporting documentation, e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
-
-**Note:** Any PRs to move a KEP to `implementable`, or significant changes once it is marked `implementable`, should be approved by each of the KEP approvers. If any of those approvers is no longer appropriate, then changes to that list should be approved by the remaining approvers and/or the owning SIG (or SIG Architecture for cross-cutting KEPs).
-
-**Note:** This checklist is iterative and should be reviewed and updated every time this enhancement is being considered for a milestone.
+- [ ] Supporting documentation, e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes

 [kubernetes.io]: https://kubernetes.io/
-[kubernetes/enhancements]: https://github.com/kubernetes/enhancements/issues
-[kubernetes/kubernetes]: https://github.com/kubernetes/kubernetes
-[kubernetes/website]: https://github.com/kubernetes/website
+[kubernetes/enhancements]: https://git.k8s.io/enhancements
+[kubernetes/kubernetes]: https://git.k8s.io/kubernetes
+[kubernetes/website]: https://git.k8s.io/website

 ## Summary

@@ -51,68 +61,63 @@ This document is intended to propose a process and desired goals by which kube-p
 ## Motivation

 kube-proxy is a component that is present in almost all Kubernetes clusters in existence.
-Historically, kube-proxy's configuration was supplied by a set of command line flags. Over time, the number of flags grew and they became unwieldy to use and support. Thus, kube-proxy gained component config.
+Historically, kube-proxy's configuration was supplied by a set of command line flags. Over time, the number of flags grew, and they became unwieldy to use and support. Thus, kube-proxy gained component config.
 Initially this was just a large flat object that represented the command line flags. However, over time new features were added to it, all while staying as v1alpha1.

 This resulted in a configuration format that had various options grouped together in ways that made them hard to specify and understand. For example:

-- Instance local options (such as host name override, bind address, etc.) are in the same flat object as options shared between instances (such as the cluster CIDR, config sync period, etc.).
-- Platform specific options are mixed together. For example, the IPTables rule sync fields are used by the Windows HNS backend for the same purpose.
-- Again, the IPTables rule sync options are used for the Linux legacy user mode proxy, but not for the IPVS mode (where a set of identical options exists, despite the fact that it too uses some other fields designed for IPTables).
+- Instance local options (such as hostnameOverride, bindAddress, etc.) are in the same flat object as options shared between instances (such as clusterCIDR, configSyncPeriod, etc.).
+- Platform specific options are marked as generic options (e.g. conntrack, oomScoreAdj).
+- Backend agnostic options are marked as backend specific options (e.g. syncPeriod, minSyncPeriod).
+- Options specific to one backend are used by other backends (e.g. masqueradeBit and masqueradeAll).
+
+[kubernetes/issues/117909](https://github.com/kubernetes/kubernetes/issues/117909) captures all the misconfigurations in detail.
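To make the grouping problems above concrete, here is an abridged v1alpha1 document. The field names come from the released `kubeproxy.config.k8s.io/v1alpha1` API; the values are purely illustrative:

```yaml
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
hostnameOverride: "node-1"   # instance-local, yet next to shared settings
bindAddress: 0.0.0.0         # instance-local
clusterCIDR: 10.244.0.0/16   # shared between all instances
oomScoreAdj: -999            # Linux-only, but looks generic
conntrack:                   # Linux-only, but looks generic
  maxPerCore: 32768
iptables:
  syncPeriod: 30s            # backend-agnostic, but nested under iptables
  masqueradeAll: false       # consumed by other backends too
ipvs:
  syncPeriod: 30s            # the same knob, duplicated per backend
```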

 Clearly, this made the configuration both hard to use and hard to maintain. Therefore, a plan to restructure and stabilize the config format is needed.

 ### Goals

-- To cleanup the existing config format.
+- To clean up the existing config format.
 - To provide a config structure that is easier for users to understand and use.
 - To distinguish between instance local and shared settings.
 - To allow for the persistence of settings for different platforms (such as Linux and Windows) in a manner that reduces confusion and the possibility of error.
 - To allow for easier introduction of new proxy backends.
-- To provide users with flexibility, especially with regards to the config source.
+- To provide users with flexibility, especially in regard to the config source.

 ### Non-Goals

 - To change or implement additional features in kube-proxy.
 - To deal with the graduation of any component of kube-proxy other than its configuration.
-- To remove most or even all of the command line flags that have corresponding component config options.
+- To remove most or even all the command line flags that have corresponding component config options.

 ## Proposal

-The idea is to conduct the process of graduation to beta in small steps in the span of at least one Kubernetes release cycle. This will be done by creating one or more alpha versions of the config, with the last alpha version being copied as v1beta1 after the community is happy with it.
-Each of the sub-sections below can result in a separate alpha version release, although it will be better for users to have no more than a couple of alpha versions past v1alpha1.
-After each alpha version release, the community will gather to discuss new ideas on how to proceed in the graduation process. If there are viable proposals, this document is updated with the appropriate section(s) below and the new changes are introduced in the form of new alpha version(s).
-The proposed process is similar to the one already used successfully for kubeadm.
-
-### Re-encapsulate mode specific options
+The idea is to conduct the process of graduation to beta in small steps in the span of at least one Kubernetes release cycle.
+This will be done by creating one or more alpha versions of the config, with the last alpha version being copied as v1beta1 after
+the community is happy with it. Each of the subsections below can result in a separate alpha version release, although it will
+be better for users to have no more than a couple of alpha versions past v1alpha1. After each alpha version release, the community
+will gather to discuss new ideas on how to proceed in the graduation process. If there are viable proposals, this document is
+updated with the appropriate section(s) below and the new changes are introduced in the form of new alpha version(s). The proposed
+process is similar to the one already used successfully for kubeadm.

 The current state of the config has proven that:
 - Some options are deemed mode specific, but are in fact shared between all modes.
 - Some options are placed directly into KubeProxyConfiguration, but are in fact mode specific.
 - There are options that are shared between some (but not all) modes. Specific features of the underlying implementation are common, and this happens only within the boundaries of the platform (the iptables and ipvs modes, for example).
-- Although the legacy Linux and Windows user mode proxies are separate code bases, they have a common set of options.

 With that in mind, the following measures are proposed:
-- Mode specific structs are consolidated so as not to use fields from other mode specific structs.
-- Introduce a single combined legacy user mode proxy struct for both Linux and Windows backends.
-
-#### Example
-
-```yaml
-commonSetting1: ...
-commonSetting2: ...
-...
-modeA: ...
-modeB: ...
-modeC: ...
-```
+- Create a platform subsection for platform specific fields.
+- Move backend-agnostic and platform-agnostic fields from the backend section to the root section.
+- Move backend-agnostic and platform-specific fields from the backend section to the relevant platform section.
+- Drop legacy/unused options.
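A minimal sketch of the shape these measures point toward. This is not the committed v1alpha2 API; the `linux`/`windows` section names and the exact field placement are assumptions for illustration only:

```yaml
apiVersion: kubeproxy.config.k8s.io/v1alpha2  # assumed next version
kind: KubeProxyConfiguration
mode: iptables
syncPeriod: 30s        # backend- and platform-agnostic: moved to the root
minSyncPeriod: 5s
linux:                 # platform section (assumed name)
  conntrack:           # platform-specific, backend-agnostic
    maxPerCore: 32768
  oomScoreAdj: -999
iptables:              # backend section keeps only iptables-specific knobs
  masqueradeBit: 14
```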

 ### Risks and Mitigations

 So far, the following risks have been identified:
 - Deviation from the implementation guidelines and bad planning may have the undesired effect of producing bad alpha versions.
 - Bad alpha versions will need good alpha versions to fix them. This will create too many iterations over the API, and users may get confused.
-- New and redesigned kube-proxy API versions may cause confusion among users who are used to the relatively flat, single-document v1alpha1 design. In particular, multiple YAML documents and structured (as opposed to flat) objects can create confusion as to which option is placed where.
+- New and redesigned kube-proxy API versions may cause confusion among users who are used to the relatively flat, single-document v1alpha1 design.
+In particular, multiple YAML documents and structured (as opposed to flat) objects can create confusion as to which option is placed where.
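As an illustration of that last risk, a structured, multi-document layout could split shared and instance-local settings like this. Both the split itself and the `KubeProxyInstanceConfiguration` kind are hypothetical, not part of this proposal's committed API:

```yaml
# Document 1: settings shared by every kube-proxy instance (hypothetical layout)
apiVersion: kubeproxy.config.k8s.io/v1alpha2
kind: KubeProxyConfiguration
clusterCIDR: 10.244.0.0/16
---
# Document 2: instance-local settings (hypothetical kind)
apiVersion: kubeproxy.config.k8s.io/v1alpha2
kind: KubeProxyInstanceConfiguration
hostnameOverride: node-1
bindAddress: 0.0.0.0
```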

 The mitigations to those risks:
 - Strict following of the proposals in this document and planning ahead for a release and config cycle.
@@ -122,16 +127,192 @@ The mitigations to those risks:

 ## Design Details

 ### Test Plan

-Existing test cases throughout the kube-proxy code base should be adapted to use the latest config version.
-If required, new test cases should also be created.
+[x] I/we understand the owners of the involved components may require updates to
+existing tests to make this code solid enough prior to committing the changes necessary
+to implement this enhancement.
+
+##### Prerequisite testing updates
+
+##### Unit tests
+
+##### Integration tests
+
+##### e2e tests

 ### Graduation Criteria

 The config should be considered graduated to beta if it:
-- is well structured with clear boundaries between different proxy mode settings.
+- is well-structured with clear boundaries between different proxy mode settings.
 - allows for easy multi-platform use with less probability of error.
-- allows for easy distinguishment between instance local and shared settings.
+- allows for easy distinction between instance local and shared settings.
 - is well covered by tests.
-- is well documented, especially with regard to migrating to it from older versions.
+- is well documented, especially in regard to migrating to it from older versions.
+
+### Upgrade / Downgrade Strategy
+
+### Version Skew Strategy
+
+## Production Readiness Review Questionnaire
+
+### Feature Enablement and Rollback
+
+###### How can this feature be enabled / disabled in a live cluster?
+
+<!--
+Pick one of these and delete the rest.
+
+Documentation is available on [feature gate lifecycle] and expectations, as
+well as the [existing list] of feature gates.
+
+[feature gate lifecycle]: https://git.k8s.io/community/contributors/devel/sig-architecture/feature-gates.md
+[existing list]: https://kubernetes.io/docs/reference/command-line-tools-reference/feature-gates/
+-->
+
+- [ ] Feature gate (also fill in values in `kep.yaml`)
+  - Feature gate name:
+  - Components depending on the feature gate:
+- [x] Other
+  - Describe the mechanism:
+  - Will enabling / disabling the feature require downtime of the control
+    plane?
+  - Will enabling / disabling the feature require downtime or reprovisioning
+    of a node?
+
+###### Does enabling the feature change any default behavior?
+
+<!--
+Any change of default behavior may be surprising to users or break existing
+automations, so be extremely careful here.
+-->
+
+###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
+
+<!--
+Describe the consequences on existing workloads (e.g., if this is a runtime
+feature, can it break the existing applications?).
+
+Feature gates are typically disabled by setting the flag to `false` and
+restarting the component. No other changes should be necessary to disable the
+feature.
+
+NOTE: Also set `disable-supported` to `true` or `false` in `kep.yaml`.
+-->
+
+###### What happens if we reenable the feature if it was previously rolled back?
+
+###### Are there any tests for feature enablement/disablement?
+
+<!--
+The e2e framework does not currently support enabling or disabling feature
+gates. However, unit tests in each component dealing with managing data, created
+with and without the feature, are necessary. At the very least, think about
+conversion tests if API types are being modified.
+
+Additionally, for features that are introducing a new API field, unit tests that
+are exercising the `switch` of feature gate itself (what happens if I disable a
+feature gate after having objects written with the new field) are also critical.
+You can take a look at one potential example of such test in:
+https://github.com/kubernetes/kubernetes/pull/97058/files#diff-7826f7adbc1996a05ab52e3f5f02429e94b68ce6bce0dc534d1be636154fded3R246-R282
+-->
+
+### Rollout, Upgrade and Rollback Planning
+
+###### How can a rollout or rollback fail? Can it impact already running workloads?
+
+###### What specific metrics should inform a rollback?
+
+###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
+
+###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
+
+### Monitoring Requirements
+
+###### How can an operator determine if the feature is in use by workloads?
+
+###### How can someone using this feature know that it is working for their instance?
+
+###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?
+
+###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
+
+- [ ] Metrics
+  - Metric name:
+  - [Optional] Aggregation method:
+  - Components exposing the metric:
+- [ ] Other (treat as last resort)
+  - Details:
+
+###### Are there any missing metrics that would be useful to have to improve observability of this feature?
+
+No.
+
+### Dependencies
+
+###### Does this feature depend on any specific services running in the cluster?
+
+No.
+
+### Scalability
+
+###### Will enabling / using this feature result in any new API calls?
+
+No.
+
+###### Will enabling / using this feature result in introducing new API types?
+
+Yes.
+[WIP]
+
+###### Will enabling / using this feature result in any new calls to the cloud provider?
+
+No.
+
+###### Will enabling / using this feature result in increasing size or count of the existing API objects?
+
+Yes.
+[WIP]
+
+###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
+
+No.
+
+###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?
+
+No.
+
+###### Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?
+
+No.
+
+### Troubleshooting
+
+###### How does this feature react if the API server and/or etcd is unavailable?
+
+###### What are other known failure modes?
+
+###### What steps should be taken if SLOs are not being met to determine the problem?
+
+## Implementation History
+
+## Drawbacks
+
+## Alternatives
+
+## Infrastructure Needed (Optional)