- What work did the SIG do this year that should be highlighted?
Notable work we produced in 2023 includes:
- Helping validate scalability/performance impact and reliability for many features across the year as well as preventing regressions
- Enhancements to our scalability test framework, with better instrumentation and reduced flakiness
- Additional test coverage spanning the Kubernetes project (e.g. list calls with AP&F, in-cluster network/DNS programming SLIs)
- Driving/influencing a number of KEPs and fixes, mainly within Kubernetes core (API server, API machinery and etcd)
One key piece of good news: we now have recurring 5k-node scalability CI tests on AWS (using kops). These are currently release-informing, and the plan is to graduate them to release-blocking in 2024. This effort helps decentralize scalability testing and its costs, which were previously borne heavily by GCP.
- Are there any areas and/or subprojects that your group needs help with (e.g. fewer than 2 active OWNERS)?
Overall in 2023 we have had healthy contributions from multiple companies (see these devstats). Compared to 2022, we have seen an uptick in contributions to test framework improvements and test suite coverage for scalability/performance. Some of those were driven by other SIGs, but most continue to come from within the SIG. The increased contributions from AWS towards scale testing and debugging performance issues come as good news. Still, we continue to seek help from other SIGs to force-multiply scalability test coverage and regression hunting for the features/components they own. We also encourage them to proactively identify and document SLIs/SLOs/limits for the APIs and workflows they own. This allows each SIG to set a scalability bar for its systems (just like uptime/availability) and thereby make scalability a first-class citizen of Kubernetes and related CNCF projects. As always, SIG Scalability is eager to assist/guide with this process.
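To illustrate what documenting an SLI/SLO for an API might look like in practice, here is a minimal sketch that checks a latency percentile (the SLI) against a threshold (the SLO). The `SLOS` table, its keys, and its thresholds are hypothetical values chosen for the example; they are not the project's official numbers, and the helper names are ours.

```python
def percentile(samples, q):
    """Nearest-rank percentile for an integer q in 1..100."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Ceiling of q * n / 100, done in integer arithmetic to avoid float rounding.
    rank = max(1, (q * len(ordered) + 99) // 100)
    return ordered[rank - 1]


# Hypothetical SLO table: (resource, verb) -> (percentile, threshold in seconds).
# These values are assumptions for illustration only.
SLOS = {
    ("pods", "GET"): (99, 1.0),
    ("pods", "LIST"): (99, 30.0),
}


def evaluate(samples_by_call):
    """Return {(resource, verb): bool} for every call type that has an SLO."""
    results = {}
    for key, latencies in samples_by_call.items():
        if key in SLOS:
            q, threshold = SLOS[key]
            results[key] = percentile(latencies, q) <= threshold
    return results


if __name__ == "__main__":
    observed = {("pods", "GET"): [0.1] * 98 + [5.0] * 2}
    print(evaluate(observed))
```

A table like this gives each owning SIG one place to state its scalability bar, and a scale test can fail when `evaluate` reports a violation.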
- Did you have community-wide updates in 2023 (e.g. KubeCon talks)?
We presented SIG Scalability Intro + Deep-Dive updates at both EU and NA KubeCons:
- KEP work in 2023 (v1.27, v1.28, v1.29):
Notable KEPs, mostly co-owned with SIG API Machinery and SIG etcd:
- 1040 - API Priority and Fairness
  - GA in 1.29
- 2340 - Consistent Reads from Cache
  - Second alpha in 1.30 (blocked by resolution of this etcd issue)
- 3157 - API Streaming Lists
  - Third alpha in 1.30 (blocked by resolution of this etcd issue)
- 4222 - Binary Encoding for CRDs
  - Pre-alpha in 1.30
Some notable non-KEP improvements:
- Graceful termination of watches during API server shutdown
  - Landed in 1.27
- Addressed monitoring gaps in API server extension mechanisms
  - Landed in 1.28
- Cache JSON-encoded watch events to reduce redundant work with multiple watches
  - Landed in 1.29
- Memory-efficient handling of watch requests preflight
  - Landed in 1.30
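The idea behind caching encoded watch events can be sketched as follows: when one event fans out to many watchers, serialize it once and hand every watcher the same bytes. This is an illustrative Python sketch, not the API server's actual implementation (which is written in Go); `CachingEvent`, `fan_out`, and the `encode_calls` counter are hypothetical names introduced for the example.

```python
import json


class CachingEvent:
    """Wraps an event and memoizes its JSON encoding."""

    def __init__(self, event):
        self._event = event
        self._encoded = None
        self.encode_calls = 0  # instrumentation for this example only

    def to_json(self):
        # Encode on first use; later callers reuse the cached bytes.
        if self._encoded is None:
            self.encode_calls += 1
            self._encoded = json.dumps(self._event, sort_keys=True).encode("utf-8")
        return self._encoded


def fan_out(event, watchers):
    """Send one event to every watcher, encoding it at most once."""
    shared = CachingEvent(event)
    for send in watchers:
        send(shared.to_json())
    return shared


if __name__ == "__main__":
    received = []
    event = {"type": "MODIFIED", "object": {"kind": "Pod", "name": "a"}}
    shared = fan_out(event, [received.append] * 3)
    print(len(received), "deliveries,", shared.encode_calls, "encoding")
```

Without the cache, the encoding cost grows with the number of watchers on the same resource; with it, the cost is paid once per event.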
New in 2023:
- None
Retired in 2023:
- None
Continuing:
- kubernetes-scalability-and-performance-tests-and-validation
- kubernetes-scalability-bottlenecks-detection
- kubernetes-scalability-definition
- kubernetes-scalability-governance
- kubernetes-scalability-test-frameworks
New in 2023:
- None
Retired in 2023:
- Reliability
Operational tasks in sig-governance.md:
- README.md reviewed for accuracy and updated if needed
- CONTRIBUTING.md reviewed for accuracy and updated if needed
- Other contributing docs (e.g. in devel dir or contributor guide) reviewed for accuracy and updated if needed
- Subprojects list and linked OWNERS files in sigs.yaml reviewed for accuracy and updated if needed
- SIG leaders (chairs, tech leads, and subproject leads) in sigs.yaml are accurate and active, and updated if needed
- Meeting notes and recordings for 2023 are linked from README.md and updated/uploaded if needed
  - The meeting notes are typically kept up to date and comprehensive. For meeting recordings, though, we have admittedly been a bit sloppy (we are trying to improve in 2024).