-
Epic
-
Resolution: Unresolved
-
Major
-
None
-
None
-
Stability: protect kube-apiserver from harmful LIST
-
Upstream
-
13
-
False
-
False
-
OCPSTRAT-46 - Strategic Upstream Work - OCP Control Plane and Node Lifecycle Group
-
100% To Do, 0% In Progress, 0% Done
-
XL
Epic Goal
- considerably reduce the (temporary) memory footprint of LISTs, from O(watchers*page-size*object-size*5) down to O(watchers*constant), where the constant is around 2 MB.
- reduce etcd load by serving LISTs from the watch cache
- provide a replacement for paginated lists served from the watch cache, because serving paginated lists from the watch cache is not feasible without major investment
- enforce consistency in the sense of freshness of the returned list
- fix the long-standing "stale reads from the cache" issue, https://github.com/kubernetes/kubernetes/issues/59848
- protect kube-apiserver and its node against list-based OOM attacks
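To give a feel for the first goal, the two memory bounds can be compared with some rough arithmetic. This is only a sketch: big-O bounds are not exact byte counts, and the workload numbers below (100 concurrent requests, 500 objects per page, 50 KiB per object) are assumptions chosen for illustration, not measurements. Only the factor of 5 and the 2 MB constant come from the goal above.

```python
# Illustrative arithmetic only. The workload numbers are assumptions,
# not measurements from a real cluster.
MB = 1024 * 1024


def list_footprint_bytes(requests: int, page_size: int, object_size: int,
                         overhead: int = 5) -> int:
    """Temporary memory under the current bound:
    O(watchers * page-size * object-size * 5)."""
    return requests * page_size * object_size * overhead


def streaming_footprint_bytes(requests: int, constant: int = 2 * MB) -> int:
    """Temporary memory under the target bound: O(watchers * constant),
    with the constant taken as roughly 2 MB."""
    return requests * constant


requests = 100            # assumed number of concurrent requests
page_size = 500           # assumed objects per page
object_size = 50 * 1024   # assumed 50 KiB per object

before = list_footprint_bytes(requests, page_size, object_size)
after = streaming_footprint_bytes(requests)

print(f"current bound: {before / MB:.0f} MB")
print(f"target bound:  {after / MB:.0f} MB")
```

Under these assumed numbers the current bound is in the range of gigabytes while the target bound stays in the hundreds of megabytes, and the target no longer grows with page size or object size.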
Why is this important?
The kube-apiserver is vulnerable to memory explosion. The issue is most apparent in larger clusters, where even a few LIST requests can cause serious disruption. Uncontrolled and unbounded memory consumption by the servers affects not only clusters that operate in HA mode but also other programs that share the same machine. In this KEP we propose a potential solution to this issue.
Acceptance Criteria
- TBD
Done Checklist
- CI - CI is running, tests are automated and merged.
- Release Enablement <link to Feature Enablement Presentation>
- DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
- DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
- DEV - Downstream build attached to advisory: <link to errata>
- QE - Test plans in Polarion: <link or reference to Polarion>
- QE - Automated tests merged: <link or reference to automated tests>
- DOC - Downstream documentation merged: <link to meaningful PR>
- blocks
-
OCPSTRAT-1283 [GA] Selectable etcd database size
- Backlog
- incorporates
-
OCPSTRAT-39 Scalability improvements for kube-apiserver [KEP-3157]
- Closed
- is related to
-
API-1456 RFE: Event-spew hardening
- Closed
- relates to
-
OCPBUGS-38523 kube-apiserver resource Spike during data re-encryption at rest
- New