Feature
Resolution: Unresolved
75% To Do, 0% In Progress, 25% Done
Backlog Refinement
Pod Lifecycle Event Generator (PLEG)
In Kubernetes, Kubelet is a per-node daemon that manages the pods on the node, driving their actual states to match their pod specifications (specs). To achieve this, Kubelet needs to react to changes in both (1) pod specs and (2) container states. For the former, Kubelet watches for pod spec changes from multiple sources; for the latter, Kubelet polls the container runtime periodically (e.g., every 10s) for the latest state of all containers.
Polling incurs non-negligible overhead as the number of pods and containers increases. Kubelet's parallelism makes this worse: it runs one worker (goroutine) per pod, and each worker queries the container runtime individually. A large number of periodic, concurrent requests causes high CPU usage spikes (even when there is no spec or state change), poor performance, and reliability problems due to an overwhelmed container runtime. Ultimately, this limits Kubelet's scalability.
(Related issues reported by users: #10451, #12099, #12082)
Goals and Requirements
The goal of this proposal is to improve Kubelet's scalability and performance by lowering the pod management overhead.
- Reduce unnecessary work during inactivity (no spec or state changes).
- Lower the number of concurrent requests to the container runtime.