-
Spike
-
Resolution: Done
-
Critical
This is the impact statement for the OCPBUGS-29713 series:
Which 4.y.z to 4.y'.z' updates increase vulnerability?
- Any upgrade to versions 4.12.51, 4.13.33 through 4.13.36, 4.14.12 through 4.14.15, and 4.15.0 through 4.15.1
Which types of clusters?
- Clusters where the control plane may not tolerate additional load
What is the impact? Is it serious enough to warrant removing update recommendations?
- This bug causes nodes to report their status every 15 seconds, rather than every 5 minutes, when no changes are being made
- In a cluster with 15 total nodes and a nominal workload, this results in an approximately 20-30% increase in API Server Requests Per Second
- If the control plane is not scaled to tolerate that additional load, abnormal behavior may arise. In one observed occurrence, Service Account Tokens were not being authorized and pods requiring them crashlooped
How involved is remediation?
- Upgrade to a fixed version: 4.12.53, 4.13.37, 4.14.16, 4.15.2, or later. These releases have resolved the issue
- If you do not wish to upgrade, you can apply a custom Kubelet config. However, since this change also requires a rolling reboot, upgrading is preferred
- The kubelet config value to add is `nodeStatusReportFrequency: 5m`
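As a rough sketch of the workaround above, the value could be set through a KubeletConfig custom resource along these lines. The resource name is hypothetical, and the pool selector label is an assumption that targets the default worker pool; adjust it to match the labels on your own MachineConfigPools, and note that other pools (for example the control plane pool) would need their own KubeletConfig.

```yaml
# Sketch only: the name and selector label below are assumptions, not taken from the bug report.
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: node-status-report-frequency   # hypothetical name
spec:
  machineConfigPoolSelector:
    matchLabels:
      # Assumed label for the default worker pool; verify against your MachineConfigPool
      pools.operator.machineconfiguration.openshift.io/worker: ""
  kubeletConfig:
    # Restores the 5 minute reporting interval described in the impact statement
    nodeStatusReportFrequency: 5m
```

Applying a resource like this causes the Machine Config Operator to roll the change out across the selected pool, which is the rolling reboot mentioned above.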
Is this a regression?
- Yes. A Machine Config Operator change, meant to prevent configuration changes from triggering an unnecessary reboot, caused this value to revert to its default of 15s
- blocks: OCPBUGS-29713 Excessive node status updates causing high control plane CPU (Closed)