[MCO-1094] Impact Excessive node status updates causing high control plane CPU - Red Hat Issue Tracker

This is the impact statement for the ~~OCPBUGS-29713~~ series:

Which 4.y.z to 4.y'.z' updates increase vulnerability?

Any upgrade to 4.12.51, 4.13.33 through 4.13.36, 4.14.12 through 4.14.15, or 4.15.0 through 4.15.1.

This bug causes nodes to report their status every 15 seconds, rather than every 5 minutes, when no changes are being made.
In a cluster with 15 total nodes and a nominal workload, this results in an approximately 20-30% increase in API server requests per second.
If the control plane is not scaled to tolerate that additional load, abnormal behavior may arise. In one observed occurrence, Service Account tokens were not being authorized and pods requiring them crashlooped.

Upgrade to a fixed version: 4.12.53, 4.13.37, 4.14.16, 4.15.2, or later. All of these resolve the issue.
If you do not wish to upgrade, you can apply a custom Kubelet config. However, since this change also requires a rolling reboot, upgrading is preferred.
- The kubelet config value to add is `nodeStatusReportFrequency: 5m`
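
A minimal sketch of what such a custom Kubelet config might look like, assuming the default `worker` MachineConfigPool and its standard pool label. The resource name `node-status-report-frequency` is hypothetical, and a second, similar resource selecting the `master` pool would likely be needed to cover control plane nodes:

```yaml
# Sketch only: the metadata name is hypothetical, and the selector
# assumes the default worker pool label. Adjust both for your cluster.
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: node-status-report-frequency
spec:
  machineConfigPoolSelector:
    matchLabels:
      pools.operator.machineconfiguration.openshift.io/worker: ""
  kubeletConfig:
    # Restore the 5 minute reporting interval described in this statement
    nodeStatusReportFrequency: 5m
```

A manifest like this would typically be applied with `oc apply -f <file>`, after which the Machine Config Operator rolls the change out to the selected pool, rebooting nodes as noted above.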

Yes, this is a regression. A Machine Config Operator change meant to keep configuration changes from triggering unnecessary reboots caused this value to revert to its default of 15s.

blocks

OCPBUGS-29713 Excessive node status updates causing high control plane CPU
