Story
Resolution: Unresolved
trking pointed out that the KubeCPUOvercommit alert has a gap.
https://coreos.slack.com/archives/C0VMT03S5/p1671424730231919
The alert does not take recent additions to CPU capacity into account. As a result, well-timed load increases can keep the alert firing even when autoscaling is mitigating the overcommit as intended.
This makes the alert noisy.
The first proposal, extending the "for" clause, would only fix one case (two workload increases 5 minutes apart) and nothing beyond that. Improving the alert expression itself is therefore preferable.
To quote Trevor: "if we are within a node's worth of CPU for 10m and the total CPU capacity hasn't increased over that 10m" would be a better trigger.
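
To make the proposed trigger concrete, here is a minimal Python sketch of the condition Trevor describes. It is not the alert expression itself. The Sample shape, the function names, and the reading of "within a node's worth of CPU" as "requests would no longer fit if the largest node were lost" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Sample:
    """One scrape of cluster-wide CPU figures in cores (hypothetical shape)."""
    requested: float     # sum of CPU requests across all pods
    capacity: float      # sum of allocatable CPU across all nodes
    largest_node: float  # allocatable CPU of the largest single node


def overcommitted(s: Sample) -> bool:
    # Assumed meaning of "within a node's worth of CPU":
    # requests exceed what remains after losing the largest node.
    return s.requested > s.capacity - s.largest_node


def capacity_increased(window: Sequence[Sample]) -> bool:
    # True if capacity grew between any two consecutive samples.
    return any(b.capacity > a.capacity for a, b in zip(window, window[1:]))


def should_fire(window: Sequence[Sample]) -> bool:
    """window holds the samples from the last 10 minutes, oldest first."""
    if not window:
        return False
    return all(overcommitted(s) for s in window) and not capacity_increased(window)
```

With this shape, a window in which the autoscaler added a node suppresses the alert even if requests grew at the same time, while a window with flat capacity and sustained overcommit still fires.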