Type: Story
Resolution: Unresolved
Priority: Normal
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels:
- no-epic

Activity Type: Quality / Stability / Reliability
Blocked: False
Blocked Reason: None
Ready: False
Docs QE Status: NEW
QE Status: NEW
Intelligence Requested:
Market:
SFDC Cases Links:
SFDC Cases Open:
SFDC Cases Counter:

trking pointed out that the KubeCPUOvercommit alert has a gap.

https://coreos.slack.com/archives/C0VMT03S5/p1671424730231919

It does not take recent additions to CPU capacity into account. As a result, well-timed increases in load can keep this alert firing even when autoscaling is mitigating them correctly.
This makes the alert noisy.

The first proposal, extending the alert's for clause, would only fix one case (two workload increases 5 minutes apart) and not longer sequences of such increases. It would therefore be preferable to improve the alert expression itself.

To quote Trevor: "if we are within a node's worth of CPU for 10m and the total CPU capacity hasn't increased over that 10m" would be a better trigger.
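To make the difference between the current behaviour and the proposed trigger concrete, here is a small simulation over per-minute samples. It is a sketch under stated assumptions, not the actual alert rule. The Sample fields, the one-minute scrape interval, the helper names, and the reading of "within a node's worth of CPU" as "requests exceed total capacity minus the largest node" are all hypothetical. "Capacity hasn't increased over that 10m" is read as "no sample in the window has more total CPU than the first sample in the window".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Sample:
    """One scrape of cluster-wide CPU figures, in cores (hypothetical shape)."""
    minute: int              # scrape time in minutes; samples are assumed sorted
    requested_cpu: float     # sum of CPU requests across all pods
    total_cpu: float         # sum of allocatable CPU across all nodes
    largest_node_cpu: float  # allocatable CPU of the biggest single node


def cannot_tolerate_node_loss(s: Sample) -> bool:
    # Assumed reading of "within a node's worth of CPU": requests exceed
    # what would remain if the largest node disappeared.
    return s.requested_cpu > s.total_cpu - s.largest_node_cpu


def full_window(samples: List[Sample], now: int, window: int) -> Optional[List[Sample]]:
    # Samples in [now - window, now], or None if data does not cover the whole window.
    in_window = [s for s in samples if now - window <= s.minute <= now]
    if not in_window or in_window[0].minute > now - window:
        return None
    return in_window


def naive_should_fire(samples: List[Sample], now: int, window: int = 10) -> bool:
    # Overcommitted for the whole window; capacity added meanwhile is ignored.
    in_window = full_window(samples, now, window)
    return in_window is not None and all(cannot_tolerate_node_loss(s) for s in in_window)


def proposed_should_fire(samples: List[Sample], now: int, window: int = 10) -> bool:
    # Same as above, but stay quiet if total capacity grew at any point in the window.
    in_window = full_window(samples, now, window)
    if in_window is None:
        return False
    baseline = in_window[0].total_cpu
    capacity_grew = any(s.total_cpu > baseline for s in in_window)
    return naive_should_fire(samples, now, window) and not capacity_grew


# Scenario: a 4-core node is added at minute 5 (16 -> 20 cores), but new
# workloads keep requests above "total minus one node" the whole time.
samples = [
    Sample(m, 14.0 if m < 5 else 18.0, 16.0 if m < 5 else 20.0, 4.0)
    for m in range(21)
]
print(naive_should_fire(samples, now=12))     # True: ignores the added node
print(proposed_should_fire(samples, now=12))  # False: capacity grew at minute 5
print(proposed_should_fire(samples, now=20))  # True: no growth in minutes 10-20
```

In PromQL, the capacity check could plausibly be written by comparing the current sum of allocatable CPU with the same sum using `offset 10m`. That is one option, not the expression adopted for this issue.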

There are no Sub-Tasks for this issue.

Assignee: Ayoub Mrini
Reporter: Jan Fajerski
QA Contact: Junqi Zhao
Votes: 0
Watchers: 5
Created: 2022/12/20 9:11 AM
Updated: 2025/08/11 9:49 AM
