Story
Resolution: Unresolved
OSDOCS-2427 landed docs recommending that cluster admins manually check for firing critical alerts before updating. I'm not particularly excited about such a blanket policy, because non-OCP components can create their own critical alerts. But I don't mind surfacing firing critical alerts as an "in case you're interested..." thing, and letting customer admins decide what to do with that additional context.

One strategy that OTA-1272 would unlock is a conditional update risk in every release for the presence of firing critical alerts. This could live at the Cincinnati level, but that might confuse folks who expect conditionalEdges to cover risks where we actually understand some kind of cause-and-effect chain between the measured state and how it might negatively impact a cluster that chooses to update anyway. We could instead inject the risk in the CVO, although that removes our ability to automatically apply the risk to existing releases, and it removes our ability to dynamically tune the risk post-release if we decide it is too annoying. One potential middle ground would be to:
- Have a well-known risk name (GenericAlerts?).
- Have the CVO inject a first-guess rule like {{group(ALERTS{severity="critical"}) ...}} so Cincinnati doesn't have to (unless we want to backfill something for older releases via Cincinnati).
- If we decide the CVO's baked-in rule is broken, we could fix it (in new releases) and have Cincinnati serve a better GenericAlerts rule to existing releases.
- If a CVO saw a GenericAlerts rule from Cincinnati for an update, it would prefer the matcher Cincinnati was recommending over its baked-into-the-CVO rule (see the sketch after this list).
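For concreteness, here is a minimal Go sketch of that precedence. The types and names (riskRule, effectiveRules) are hypothetical stand-ins, not the real openshift/api or CVO structures, and the tuned namespace-scoped matcher in the example is just an illustration of the kind of post-release adjustment Cincinnati could serve:

{code:go}
package main

import "fmt"

// riskRule is a hypothetical stand-in for a conditional update risk's
// matching rule: a well-known name plus a PromQL expression.
type riskRule struct {
	Name   string // well-known risk name, e.g. "GenericAlerts"
	PromQL string // matcher evaluated against the cluster's monitoring stack
}

// bakedInRules is the CVO's first-guess rule, compiled into the release.
var bakedInRules = []riskRule{
	{
		Name:   "GenericAlerts",
		PromQL: `group(ALERTS{severity="critical"})`,
	},
}

// effectiveRules prefers a rule served by Cincinnati over the CVO's
// baked-in rule of the same name, so a broken or too-noisy matcher can
// be tuned post-release without shipping a new CVO.
func effectiveRules(cincinnati []riskRule) []riskRule {
	fromCincinnati := make(map[string]riskRule, len(cincinnati))
	for _, r := range cincinnati {
		fromCincinnati[r.Name] = r
	}
	rules := make([]riskRule, 0, len(bakedInRules))
	for _, baked := range bakedInRules {
		if served, ok := fromCincinnati[baked.Name]; ok {
			rules = append(rules, served) // Cincinnati's matcher wins
			continue
		}
		rules = append(rules, baked) // fall back to the baked-in guess
	}
	return rules
}

func main() {
	// Hypothetical: Cincinnati serving a tuned matcher to an existing
	// release, scoping the check to OCP components to address the
	// "non-OCP components fire their own critical alerts" concern.
	served := []riskRule{{
		Name:   "GenericAlerts",
		PromQL: `group(ALERTS{severity="critical",namespace=~"openshift-.*"})`,
	}}
	for _, r := range effectiveRules(served) {
		fmt.Printf("%s: %s\n", r.Name, r.PromQL)
	}
}
{code}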
Thoughts?