Type: Story
Resolution: Done
Priority: Normal
Fix Version/s: None
Affects Version/s: openshift-4.13
Component/s: Cluster Version Operator - Features
Labels: None
Activity Type: None
Blocked: False
Blocked Reason: None
Ready: False
Epic Link: None
Story Points: 3
Target Version: None
Release Blocker: None
Sprint: OTA 228, OTA 229

Add a 'reason' label to the cluster_operator_up metric, to give users sub-component granularity about why they're getting a critical alert.

We should continue to avoid the cardinality hit of including the full message in the metric, because we don't want to load Prometheus down with that many time-series. For message-level granularity, users still have to follow the oc ... or web-console links from the alert description.
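As a rough illustration of the cardinality trade-off, here is a minimal Python sketch (not the operator's actual Go code). It builds the label set for one cluster_operator_up sample from a ClusterOperator's Available condition, copying the reason but dropping the message. The function name, the `name` label, the dictionary layout, the rule that the value is 1 only when Available=True, and the example reason and message strings are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Condition = Dict[str, str]


def cluster_operator_up_sample(
    name: str, conditions: List[Condition]
) -> Tuple[Dict[str, str], int]:
    """Return (labels, value) for one hypothetical cluster_operator_up sample.

    Assumption: the value is 1 when the Available condition is True, else 0.
    The condition's free-form message is deliberately left out of the labels,
    so it cannot create additional time series.
    """
    available = next((c for c in conditions if c.get("type") == "Available"), None)
    if available is None:
        # No Available condition at all: treat as down with an empty reason.
        return {"name": name, "reason": ""}, 0
    value = 1 if available.get("status") == "True" else 0
    labels = {"name": name, "reason": available.get("reason", "")}
    return labels, value


def distinct_series(samples: List[Dict[str, str]]) -> int:
    """Count distinct label sets, i.e. the number of time series produced."""
    return len({tuple(sorted(labels.items())) for labels in samples})


if __name__ == "__main__":
    # Two observations with the same (hypothetical) reason but different messages.
    first, _ = cluster_operator_up_sample(
        "example-operator",
        [{"type": "Available", "status": "False",
          "reason": "ExampleReason", "message": "pod a is not ready"}],
    )
    second, _ = cluster_operator_up_sample(
        "example-operator",
        [{"type": "Available", "status": "False",
          "reason": "ExampleReason", "message": "pod b is not ready"}],
    )
    print(distinct_series([first, second]))
```

Because only the reason is used as a label, message churn does not multiply the series count; only a change of reason does.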

A downside of this approach is that an operator's ClusterOperator Available=False reason could change rapidly. That seems unlikely (the reason only has to be stable for ~10 minutes before ClusterOperatorDown fires), and we can revisit this approach if it crops up in practice.
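The stability concern can be sketched with a small Python simulation. It assumes that each distinct reason behaves as its own series and that an alert only fires once a single series has been down continuously for the hold period, so a change of reason restarts the clock. The function name, the one-sample-per-minute input format, and the reason strings are hypothetical; the 10-minute hold period comes from the description above.

```python
from typing import List, Optional, Tuple


def would_fire(
    samples: List[Tuple[int, Optional[str]]], hold_minutes: int = 10
) -> bool:
    """Return True if any single reason stays down for at least hold_minutes.

    samples is a time-ordered list of (minute, reason) pairs. A reason of
    None means the operator is available at that minute. A change of reason
    is treated as a new series, which restarts the pending timer.
    """
    current_reason: Optional[str] = None
    since: Optional[int] = None
    for minute, reason in samples:
        if reason is None:
            # Operator recovered: clear any pending state.
            current_reason, since = None, None
            continue
        if reason != current_reason:
            # New reason, new series: start counting from this minute.
            current_reason, since = reason, minute
        if since is not None and minute - since >= hold_minutes:
            return True
    return False


if __name__ == "__main__":
    # Down with one stable (hypothetical) reason for 11 samples.
    stable = [(m, "ExampleReason") for m in range(11)]
    # Down for 30 minutes, but the reason flips every 5 minutes.
    flapping = [(m, "ReasonA" if (m // 5) % 2 == 0 else "ReasonB") for m in range(30)]
    print(would_fire(stable), would_fire(flapping))
```

Under these assumptions, an operator whose reason flips faster than the hold period would never trigger the alert even though it is continuously unavailable, which is the scenario the description proposes to revisit if it shows up in practice.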

links to

openshift/cluster-version-operator#868: OTA-844: pkg/cvo/metrics: Add 'reason' to cluster_operator_up

Assignee: W. Trevor King
Reporter: W. Trevor King
Need Info From: None
Votes: 0
Watchers: 2
Created: 2022/12/07 11:07 PM
Updated: 2022/12/19 4:09 PM
Resolved: 2022/12/19 4:09 PM
