Spike
Resolution: Unresolved
Major
Is the Failing=True status condition a good indicator that admins should intervene? Can tracking the percentage of clusters across the whole fleet that report Failing=True give us a good indicator of the overall success/failure rate of updates, or of general (update and non-update) cluster health? Can we ask Webconsole to use the Failing=True status condition as the indicator of whether an upgrade is healthy or needs admin intervention?
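A minimal sketch of the fleet metric discussed above, assuming each cluster is represented by its ClusterVersion object as parsed JSON (the shape returned by `oc get clusterversion version -o json`, with `status.conditions` entries carrying `type` and `status`). The function names are hypothetical, and splitting the rate by Progressing=True is one assumed way to separate "updating" clusters from general cluster health; it is not how the Superset dashboard computes anything.

```python
from typing import Any, Dict, Iterable, Mapping


def condition_is_true(cluster_version: Mapping[str, Any], cond_type: str) -> bool:
    """Return True if the ClusterVersion has condition `cond_type` with status "True"."""
    for cond in cluster_version.get("status", {}).get("conditions", []):
        if cond.get("type") == cond_type:
            return cond.get("status") == "True"
    return False  # missing condition is treated as not True


def failing_rates(fleet: Iterable[Mapping[str, Any]]) -> Dict[str, float]:
    """Percentage of clusters reporting Failing=True, overall and split by Progressing.

    Hypothetical helper: "updating" means Progressing=True, "not_updating" is everything else.
    """
    counts = {"all": [0, 0], "updating": [0, 0], "not_updating": [0, 0]}  # [failing, total]
    for cv in fleet:
        failing = condition_is_true(cv, "Failing")
        bucket = "updating" if condition_is_true(cv, "Progressing") else "not_updating"
        for key in ("all", bucket):
            counts[key][1] += 1
            if failing:
                counts[key][0] += 1
    # Avoid division by zero for empty buckets.
    return {
        key: (100.0 * failing / total if total else 0.0)
        for key, (failing, total) in counts.items()
    }
```

Comparing the "updating" and "not_updating" rates is what would show whether Failing=True tracks update outcomes specifically or cluster health in general.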
Definition of done:
- Check the Failing=True status condition trend chart on the Superset dashboard and discuss it with the team.
- Check CI to see how many clusters show Failing=True when failures are expected as part of the test, and discuss with the team.
- Create follow-up cards for the work to make tests fail if they show the Failing=True status condition when failures are not expected as part of the test.
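The last item could take roughly the following shape. This is a sketch only, assuming the test harness periodically samples the ClusterVersion object as parsed JSON (again the `oc get clusterversion version -o json` shape) and knows whether the test case expects failures; `failing_true_samples` and `check_no_unexpected_failing` are hypothetical names, not existing CI helpers.

```python
from typing import Any, Iterable, List, Mapping


def failing_true_samples(samples: Iterable[Mapping[str, Any]]) -> List[str]:
    """Describe every sampled ClusterVersion snapshot that reported Failing=True."""
    hits = []
    for index, cv in enumerate(samples):
        for cond in cv.get("status", {}).get("conditions", []):
            if cond.get("type") == "Failing" and cond.get("status") == "True":
                hits.append(f"sample {index}: {cond.get('reason', '')}: {cond.get('message', '')}")
    return hits


def check_no_unexpected_failing(
    samples: Iterable[Mapping[str, Any]], failures_expected: bool
) -> None:
    """Raise AssertionError if Failing=True was seen in a test that expects no failures."""
    if failures_expected:
        return  # tests that inject failures are allowed to see Failing=True
    hits = failing_true_samples(samples)
    if hits:
        raise AssertionError("ClusterVersion reported Failing=True:\n" + "\n".join(hits))
```

Keeping the detection separate from the pass/fail decision means the same helper could also count Failing=True occurrences in tests where failures are expected, which is what the second item asks to measure.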
- is blocked by
  - OTA-362 CI: fail update suite if any ClusterOperator go Available=False (Closed)
- is related to
  - OTA-1087 Add Upgrade Health section to oc adm upgrade status command (Closed)
- relates to
  - TRT-1578 Ensure all HA components are not degraded by design during upgrades (New)
  - OTA-700 Ensure availability of all HA components during upgrades (Closed)
  - OCPSTRAT-2484 Improve upgrade experience - fix false alarms in ClusterOperator status (In Progress)