-
Spike
-
Resolution: Unresolved
-
Major
-
None
-
None
-
None
-
3
-
False
-
None
-
False
-
OTA 264
Is the Failing=True status condition a good indicator that admins should intervene? Can tracking the percentage of clusters with the Failing=True status condition across the whole fleet give us a good indicator of the overall success/failure rate for updates, or of generic (update and non-update) cluster health? Can we ask Webconsole to use the Failing=True status condition as the indicator for whether an upgrade is healthy or needs admin intervention?
Definition of done:
- Check the Failing=True status condition trend chart in the Superset dashboard and discuss it with the team.
- Check CI to see how many clusters show Failing=True when failures are expected as part of the test, and discuss with the team.
- Create follow-up cards for the work to make tests fail if they show the Failing=True status condition when failures are not expected as part of the test.
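The fleet-wide metric raised in the description could be computed roughly as follows. This is only a sketch: it assumes each cluster is available as a dictionary shaped like a ClusterVersion object, with a `status.conditions` list of `type`/`status` entries. The function names and the sample data are hypothetical, and how the data is actually pulled from telemetry or Superset is not modeled.

```python
from typing import Any, Iterable, Mapping


def is_failing(cluster_version: Mapping[str, Any]) -> bool:
    """Return True if the object carries a Failing condition with status "True"."""
    conditions = cluster_version.get("status", {}).get("conditions", [])
    return any(
        c.get("type") == "Failing" and c.get("status") == "True"
        for c in conditions
    )


def percent_failing(fleet: Iterable[Mapping[str, Any]]) -> float:
    """Percentage of clusters in the fleet that report Failing=True."""
    clusters = list(fleet)
    if not clusters:
        return 0.0  # Assumption: an empty fleet is reported as 0% rather than an error.
    failing = sum(is_failing(cv) for cv in clusters)
    return 100.0 * failing / len(clusters)


# Hypothetical sample data: four clusters, one of which reports Failing=True.
sample_fleet = [
    {"status": {"conditions": [{"type": "Failing", "status": "True"}]}},
    {"status": {"conditions": [{"type": "Failing", "status": "False"}]}},
    {"status": {"conditions": [{"type": "Available", "status": "True"}]}},
    {"status": {"conditions": []}},
]

print(percent_failing(sample_fleet))
```

A cluster with no Failing condition at all is counted as not failing here; whether that is the right treatment for the fleet chart is one of the things the spike would need to decide.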