Epic Goal

Provide two top-level Prometheus metrics: one for aggregated operator health and one for aggregated operator upgrade status. Together they answer the questions: Is my cluster upgrade complete, and are my operators healthy?

Why is this important?

As a customer with multiple operators installed, I want to monitor the cluster as a whole, without monitoring the individual operators, so that I do not need to continuously change my alerting rules for second-level operators.

As a customer with OCP, ODF/OCS, and CNV deployed, I'd like to monitor only the base cluster (CVO) plus one additional metric that measures the health and upgrade status of all second-level operators.

CI - MUST be running successfully with tests automated
Release Technical Enablement - Provide necessary release enablement details and documents.
...

Operators have conditions that expose their health, and we have metrics about upgrade phases. The delta of this epic is to expose both elements as standard Prometheus metrics, so that they feed more data into our consistent Prometheus-based alerting.
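
To make the intended aggregation concrete, here is a minimal sketch of how per-operator health and upgrade state could be folded into two gauges and rendered in the Prometheus text exposition format. Everything named here is an assumption for illustration only: the `OperatorStatus` type, its `healthy` and `upgrading` fields, and the metric names `operators_health_aggregated` and `operators_upgrade_completed` are hypothetical and not defined by this epic or by OLM.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class OperatorStatus:
    """Hypothetical per-operator view; not an OLM type."""
    name: str
    healthy: bool    # assumed to be derived from the operator's health condition
    upgrading: bool  # assumed to be derived from the operator's upgrade phase


def aggregate(operators: Iterable[OperatorStatus]) -> Tuple[int, int]:
    """Return (health, upgrade_completed) as 0/1 gauge values.

    health is 1 only if every operator is healthy; upgrade_completed is 1
    only if no operator is still upgrading. An empty input yields (1, 1).
    """
    ops = list(operators)
    health = int(all(op.healthy for op in ops))
    upgrade_completed = int(not any(op.upgrading for op in ops))
    return health, upgrade_completed


def render(operators: Iterable[OperatorStatus]) -> str:
    """Render both gauges in the Prometheus text exposition format."""
    health, upgrade_completed = aggregate(operators)
    lines = [
        "# HELP operators_health_aggregated 1 if all operators are healthy, else 0.",
        "# TYPE operators_health_aggregated gauge",
        f"operators_health_aggregated {health}",
        "# HELP operators_upgrade_completed 1 if no operator is upgrading, else 0.",
        "# TYPE operators_upgrade_completed gauge",
        f"operators_upgrade_completed {upgrade_completed}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = [
        OperatorStatus("odf", healthy=True, upgrading=False),
        OperatorStatus("cnv", healthy=False, upgrading=True),
    ]
    print(render(sample), end="")
```

With gauges of this shape, a single alerting rule per gauge would be enough to cover all second-level operators, which is the outcome the customer scenarios above ask for. Whether the aggregate should be a strict "all healthy" value or something weighted is a design choice this sketch does not settle.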

CI - CI is running, tests are automated and merged.
Release Enablement <link to Feature Enablement Presentation>
DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
DEV - Downstream build attached to advisory: <link to errata>
QE - Test plans in Polarion: <link or reference to Polarion>
QE - Automated tests merged: <link or reference to automated tests>
DOC - Downstream documentation merged: <link to meaningful PR>

relates to

OPRUN-2376 Operator status condition for operator health

RFE-1585 Operators Managed by OLM can report that they are healthy