Epic Goal

Give operators a more precise and controllable way to communicate their overall health.

Why is this important?

Currently, operator health is defined by the health of OLM's top-level object, the ClusterServiceVersion, which in turn derives its health and readiness state from the health of all encapsulated components (ServiceAccounts, CRDs, Deployments).
This definition is too coarse. It cannot reflect operator-specific health states that are not expressible through the health of low-level Kubernetes components at the operator deployment level (see scenarios).
Expressing the health of complex operators through liveness and readiness probes would lead to undesired side effects, such as containers being restarted by Kubernetes.

Scenarios

An operator may create and track several resources after deployment that are not part of its own controller setup but together constitute a larger add-on control plane. The overall health of the offering provided by the operator needs to take this into account.
An operator may depend on resources outside of the cluster to provide reliable service. The overall health of the offering provided by the operator needs to take the availability of these services into account.
Cluster administrators expect operators to provide a reasonable abstraction for complex, multi-component services like CNV or ODF, and therefore expect a single, overall health condition that reports the health of an entire add-on control plane.
Cluster fleet administrators expect OpenShift to report an overall health status that includes the aggregate health status of all installed cluster extensions.

An operator author must be able to use custom logic to determine overall operator health and readiness, rather than relying solely on the health of the operator's controller pods.

CI - CI is running, tests are automated and merged.
Release Enablement <link to Feature Enablement Presentation>
DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
DEV - Downstream build attached to advisory: <link to errata>
QE - Test plans in Polarion: <link or reference to Polarion>
QE - Automated tests merged: <link or reference to automated tests>
DOC - Downstream documentation merged: <link to meaningful PR>

is related to

OPRUN-2364 Top-level OLM metrics (and alerts) for over-all operator health and operator upgrade status

relates to

OPRUN-3197 [UPSTREAM] Extension health #390