As an SRE, I need a way to silence alerts from an entire organization (legal entity id) in HyperShift. This already works in OSD/ROSA, and we need the same feature in HyperShift for parity. We already have a use for it today.
Following a discussion about QE clusters being run in production [1], we needed to stop the QE org's clusters from alerting on-call SREs. We attempted to silence the clusters, but learned that silencing doesn't work the same way in HyperShift as it does in ROSA. There is currently no way in HyperShift to silence an entire org by legal entity id, which we need for these QE clusters.
We've had a need for this in ROSA before for other orgs, as well, so it's reasonable to assume we'll need it for more orgs in HyperShift in the future.
NOTE: The discussion about whether testing by QE should be done in production has already taken place and a decision has been made. This ticket is about implementing the silence feature, not about whether QE should be doing this. Please see the Slack thread [1] linked below for more details.
—
Because SRE centralized alerting doesn't have access to clusters' org ids, we need a way to add orgs to a silence list in OCM, so that a label is added to the HostedCluster objects of clusters belonging to orgs on that list.
For the label, we can re-use the one we set for limited support, as it will result in alerts for a cluster not paging SRE.
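To make the proposal concrete, here is a minimal sketch of the labeling step. It assumes some component can see each cluster's org id next to its HostedCluster labels. The label key, the `HostedCluster` stand-in class, and the `apply_org_silences` function are all hypothetical names used for illustration; the real limited support label and the real OCM/HyperShift APIs are not specified in this ticket.

```python
from dataclasses import dataclass, field

# Hypothetical label key: a placeholder for the limited support label,
# whose real name is not given in this ticket.
SILENCE_LABEL = "example.com/limited-support"


@dataclass
class HostedCluster:
    """Simplified stand-in for a HostedCluster object (not the real API type)."""
    name: str
    org_id: str
    labels: dict = field(default_factory=dict)


def apply_org_silences(clusters, silenced_orgs):
    """Label every cluster whose org is on the silenced org list.

    Returns the names of the clusters that were newly labeled. Labels are
    only added, never removed: the label is shared with limited support,
    so removing it here could un-silence a cluster that is in limited
    support for an unrelated reason.
    """
    newly_labeled = []
    for cluster in clusters:
        if cluster.org_id not in silenced_orgs:
            continue
        if cluster.labels.get(SILENCE_LABEL) == "true":
            continue  # already labeled, nothing to do
        cluster.labels[SILENCE_LABEL] = "true"
        newly_labeled.append(cluster.name)
    return newly_labeled
```

One thing the sketch surfaces: if the limited support label is re-used, taking an org off the silenced list needs a way to tell "labeled because the org is silenced" apart from "labeled because the cluster is in limited support" before the label can be safely removed.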
Done Criteria:
- All clusters for a specific legal entity id can have their alerts silenced or otherwise suppressed automatically, to prevent on-call engineers from being alerted.