- Epic
- Resolution: Unresolved
- Major
- ACM 2.12.0
- None
Epic Goal
The baseline is the add-on status reporting, which reports whether the current workloads and configurations are deployed.
But having the workload deployed doesn't mean that the runtime is working as expected. For metrics:
We alert when the Prometheus Agent fails to send samples to the hub.
However, we cannot tell when the Agent fails to federate metrics.
https://issues.redhat.com/browse/ACM-26852
Instead of a black-box node pinger, I would like to create a rule on the hub, which is easier to implement (cf. https://issues.redhat.com/browse/ACM-26514).
It is less reactive, but functionally equivalent. We could perhaps also monitor signals provided by OCM, such as a cluster being reported as "unreachable".
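To illustrate the kind of hub-side check described above, here is a minimal Python sketch. It only shows the decision logic, not an actual alerting rule. It assumes the hub can tell, per managed cluster, when it last received any sample from the agent and when it last received a sample that came through federation. The names `ClusterSignal`, `federation_stalled`, and the 10-minute threshold are hypothetical and are not taken from this epic or from any ACM component.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ClusterSignal:
    """Hypothetical per-cluster view of what the hub last received."""
    cluster: str
    last_agent_sample: float      # Unix seconds: newest sample of any kind from the agent
    last_federated_sample: float  # Unix seconds: newest sample that came through federation


def federation_stalled(
    signals: Iterable[ClusterSignal],
    now: float,
    max_staleness: float = 600.0,  # assumed threshold, in seconds
) -> List[str]:
    """Return clusters whose agent still reports but whose federated metrics are stale.

    Clusters whose agent is silent altogether are skipped on purpose:
    that case is already covered by the existing "fails to send samples" alert.
    """
    stalled = []
    for s in signals:
        agent_alive = now - s.last_agent_sample <= max_staleness
        federated_fresh = now - s.last_federated_sample <= max_staleness
        if agent_alive and not federated_fresh:
            stalled.append(s.cluster)
    return sorted(stalled)


if __name__ == "__main__":
    now = 10_000.0
    signals = [
        ClusterSignal("cluster-a", last_agent_sample=9_950.0, last_federated_sample=9_940.0),
        ClusterSignal("cluster-b", last_agent_sample=9_950.0, last_federated_sample=9_000.0),
        ClusterSignal("cluster-c", last_agent_sample=8_000.0, last_federated_sample=8_000.0),
    ]
    print(federation_stalled(signals, now))
```

In this sketch only `cluster-b` is flagged: its agent is still sending samples, but nothing federated has arrived within the threshold. Because the check relies on staleness rather than active probing, detection lags by up to the chosen threshold, which is the "less reactive" trade-off noted above.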
...
Why is this important?
...
Scenarios
...
Acceptance Criteria
...
Dependencies (internal and external)
- ...
Previous Work (Optional):
- ...
Open questions:
- …
Done Checklist
- CI - CI is running, tests are automated and merged.
- Release Enablement <link to Feature Enablement Presentation>
- DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
- DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
- DEV - Downstream build attached to advisory: <link to errata>
- QE - Test plans in Polarion: <link or reference to Polarion>
- QE - Automated tests merged: <link or reference to automated tests>
- DOC - Doc issue opened with a completed template. Separate doc issue
opened for any deprecation, removal, or any current known
issue/troubleshooting removal from the doc, if applicable.