Type: Task
Resolution: Done
Priority: Critical
MGDOBR - Sprint 224, MGDOBR - Sprint 225
Issue Description:
We need basic alerting and monitoring for the data plane so that we can detect failures when they happen. The purpose of this task is to define a basic monitoring strategy for the data plane that supports troubleshooting and alerting for our internal service preview (and beyond).
Acceptance Criteria:
- Define a list of the most basic alerts that we need for the data plane
- Ensure that we have metrics exposed to support these alerts
- Understand how we can use the observability-operator (https://github.com/redhat-developer/observability-operator) to capture the monitoring/alerting configuration
- Create the follow-on implementation tasks to get the work done
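Illustrative sketch for the first two criteria (not part of the original ticket): one way to keep the list of basic alerts as data and check that every alert is backed by an exposed metric. The alert names, metric names, and thresholds are all hypothetical, and this is not the observability-operator's configuration format.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class AlertRule:
    """Hypothetical alert definition: a name, the metric it reads, and a firing condition."""
    name: str
    metric: str
    severity: str
    is_firing: Callable[[float], bool]


# Hypothetical alerts and metric names, for illustration only.
BASIC_ALERTS: List[AlertRule] = [
    AlertRule("DataPlaneOperatorDown", "operator_up", "critical", lambda v: v < 1),
    AlertRule("HighReconcileErrorRatio", "reconcile_error_ratio", "warning", lambda v: v > 0.05),
]


def missing_metrics(rules: Iterable[AlertRule], exposed: Iterable[str]) -> List[str]:
    """Return the sorted metric names that some rule needs but nothing exposes."""
    return sorted({r.metric for r in rules} - set(exposed))


def firing_alerts(rules: Iterable[AlertRule], samples: Dict[str, float]) -> List[str]:
    """Return the names of rules whose metric is present and whose condition holds."""
    return [r.name for r in rules if r.metric in samples and r.is_firing(samples[r.metric])]


# Example: only one metric is exposed, and it reports the operator as down.
samples = {"operator_up": 0.0}
print(missing_metrics(BASIC_ALERTS, samples))
print(firing_alerts(BASIC_ALERTS, samples))
```

A check like `missing_metrics` could be used to spot alerts that have no supporting metric before the follow-on implementation tasks are written.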
Additional Information:
- We need to ensure that the observability-operator is deployed as a dependency of our own data plane operator.
Issue Links:
- causes: MGDOBR-1098 Data Plane Observability (Closed)
- is related to: MGDOBR-1075 Review the metrics exposed by Fleet Manager (Closed)
- relates to: MGDOBR-1046 Create on-call playbook (Closed)
- relates to: MGDOBR-1039 Get Access to PagerDuty for RHOSE Engineering team (Closed)
- relates to: MGDOBR-1045 Create the on call schedule (Closed)