Details
-
Task
-
Resolution: Done
-
Monitoring - Sprint 202, Monitoring - Sprint 203, Monitoring - Sprint 204
Description
Currently we make little use of the must-gather infrastructure for debugging lower-level issues. We have limited, coarse-grained telemetry data, but a recurring need is to inspect the following:
- A Prometheus TSDB dump. This can be quite large, so we may need to cap its size somehow.
- A snapshot of the state of alertmanagers, targets, config, and runtimeinfo, as per https://bugzilla.redhat.com/show_bug.cgi?id=1845561#c18
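The state snapshot in the second item could be collected roughly as follows. This is a minimal sketch, not the must-gather implementation: it assumes the Prometheus HTTP API is reachable without authentication at a base URL such as `http://localhost:9090` (for example through a port-forward), and the names `snapshot_urls`, `http_get`, and `collect_snapshot` are hypothetical. In a real cluster, TLS and a bearer token would most likely be needed as well.

```python
import json
import pathlib
import urllib.request

# Prometheus HTTP API paths for the four pieces of state listed above.
ENDPOINTS = {
    "alertmanagers": "/api/v1/alertmanagers",
    "targets": "/api/v1/targets",
    "config": "/api/v1/status/config",
    "runtimeinfo": "/api/v1/status/runtimeinfo",
}


def snapshot_urls(base_url):
    """Map each snapshot name to its full URL under base_url."""
    base = base_url.rstrip("/")
    return {name: base + path for name, path in ENDPOINTS.items()}


def http_get(url, timeout=10):
    """Fetch a URL and return the raw response body (no auth; assumption)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def collect_snapshot(base_url, out_dir, fetch=http_get):
    """Write one <name>.json file per endpoint into out_dir.

    A failed request is recorded in the file instead of aborting, so a
    partially broken stack still yields as much data as possible.
    """
    out = pathlib.Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for name, url in snapshot_urls(base_url).items():
        try:
            body = fetch(url)
        except OSError as exc:  # urllib's URLError is a subclass of OSError
            body = json.dumps({"error": str(exc)}).encode()
        path = out / f"{name}.json"
        path.write_bytes(body)
        written.append(path)
    return written
```

Calling `collect_snapshot("http://localhost:9090", "monitoring/prometheus")` would then leave four JSON files in the output directory; the directory name here is only an example. The `fetch` parameter is injectable so that the collection logic can be exercised without a live Prometheus.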
DoD:
- improve must-gather debugging insights for Monitoring