-
Story
-
Resolution: Done
-
Critical
RHODS 1.15
We send a metric, rhods_aggregate_availability, to Telemetry. We intend to use this metric to track the overall up/down status of RHODS over time.
Currently, this metric tracks the availability of all RHODS components as a single value: if any component is down, the metric is marked as down. We have no visibility into which specific component is down at a given point in time.
Update this metric to include a label identifying the component, so that the availability of each component can be tracked over time. At a minimum, we should send values for both the ODH dashboard and the JupyterHub spawner.
We also want to retain the current functionality of aggregating all components into a single up/down value. We can likely accomplish this by having an additional value for the component label that indicates the aggregate of all components.
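As a rough illustration of the proposal above, the following Python sketch builds one sample per component plus one aggregate sample, all under the same metric name. It is not the actual RHODS implementation: the label values `odh-dashboard`, `jupyterhub`, and `aggregate`, and the helper names `build_samples` and `render`, are assumptions made for the example; only the metric name and the idea of a component label come from the description.

```python
from typing import Dict, List, Tuple

METRIC_NAME = "rhods_aggregate_availability"
AGGREGATE_VALUE = "aggregate"  # hypothetical label value for the all-components rollup

Sample = Tuple[str, Dict[str, str], int]


def build_samples(component_up: Dict[str, bool]) -> List[Sample]:
    """Return one 0/1 sample per component, plus an aggregate sample."""
    samples: List[Sample] = [
        (METRIC_NAME, {"component": name}, int(up))
        for name, up in sorted(component_up.items())
    ]
    # The aggregate is up only if at least one component is known and all are up.
    aggregate_up = bool(component_up) and all(component_up.values())
    samples.append((METRIC_NAME, {"component": AGGREGATE_VALUE}, int(aggregate_up)))
    return samples


def render(samples: List[Sample]) -> str:
    """Render samples as 'name{label="value"} 0|1' lines."""
    lines = []
    for name, labels, value in samples:
        label_str = ",".join(f'{k}="{v}"' for k, v in sorted(labels.items()))
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical component names, used only for illustration.
    print(render(build_samples({"odh-dashboard": True, "jupyterhub": False})))
```

With this shape, existing consumers could keep the single up/down view by filtering on the aggregate label value, while per-component series show which component caused an outage.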
- is blocked by
  - RHODS-4358 Glitch in rhods_aggregated_availability due to probe_success metrics (Closed)
- is related to
  - RHODS-4227 rhods_availability_metric could not be accurate to track SLA because of 5 minute threshold (New)
  - RHODS-4358 Glitch in rhods_aggregated_availability due to probe_success metrics (Closed)
- relates to
  - RHODS-4753 Offset in rhods_aggregated_availability (New)
  - RHODS-4752 The rhods_aggregated_availability metric doesn't include the Traefik component. (New)
  - RHODS-4358 Glitch in rhods_aggregated_availability due to probe_success metrics (Closed)
- mentioned on