Task
Resolution: Unresolved
The RHODS Dashboard needs to display metrics from the various workloads that RHODS runs on behalf of the user:
- models
- ...
(the list is not exhaustive)
We need a single monitoring stack that collects metrics from all of these workloads into one location, so that the Dashboard queries stay relatively simple.
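A minimal sketch of what "relatively simple" could look like, assuming the Dashboard talks to the dedicated Prometheus instance through the standard Prometheus HTTP API (`/api/v1/query`). Because every workload is scraped into the same store, one PromQL expression can cover all workload types. The service URL, the metric name `rhods_workload_requests_total` and the `workload_type` label are hypothetical placeholders, not existing RHODS names.

```python
from urllib.parse import urlencode

# Hypothetical in-cluster address of the RHODS "user metrics" Prometheus.
PROMETHEUS_URL = "http://rhods-user-metrics-prometheus:9090"


def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API (/api/v1/query)."""
    return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"


def parse_vector(response: dict, label: str) -> dict:
    """Map one label of each sample in an instant-vector response to its value."""
    if response.get("status") != "success":
        raise ValueError(f"query failed: {response.get('error')}")
    data = response["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    # Each sample's "value" is a [unix_timestamp, "value as a string"] pair.
    return {s["metric"].get(label, ""): float(s["value"][1]) for s in data["result"]}


# One query spans every workload type because they all land in the same store.
# The metric name and label below are hypothetical placeholders.
query = "sum by (workload_type) (rate(rhods_workload_requests_total[5m]))"
url = build_query_url(PROMETHEUS_URL, query)
```

The parsing follows the documented instant-vector result shape. The actual HTTP call (for example with `urllib.request.urlopen`) is left out so the sketch stays self-contained.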
In that spirit, we need two things, plus an optional third:
- Deploy, as part of RHODS, a metrics collector and storage dedicated to these "user metrics", i.e. a Prometheus instance.
- Document how RHODS components should expose metrics for their workload type.
- (Optional, bonus) Define a standard set of metrics that each workload should expose, to facilitate aggregation.