- Bug
- Resolution: Unresolved
- Minor
- Low
Description of problem:
On two different clusters (OCP 4.10 and 4.9), the query {job="RHODS Metrics"} returns no results in Prometheus.
As a consequence, any query that filters on this job also returns nothing, e.g. controller_runtime_reconcile_total{controller="kfdef-controller", job="RHODS Metrics", result="success"}
Prerequisites (if any, like setup, operators/versions):
RHODS 1.11 on OSD running OCP 4.9/4.10
Steps to Reproduce:
- Install RHODS
- Open the Prometheus route exposed in the metrics namespace
- Run the query {job="RHODS Metrics"}
Actual results:
No result
Expected results:
Some results, at the very least:
up{instance="10.X.Y.Z:8080", job="RHODS Metrics"}
controller_runtime_reconcile_errors_total
controller_runtime_reconcile_total
rest_client_requests_total
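The check above can also be scripted against the Prometheus HTTP API (the instant-query endpoint `/api/v1/query`). The following is only a minimal sketch: the route host `prometheus.example.com` is a placeholder, the helper names are hypothetical, and fetching the URL on a real cluster is assumed to need authentication, which is not shown here.

```python
import json
from urllib.parse import urlencode

# Metric names this report expects to see for the job.
EXPECTED_METRICS = [
    "up",
    "controller_runtime_reconcile_errors_total",
    "controller_runtime_reconcile_total",
    "rest_client_requests_total",
]


def build_query_url(base_url, promql):
    """Return the instant-query URL for a PromQL expression."""
    return base_url.rstrip("/") + "/api/v1/query?" + urlencode({"query": promql})


def missing_metrics(response_body, expected=EXPECTED_METRICS):
    """Return the expected metric names absent from an instant-query response."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise ValueError("query failed: %s" % payload.get("error", "unknown error"))
    # Each returned series carries its metric name in the __name__ label.
    seen = {series["metric"].get("__name__") for series in payload["data"]["result"]}
    return [name for name in expected if name not in seen]


if __name__ == "__main__":
    # Placeholder host (assumption); substitute the actual Prometheus route.
    print(build_query_url("https://prometheus.example.com", '{job="RHODS Metrics"}'))
    # An empty result set, which is what the affected clusters return.
    empty = '{"status": "success", "data": {"resultType": "vector", "result": []}}'
    print(missing_metrics(empty))
```

With the empty response seen on the affected clusters, `missing_metrics` reports all four expected names as missing.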
Reproducibility (Always/Intermittent/Only Once):
Always on two different clusters
Build Details:
Workaround:
Additional info:
The root cause appears to be that the rhods-operator-metrics service is missing from the cluster. aasthana@redhat.com has looked into the clusters but could not determine why the service is not being deployed.
- links to: RHBA-2023:122672 RHODS 2.4 - Red Hat OpenShift Data Science