- Bug
- Resolution: Done
- Major
- 4.19
- Quality / Stability / Reliability
- False
- Important
- All
- Proposed
- Bug Fix
- Fix a bug where Kueue metrics were not being exposed to Prometheus from 1.1.
Description of problem
During a PerfScale exercise on the Kueue Operator, it was observed that no Kueue-related metrics were available in the monitoring stack. Prometheus was not scraping metrics from the operator’s controller, even though the ServiceMonitor and RBAC resources were successfully created as part of the operator installation.
It was later identified that the ServiceMonitor resource added during installation is configured with an incorrect port reference: it points at a port named `metrics`, while the Kueue metrics Service only exposes a port named `https`. As a result, Prometheus (via the Cluster Monitoring Operator) fails to scrape metrics from the Kueue service endpoint.
ServiceMonitor definition:
...
endpoints:
- bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
  interval: 30s
  path: /metrics
  port: metrics
  scheme: https
...
Service ports from kueue-controller-manager-metrics-service.openshift-kueue-operator.svc:
...
ports:
- name: https
  port: 8443
  protocol: TCP
  targetPort: 8443
selector:
  control-plane: controller-manager
...
The upstream Helm templates use the correct port name - here
Prerequisites (if any, like setup, operators/versions)
ROSA 4.19.16
Red Hat build of Kueue - 1.1.0 provided by Red Hat, Inc
Steps to Reproduce
- Deploy a ROSA cluster
- Install the Kueue operator from OperatorHub with the default config
- Check for Kueue metrics, e.g. `kueue_admission_attempts_total`
Actual results
None of the Kueue metrics are available in the cluster monitoring stack.
Expected results
All metrics from the operator should be available in the cluster monitoring stack.
Reproducibility (Always/Intermittent/Only Once)
Always
Found in what build
1.1.0
Describe any workarounds
Apply a corrected ServiceMonitor, such as this one: https://gist.github.com/mukrishn/dccd43f5e848deaa04a6d63cbc7080c0
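For illustration, the mismatch described in this report suggests that the endpoint's port name should match the port name exposed by the metrics Service (`https`). The fragment below is a minimal sketch under that assumption only. It is not a copy of the linked gist, and all other fields are carried over unchanged from the ServiceMonitor excerpt in this report.

```yaml
# Sketch of the ServiceMonitor endpoint with the port reference corrected.
# Assumption: only the port name needs to change; everything else is taken
# from the ServiceMonitor excerpt in this report.
endpoints:
- bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
  interval: 30s
  path: /metrics
  port: https        # was "metrics"; the Service names its 8443 port "https"
  scheme: https
```

With a change along these lines applied, metrics such as `kueue_admission_attempts_total` would be expected to appear in the cluster monitoring stack, provided nothing else blocks scraping.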