  1. OpenShift Bugs
  2. OCPBUGS-63441

Kueue Operator metrics are not scraped by CMO Prometheus

    • Quality / Stability / Reliability
    • Important
    • Proposed
    • Bug Fix
    • Fixes a bug where Kueue metrics were not exposed to Prometheus as of 1.1.

      Description of problem

      During a PerfScale exercise on the Kueue Operator, it was observed that no Kueue-related metrics were available in the monitoring stack. Prometheus was not scraping metrics from the operator’s controller, even though the ServiceMonitor and RBAC resources were successfully created as part of the operator installation.

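      It was later identified that the ServiceMonitor created during installation references an incorrect port name, so Prometheus (managed by the Cluster Monitoring Operator) cannot scrape metrics from the Kueue service endpoint. A ServiceMonitor endpoint's `port` field must match the name of a port on the target Service; here the ServiceMonitor references `metrics`, while the Service only exposes a port named `https`.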

      ServiceMonitor definition:

      ...
        endpoints:
          - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
            interval: 30s
            path: /metrics
            port: metrics
            scheme: https
      ...
      

      Service ports from kueue-controller-manager-metrics-service.openshift-kueue-operator.svc:

      ...
        ports:
        - name: https
          port: 8443
          protocol: TCP
          targetPort: 8443
        selector:
          control-plane: controller-manager
      ...
      

      The upstream Helm templates use the correct port name.

      Prerequisites (if any, like setup, operators/versions)

      ROSA 4.19.16
      Red Hat build of Kueue - 1.1.0 provided by Red Hat, Inc

      Steps to Reproduce

      1. Deploy a ROSA cluster
      2. Install the Kueue operator from OperatorHub with the default configuration
      3. Check for Kueue metrics, e.g. `kueue_admission_attempts_total`

      Actual results

      None of the Kueue metrics are available in the cluster monitoring stack.

      Expected results

      All Kueue metrics from the operator are available in the cluster monitoring stack.

      Reproducibility (Always/Intermittent/Only Once)

      Always

      Found in what build

      1.1.0

      Describe any workarounds

      Apply a corrected ServiceMonitor, such as this one: https://gist.github.com/mukrishn/dccd43f5e848deaa04a6d63cbc7080c0
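
      The gist's contents are not reproduced here. As a minimal sketch, assuming the port name mismatch described above is the only problem, the endpoint would reference the Service port name `https` instead of `metrics`. All other fields are copied unchanged from the ServiceMonitor shown in the description, and the elided parts (including any TLS settings) are left as they are:

      ```yaml
      ...
        endpoints:
          - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
            interval: 30s
            path: /metrics
            # Assumption: only the port name changes. It was "metrics" and must
            # match the port name on kueue-controller-manager-metrics-service.
            port: https
            scheme: https
      ...
      ```

      Renaming the Service port to `metrics` would presumably resolve the mismatch as well, but changing the ServiceMonitor keeps the edit to a single resource.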

      Additional information

              rh-ee-kehannon Kevin Hannon
              mukrishn@redhat.com Murali Krishnasamy