Uploaded image for project: 'OpenShift Bugs'
  1. OpenShift Bugs
  2. OCPBUGS-36500

Platform Prometheus unable to discover targets from openshift-operator-lifecycle-manager

XMLWordPrintable

    • Icon: Bug Bug
    • Resolution: Done-Errata
    • Icon: Normal Normal
    • None
    • 4.17
    • OLM
    • No
    • Rejected
    • False
    • Hide

      None

      Show
      None
    • Release Note Not Required
    • Done

      Description of problem:

      This was discovered by a new alert that was reverted in https://issues.redhat.com/browse/OCPBUGS-36299 as the issue is making Hypershift Conformance fail.

      Platform prometheus is asked to scrape targets from the namespace "openshift-operator-lifecycle-manager", but Prometheus isn't given the appropriate RBAC to do so.

      The alert was revealing an RBAC issue on platform Prometheus: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-hypershift-release-4.17-periodics-e2e-aws-ovn-conformance/1806305841511403520/artifacts/e2e-aws-ovn-conformance/dump/artifacts/hostedcluster-8a4fd7515fb581e231c4/namespaces/openshift-monitoring/pods/prometheus-k8s-0/prometheus/prometheus/logs/current.log

      2024-06-27T14:59:38.968032082Z ts=2024-06-27T14:59:38.967Z caller=klog.go:108 level=warn component=k8s_client_runtime func=Warningf msg="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:554: failed to list *v1.Endpoints: endpoints is forbidden: User \"system:serviceaccount:openshift-monitoring:prometheus-k8s\" cannot list resource \"endpoints\" in API group \"\" in the namespace \"openshift-operator-lifecycle-manager\""
      2024-06-27T14:59:38.968032082Z ts=2024-06-27T14:59:38.968Z caller=klog.go:116 level=error component=k8s_client_runtime func=ErrorDepth msg="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:554: Failed to watch *v1.Endpoints: failed to list *v1.Endpoints: endpoints is forbidden: User \"system:serviceaccount:openshift-monitoring:prometheus-k8s\" cannot list resource \"endpoints\" in API group \"\" in the namespace \"openshift-operator-lifecycle-manager\""
      

      Before adding this alert, such issues went unnoticed.

      https://docs.google.com/document/d/1rCKAYTrYMESjJDyJ0KvNap05NNukmNXVVi6MShISnOw/edit#heading=h.13rhihr867kk explains what should be done (cf the "Also, in order for Prometheus to be able to discover..." paragraph) in order to make Prometheus able to discover the targets.

      Because no test was failing before, maybe the metrics from "openshift-operator-lifecycle-manager" are not needed and we should stop asking Prometheus to discover targets from there: delete the ServiceMonitor/PodMonitor

      Expected results:

            ankithom Ankita Thomas
            rh-ee-amrini Ayoub Mrini
            Jian Zhang Jian Zhang
            Alex Dellapenta Alex Dellapenta
            Votes:
            0 Vote for this issue
            Watchers:
            6 Start watching this issue

              Created:
              Updated:
              Resolved: