OpenShift Bugs / OCPBUGS-76419

The PrometheusPossibleNarrowSelectors alert produces false positives in OpenShift Virtualization environments with multiple Hosted Control Planes (HCP)

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Normal
    • Affects Version: 4.20.0
    • Component: Monitoring
    • Sprint: MON Sprint 284

      Description of problem:

      In an OpenShift bare-metal environment running OpenShift Virtualization with multiple Hosted Control Plane (HCP) clusters, the PrometheusPossibleNarrowSelectors alert triggers repeatedly against the prometheus-k8s monitoring stack.

      The alert indicates that Prometheus may be using overly narrow label selectors; however, in this deployment model:

      • Only default, cluster-wide monitoring components are deployed
      • No custom Prometheus selectors or queries are configured by the user
      • Metrics scraping and querying are functioning as expected

      The alert appears to be a false positive in this architecture and does not provide actionable remediation.
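      To see which query the alert is actually flagging, both the alert state and the rule expression can be inspected directly. This is a hedged sketch: it assumes `curl` is available in the `prometheus` container and that Prometheus serves plain HTTP on localhost:9090 inside the pod (consistent with the reload URL visible in the pod logs).

      ```shell
      # Sketch: check whether the alert is firing on the platform Prometheus.
      # Assumes curl exists in the prometheus container and Prometheus serves
      # plain HTTP on localhost:9090 inside the pod.
      oc -n openshift-monitoring exec prometheus-k8s-0 -c prometheus -- \
        curl -s 'http://localhost:9090/api/v1/query' \
        --data-urlencode 'query=ALERTS{alertname="PrometheusPossibleNarrowSelectors"}'

      # Dump the rule's PromQL expression (and thus which selectors it
      # inspects) from the PrometheusRule objects:
      oc -n openshift-monitoring get prometheusrules -o yaml \
        | grep -B2 -A10 'PrometheusPossibleNarrowSelectors'
      ```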

Version-Release number of selected component (if applicable):

      • OpenShift Container Platform (bare metal): 4.20

      How reproducible:

      No    

      Steps to Reproduce:

          1.
          2.
          3.
          

      Actual results:

      • PrometheusPossibleNarrowSelectors alert fires repeatedly
      • Alert suggests potential misconfiguration of Prometheus selectors
      • No observable impact to metrics availability or monitoring functionality
      • Alert creates continuous noise in production monitoring

      Expected results:

      • The alert should not trigger in environments where:
           - the flagged selectors are expected and valid for the HCP / virtualization architecture
           - no metric loss or scrape failures are occurring
      • Alternatively:
           - the alert logic should be HCP-aware, or
           - the alert description/runbook should clearly state that the alert can be safely ignored in Hosted Control Plane environments

      Additional info:

      The public runbook for this alert
      https://github.com/openshift/runbooks/blob/master/alerts/cluster-monitoring-operator/PrometheusPossibleNarrowSelectors.md
      does not currently provide actionable mitigation steps for this scenario.
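      As an interim workaround until the alert or runbook is fixed, the noise can be suppressed with an Alertmanager silence. This is only a sketch, not part of the runbook: the route lookup, the one-week duration, and the omission of route authentication (the OpenShift Alertmanager route normally requires a bearer token) are all illustrative assumptions.

      ```shell
      # Sketch: silence the alert for a limited period.
      # Requires amtool and access to the Alertmanager API; authentication
      # against the route (bearer token) is omitted for brevity.
      AM_HOST=$(oc -n openshift-monitoring get route alertmanager-main -o jsonpath='{.spec.host}')
      amtool silence add \
        --alertmanager.url="https://${AM_HOST}" \
        --comment="False positive in HCP / OpenShift Virtualization environments (OCPBUGS-76419)" \
        --duration=168h \
        alertname=PrometheusPossibleNarrowSelectors
      ```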

      From the logs, the prometheus-k8s pod appears to be logging TLS handshake errors (on port 9091, served by the kube-rbac-proxy sidecar):

      $ oc logs prometheus-k8s-0 --all-containers 
      
      2026-02-03T09:01:57.446232128Z level=error ts=2026-02-03T09:01:57.446170604Z caller=runutil.go:117 msg="function failed. Retrying in next tick" err="trigger reload: reload request failed: Post \"http://localhost:9090/-/reload\": dial tcp [::1]:9090: connect: connection refused"
      2026-02-03T09:03:00.448152231Z I0203 09:03:00.448105       1 log.go:245] http: TLS handshake error from 10.128.x.x:34650: write tcp 10.128.x.x:9091->10.128.x.x:34650: write: connection reset by peer
      2026-02-03T09:03:00.623923024Z I0203 09:03:00.623880       1 log.go:245] http: TLS handshake error from 10.128.x.x:34658: write tcp 10.128.x.x:9091->10.128.x.x:34658: write: connection reset by peer
      2026-02-03T09:03:02.960755724Z I0203 09:03:02.960685       1 log.go:245] http: TLS handshake error from 10.128.x.x:52416: write tcp 10.128.x.x:9091->10.128.x.x:52416: write: connection reset by peer
      2026-02-03T09:03:03.136280909Z I0203 09:03:03.136232       1 log.go:245] http: TLS handshake error from 10.128.72.2:52430: write tcp 10.128.x.x:9091->10.128.x.x:52430: write: connection reset by peer
      2026-02-03T09:03:05.451693105Z I0203 09:03:05.451640       1 log.go:245] http: TLS handshake error from 10.128.x.x:34674: write tcp 10.128.x.x:9091->10.128.x.x:34674: write: connection reset by peer
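      The handshake failures above all follow one pattern: the peer resets the connection mid-handshake, which is also typical of plain-TCP liveness probes or port scanners hitting a TLS port. To check whether they all originate from a single client, a small helper (written against the exact log format shown above; the source IPs here are redacted as `10.128.x.x`) can tally errors per client IP:

      ```python
      import re
      from collections import Counter

      # Matches the kube-rbac-proxy log lines shown above, e.g.
      # "... http: TLS handshake error from 10.128.x.x:34650: write tcp ..."
      # The character class also accepts the redacted "x" octets.
      TLS_ERR = re.compile(r"TLS handshake error from ([0-9a-fx.]+):\d+")

      def count_tls_errors_by_source(log_lines):
          """Tally TLS handshake errors per client IP."""
          counts = Counter()
          for line in log_lines:
              m = TLS_ERR.search(line)
              if m:
                  counts[m.group(1)] += 1
          return counts

      logs = [
          'I0203 09:03:00.448105 1 log.go:245] http: TLS handshake error from '
          '10.128.x.x:34650: write tcp 10.128.x.x:9091->10.128.x.x:34650: '
          'write: connection reset by peer',
          'level=error msg="unrelated line"',
      ]
      print(count_tls_errors_by_source(logs))  # Counter({'10.128.x.x': 1})
      ```

      Feeding it the full `oc logs` output for both pods would show whether the resets come from one address (suggesting a probe or scanner) or many.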
      

      The kube-rbac-proxy-federate container of the prometheus-user-workload-0 pod logs similar TLS handshake errors:

      $ oc logs prometheus-user-workload-0 -n openshift-user-workload-monitoring -c kube-rbac-proxy-federate
      
      2026-02-02T16:02:40.028260923Z I0202 16:02:40.028098       1 kube-rbac-proxy.go:532] Reading config file: /etc/kube-rbac-proxy/config.yaml
      2026-02-02T16:02:40.028744258Z I0202 16:02:40.028726       1 kube-rbac-proxy.go:235] Valid token audiences: 
      2026-02-02T16:02:40.029048967Z I0202 16:02:40.028992       1 dynamic_cafile_content.go:161] "Starting controller" name="client-ca::/etc/tls/client/client-ca.crt"
      2026-02-02T16:02:40.029612553Z I0202 16:02:40.029588       1 kube-rbac-proxy.go:349] Reading certificate files
      2026-02-02T16:02:40.029916962Z I0202 16:02:40.029859       1 kube-rbac-proxy.go:397] Starting TCP socket on 0.0.0.0:9092
      2026-02-02T16:02:40.030092003Z I0202 16:02:40.030084       1 kube-rbac-proxy.go:404] Listening securely on 0.0.0.0:9092
      2026-02-02T16:03:02.782391572Z I0202 16:03:02.782322       1 log.go:245] http: TLS handshake error from 10.128.x.x:54844: write tcp 10.128.x.x:9092->10.128.x.x:54844: write: connection reset by peer
      2026-02-02T16:03:02.782522312Z I0202 16:03:02.782504       1 log.go:245] http: TLS handshake error from 10.128.x.x:38404: write tcp 10.128.x.x:9092->10.128.x.x:38404: write: connection reset by peer
      2026-02-02T16:03:07.786593531Z I0202 16:03:07.786529       1 log.go:245] http: TLS handshake error from 10.128.x.x:54850: write tcp 10.128.x.x:9092->10.128.x.x:54850: write: connection reset by peer
      2026-02-02T16:03:12.789711007Z I0202 16:03:12.789652       1 log.go:245] http: TLS handshake error from 10.128.x.x:56786: write tcp 10.128.x.x:9092->10.128.x.x:56786: write: connection reset by peer 

       

              rh-ee-amrini Ayoub Mrini
              rhn-support-hthakare Harshal Thakare
              Junqi Zhao Junqi Zhao