OpenShift Bugs / OCPBUGS-76419

The PrometheusPossibleNarrowSelectors alert produces false positives in OpenShift Virtualization environments with multiple Hosted Control Planes (HCP)

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Normal
    • Affects Version: 4.20.0
    • Component: Monitoring
    • Sprint: MON Sprint 284

      Description of problem:

      In an OpenShift bare-metal environment running OpenShift Virtualization with multiple Hosted Control Plane (HCP) clusters, the PrometheusPossibleNarrowSelectors alert triggers repeatedly against the prometheus-k8s monitoring stack.

      The alert indicates that Prometheus may be using overly narrow label selectors; however, in this deployment model:

      • Only default, cluster-wide monitoring components are deployed
      • No custom Prometheus selectors or queries are configured by the user
      • Metrics scraping and querying are functioning as expected

      The alert appears to be a false positive in this architecture and does not provide actionable remediation.
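      To see which query the alert is actually flagging, both the alert state and the rule expression can be inspected directly. This is a hedged sketch: it assumes `curl` is available in the `prometheus` container and that Prometheus serves plain HTTP on localhost:9090 inside the pod (consistent with the reload URL visible in the pod logs).

      ```shell
      # Sketch: check whether the alert is firing on the platform Prometheus.
      # Assumes curl exists in the prometheus container and Prometheus serves
      # plain HTTP on localhost:9090 inside the pod.
      oc -n openshift-monitoring exec prometheus-k8s-0 -c prometheus -- \
        curl -s 'http://localhost:9090/api/v1/query' \
        --data-urlencode 'query=ALERTS{alertname="PrometheusPossibleNarrowSelectors"}'

      # Dump the rule's PromQL expression (and thus which selectors it
      # inspects) from the PrometheusRule objects:
      oc -n openshift-monitoring get prometheusrules -o yaml \
        | grep -B2 -A10 'PrometheusPossibleNarrowSelectors'
      ```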

Version-Release number of selected component (if applicable):

      • OpenShift Container Platform (bare metal): 4.20

      How reproducible:

      No    

      Steps to Reproduce:

          1.
          2.
          3.
          

      Actual results:

      • PrometheusPossibleNarrowSelectors alert fires repeatedly
      • Alert suggests potential misconfiguration of Prometheus selectors
      • No observable impact to metrics availability or monitoring functionality
      • Alert creates continuous noise in production monitoring

      Expected results:

      • The alert should not trigger in environments where:
           - the flagged selectors are expected and valid for the HCP / virtualization architecture
           - no metric loss or scrape failures are occurring
      • Alternatively:
           - the alert logic should be HCP-aware, or
           - the alert description/runbook should clearly state that the alert can be safely ignored in Hosted Control Plane environments

      Additional info:

      The public runbook for this alert
      https://github.com/openshift/runbooks/blob/master/alerts/cluster-monitoring-operator/PrometheusPossibleNarrowSelectors.md
      does not currently provide actionable mitigation steps for this scenario.
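      As an interim workaround until the alert or runbook is fixed, the noise can be suppressed with an Alertmanager silence. This is only a sketch, not part of the runbook: the route lookup, the one-week duration, and the omission of route authentication (the OpenShift Alertmanager route normally requires a bearer token) are all illustrative assumptions.

      ```shell
      # Sketch: silence the alert for a limited period.
      # Requires amtool and access to the Alertmanager API; authentication
      # against the route (bearer token) is omitted for brevity.
      AM_HOST=$(oc -n openshift-monitoring get route alertmanager-main -o jsonpath='{.spec.host}')
      amtool silence add \
        --alertmanager.url="https://${AM_HOST}" \
        --comment="False positive in HCP / OpenShift Virtualization environments (OCPBUGS-76419)" \
        --duration=168h \
        alertname=PrometheusPossibleNarrowSelectors
      ```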

      From the logs, the prometheus-k8s pod appears to be logging TLS handshake errors (on port 9091, served by the kube-rbac-proxy sidecar):

      $ oc logs prometheus-k8s-0 --all-containers 
      
      2026-02-03T09:01:57.446232128Z level=error ts=2026-02-03T09:01:57.446170604Z caller=runutil.go:117 msg="function failed. Retrying in next tick" err="trigger reload: reload request failed: Post \"http://localhost:9090/-/reload\": dial tcp [::1]:9090: connect: connection refused"
      2026-02-03T09:03:00.448152231Z I0203 09:03:00.448105       1 log.go:245] http: TLS handshake error from 10.128.x.x:34650: write tcp 10.128.x.x:9091->10.128.x.x:34650: write: connection reset by peer
      2026-02-03T09:03:00.623923024Z I0203 09:03:00.623880       1 log.go:245] http: TLS handshake error from 10.128.x.x:34658: write tcp 10.128.x.x:9091->10.128.x.x:34658: write: connection reset by peer
      2026-02-03T09:03:02.960755724Z I0203 09:03:02.960685       1 log.go:245] http: TLS handshake error from 10.128.x.x:52416: write tcp 10.128.x.x:9091->10.128.x.x:52416: write: connection reset by peer
      2026-02-03T09:03:03.136280909Z I0203 09:03:03.136232       1 log.go:245] http: TLS handshake error from 10.128.72.2:52430: write tcp 10.128.x.x:9091->10.128.x.x:52430: write: connection reset by peer
      2026-02-03T09:03:05.451693105Z I0203 09:03:05.451640       1 log.go:245] http: TLS handshake error from 10.128.x.x:34674: write tcp 10.128.x.x:9091->10.128.x.x:34674: write: connection reset by peer
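      The handshake failures above all follow one pattern: the peer resets the connection mid-handshake, which is also typical of plain-TCP liveness probes or port scanners hitting a TLS port. To check whether they all originate from a single client, a small helper (written against the exact log format shown above; the source IPs here are redacted as `10.128.x.x`) can tally errors per client IP:

      ```python
      import re
      from collections import Counter

      # Matches the kube-rbac-proxy log lines shown above, e.g.
      # "... http: TLS handshake error from 10.128.x.x:34650: write tcp ..."
      # The character class also accepts the redacted "x" octets.
      TLS_ERR = re.compile(r"TLS handshake error from ([0-9a-fx.]+):\d+")

      def count_tls_errors_by_source(log_lines):
          """Tally TLS handshake errors per client IP."""
          counts = Counter()
          for line in log_lines:
              m = TLS_ERR.search(line)
              if m:
                  counts[m.group(1)] += 1
          return counts

      logs = [
          'I0203 09:03:00.448105 1 log.go:245] http: TLS handshake error from '
          '10.128.x.x:34650: write tcp 10.128.x.x:9091->10.128.x.x:34650: '
          'write: connection reset by peer',
          'level=error msg="unrelated line"',
      ]
      print(count_tls_errors_by_source(logs))  # Counter({'10.128.x.x': 1})
      ```

      Feeding it the full `oc logs` output for both pods would show whether the resets come from one address (suggesting a probe or scanner) or many.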
      

      The kube-rbac-proxy-federate container of the prometheus-user-workload-0 pod logs similar TLS handshake errors:

      $ oc logs prometheus-user-workload-0 -n openshift-user-workload-monitoring -c kube-rbac-proxy-federate
      
      2026-02-02T16:02:40.028260923Z I0202 16:02:40.028098       1 kube-rbac-proxy.go:532] Reading config file: /etc/kube-rbac-proxy/config.yaml
      2026-02-02T16:02:40.028744258Z I0202 16:02:40.028726       1 kube-rbac-proxy.go:235] Valid token audiences: 
      2026-02-02T16:02:40.029048967Z I0202 16:02:40.028992       1 dynamic_cafile_content.go:161] "Starting controller" name="client-ca::/etc/tls/client/client-ca.crt"
      2026-02-02T16:02:40.029612553Z I0202 16:02:40.029588       1 kube-rbac-proxy.go:349] Reading certificate files
      2026-02-02T16:02:40.029916962Z I0202 16:02:40.029859       1 kube-rbac-proxy.go:397] Starting TCP socket on 0.0.0.0:9092
      2026-02-02T16:02:40.030092003Z I0202 16:02:40.030084       1 kube-rbac-proxy.go:404] Listening securely on 0.0.0.0:9092
      2026-02-02T16:03:02.782391572Z I0202 16:03:02.782322       1 log.go:245] http: TLS handshake error from 10.128.x.x:54844: write tcp 10.128.x.x:9092->10.128.x.x:54844: write: connection reset by peer
      2026-02-02T16:03:02.782522312Z I0202 16:03:02.782504       1 log.go:245] http: TLS handshake error from 10.128.x.x:38404: write tcp 10.128.x.x:9092->10.128.x.x:38404: write: connection reset by peer
      2026-02-02T16:03:07.786593531Z I0202 16:03:07.786529       1 log.go:245] http: TLS handshake error from 10.128.x.x:54850: write tcp 10.128.x.x:9092->10.128.x.x:54850: write: connection reset by peer
      2026-02-02T16:03:12.789711007Z I0202 16:03:12.789652       1 log.go:245] http: TLS handshake error from 10.128.x.x:56786: write tcp 10.128.x.x:9092->10.128.x.x:56786: write: connection reset by peer 

       

              rh-ee-amrini Ayoub Mrini
              rhn-support-hthakare Harshal Thakare
              Junqi Zhao Junqi Zhao