Bug
Resolution: Unresolved
Normal
4.18.z, 4.19, 4.20
Quality / Stability / Reliability
Moderate
Customer Escalated
Description of problem:
The kube-rbac-proxy-web container running in the Alertmanager pod is logging errors:
$ oc logs alertmanager-main-0 -c kube-rbac-proxy-web | tail -n2
2025-09-25T19:48:18.538959699Z I0925 19:48:18.538885 1 log.go:245] http: TLS handshake error from <IP>:42930: write tcp <IP>:9095-><IP>:42930: write: connection reset by peer
2025-09-25T19:48:19.255619320Z I0925 19:48:19.255539 1 log.go:245] http: TLS handshake error from <IP>:49036: write tcp <IP>:9095-><IP>:49036: write: connection reset by peer
Both Alertmanager pods log the same errors. The errors come from only two source IPs, and both belong to the OpenShift ingress router pods.
This is the same behavior as reported in https://issues.redhat.com/browse/OCPBUGS-5916
No other symptoms are visible.
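The two observations above (only two source IPs, and a steady error rate) can be checked by tallying the log lines. The following is a minimal Python sketch, not part of the original report. It assumes each line starts with an RFC 3339 timestamp as in the excerpt above, which `oc logs` emits when run with `--timestamps`. The function name `tally_handshake_errors`, the sample addresses (taken from the 192.0.2.0/24 documentation range), and the third sample line are hypothetical.

```python
import re
from collections import Counter

# Matches lines shaped like the excerpt in this report:
# "<timestamp> I0925 ... http: TLS handshake error from <src>:<port>: ..."
# Assumption: every line begins with an RFC 3339 timestamp.
LINE_RE = re.compile(
    r"^(?P<minute>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}):\S*\s.*"
    r"TLS handshake error from (?P<src>\S+):\d+: "
)


def tally_handshake_errors(lines):
    """Count TLS handshake errors per source address and per minute."""
    by_source = Counter()
    by_minute = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # not a handshake error line, skip it
        by_source[m.group("src")] += 1
        by_minute[m.group("minute")] += 1
    return by_source, by_minute


# Hypothetical sample: the first two lines mirror the report's excerpt with
# documentation addresses in place of <IP>; the third is an invented
# non-matching line.
SAMPLE = [
    "2025-09-25T19:48:18.538959699Z I0925 19:48:18.538885 1 log.go:245] "
    "http: TLS handshake error from 192.0.2.10:42930: write tcp "
    "192.0.2.20:9095->192.0.2.10:42930: write: connection reset by peer",
    "2025-09-25T19:48:19.255619320Z I0925 19:48:19.255539 1 log.go:245] "
    "http: TLS handshake error from 192.0.2.11:49036: write tcp "
    "192.0.2.20:9095->192.0.2.11:49036: write: connection reset by peer",
    "2025-09-25T19:48:20.000000000Z I0925 19:48:20.000000 1 log.go:245] "
    "some unrelated line",
]

by_source, by_minute = tally_handshake_errors(SAMPLE)
print(dict(by_source))
print(dict(by_minute))
```

On a real cluster, the same function could be fed the saved output of `oc logs alertmanager-main-0 -c kube-rbac-proxy-web --timestamps`. The source addresses could then be compared with the ingress router pod IPs.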
Version-Release number of selected component (if applicable):
OpenShift 4.18.21
How reproducible:
100% of the time
Steps to Reproduce:
1. Spin up a 4.18.21 cluster
2. Check the logs of the kube-rbac-proxy-web container in the Alertmanager pods.
Business Impact:
In summary, we use OpenShift clusters as a secure Kubernetes platform (the only one that is FS validated). The cluster embeds Prometheus, and this component generates 300 errors per minute. We need Red Hat to explain what this error means for us: are Prometheus and the resulting monitoring broken? We cannot disable Prometheus. We also cannot determine whether the error is blocking something. We might be missing something important, and the cluster could be down without Prometheus being able to raise an alert. The error is also very noisy for log analysis and could cause us to miss something else.
- relates to
OCPBUGS-5916 The kube-rbac-proxy-federate container reporting TLS handshake error (Closed)
OCPBUGS-32021 too many "write: connection reset by peer" logs in kube-rbac-proxy-web container logs (Closed)