Bug
Resolution: Duplicate
Normal
None
4.16.0
None
No
False
Description of problem:
The kube-rbac-proxy-web container runs in the alertmanager-main, prometheus-k8s and thanos-querier pods, but its logs contain a very large number of "write: connection reset by peer" messages. For example:
$ oc -n openshift-monitoring get pod -o wide | grep -E "alertmanager-main|prometheus-k8s|thanos-"
alertmanager-main-0 6/6 Running 0 6h47m 10.131.0.18 daily-0410-gl552-worker-westus-2rhvv <none> <none>
alertmanager-main-1 6/6 Running 0 6h48m 10.129.2.13 daily-0410-gl552-worker-westus-jwhfv <none> <none>
prometheus-k8s-0 6/6 Running 0 6h47m 10.128.2.14 daily-0410-gl552-worker-westus-8xq5s <none> <none>
prometheus-k8s-1 6/6 Running 0 6h48m 10.129.2.14 daily-0410-gl552-worker-westus-jwhfv <none> <none>
thanos-querier-64c467b649-j2rtn 6/6 Running 0 6h49m 10.131.0.15 daily-0410-gl552-worker-westus-2rhvv <none> <none>
thanos-querier-64c467b649-xbxff 6/6 Running 0 6h49m 10.128.2.11 daily-0410-gl552-worker-westus-8xq5s <none> <none>

$ oc -n openshift-monitoring logs -c kube-rbac-proxy-web alertmanager-main-0
I0410 01:31:42.692369 1 kube-rbac-proxy.go:578] Reading config file: /etc/kube-rbac-proxy/config.yaml
I0410 01:31:42.693912 1 kube-rbac-proxy.go:285] Valid token audiences:
I0410 01:31:42.694220 1 kube-rbac-proxy.go:399] Reading certificate files
I0410 01:31:42.695248 1 kube-rbac-proxy.go:447] Starting TCP socket on 0.0.0.0:9095
I0410 01:31:42.695743 1 kube-rbac-proxy.go:454] Listening securely on 0.0.0.0:9095
I0410 01:32:14.791477 1 log.go:245] http: TLS handshake error from 10.129.2.10:51738: write tcp 10.131.0.18:9095->10.129.2.10:51738: write: connection reset by peer
I0410 01:32:19.798007 1 log.go:245] http: TLS handshake error from 10.129.2.10:51744: write tcp 10.131.0.18:9095->10.129.2.10:51744: write: connection reset by peer
I0410 01:32:19.806594 1 log.go:245] http: TLS handshake error from 10.128.2.8:48204: write tcp 10.131.0.18:9095->10.128.2.8:48204: write: connection reset by peer
I0410 01:32:24.808864 1 log.go:245] http: TLS handshake error from 10.129.2.10:53168: write tcp 10.131.0.18:9095->10.129.2.10:53168: write: connection reset by peer
I0410 01:32:24.814942 1 log.go:245] http: TLS handshake error from 10.128.2.8:48218: write tcp 10.131.0.18:9095->10.128.2.8:48218: write: connection reset by peer
I0410 01:32:32.218284 1 log.go:245] http: TLS handshake error from 10.129.2.10:47738: write tcp 10.131.0.18:9095->10.129.2.10:47738: write: connection reset by peer
I0410 01:32:38.150418 1 log.go:245] http: TLS handshake error from 10.129.2.10:47752: write tcp 10.131.0.18:9095->10.129.2.10:47752: write: connection reset by peer
...
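To see which clients are behind the resets, the reset messages can be tallied by peer IP. This is a minimal sketch that assumes the log format shown in the excerpt (`TLS handshake error from <ip>:<port>: ...`); the function name `resets_by_peer` is hypothetical and not part of any tool.

```shell
# Hypothetical helper (POSIX sh): reads kube-rbac-proxy log lines on stdin and
# prints "<count> <peer-ip>" for every "connection reset by peer" TLS handshake
# error, most frequent peer first. Assumes the log format shown in this report.
resets_by_peer() {
  grep 'write: connection reset by peer' |
    # Keep only the client IP from "TLS handshake error from <ip>:<port>:".
    sed -n 's/.*TLS handshake error from \([0-9.]*\):[0-9]*:.*/\1/p' |
    sort | uniq -c | sort -rn
}

# Usage against a live cluster (requires oc and access to openshift-monitoring):
# oc -n openshift-monitoring logs -c kube-rbac-proxy-web alertmanager-main-0 | resets_by_peer
```

The peer IPs can then be matched against `oc get pod -A -o wide` to identify the client workloads.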
The total number of such messages in each pod's logs:
$ oc -n openshift-monitoring logs -c kube-rbac-proxy-web alertmanager-main-0 | grep "write: connection reset by peer" | wc -l
8666
$ oc -n openshift-monitoring logs -c kube-rbac-proxy-web alertmanager-main-1 | grep "write: connection reset by peer" | wc -l
6734
$ oc -n openshift-monitoring logs -c kube-rbac-proxy-web prometheus-k8s-0 | grep "write: connection reset by peer" | wc -l
14203
$ oc -n openshift-monitoring logs -c kube-rbac-proxy-web prometheus-k8s-1 | grep "write: connection reset by peer" | wc -l
13195
$ oc -n openshift-monitoring logs -c kube-rbac-proxy-web thanos-querier-64c467b649-j2rtn | grep "write: connection reset by peer" | wc -l
8704
$ oc -n openshift-monitoring logs -c kube-rbac-proxy-web thanos-querier-64c467b649-xbxff | grep "write: connection reset by peer" | wc -l
7031
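The six per-pod commands can be collapsed into one loop. This is a sketch only: the helper names `fetch_logs` and `count_resets_per_pod` are hypothetical, and the pod names in the usage line are taken from this cluster's listing and will differ elsewhere.

```shell
# Hypothetical helper: fetch the kube-rbac-proxy-web logs of one pod.
fetch_logs() {
  oc -n openshift-monitoring logs -c kube-rbac-proxy-web "$1"
}

# Hypothetical helper: print "<pod> <count>" of reset messages for each pod.
count_resets_per_pod() {
  for pod in "$@"; do
    # grep -c counts matching lines (same result as "grep ... | wc -l").
    # It prints 0 and exits 1 when nothing matches, so ignore the exit status.
    n=$(fetch_logs "$pod" | grep -c 'write: connection reset by peer' || true)
    printf '%s %s\n' "$pod" "$n"
  done
}

# Usage:
# count_resets_per_pod alertmanager-main-0 alertmanager-main-1 \
#   prometheus-k8s-0 prometheus-k8s-1 \
#   thanos-querier-64c467b649-j2rtn thanos-querier-64c467b649-xbxff
```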
The counts keep increasing over time.
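To check how quickly the count grows, the reset messages can be bucketed per minute using the klog prefix seen in the logs (`I<MMDD> <HH:MM:SS.micro>`). A minimal sketch, assuming the log lines are in chronological order; `resets_per_minute` is a hypothetical name.

```shell
# Hypothetical helper: reads kube-rbac-proxy log lines on stdin and prints
# "<count> <MMDD> <HH:MM>" for each minute that contains reset messages.
# Relies on the klog prefix "I0410 01:32:14.791477 ..." shown in this report.
resets_per_minute() {
  grep 'write: connection reset by peer' |
    # $1 is "I<MMDD>", $2 is "HH:MM:SS.micro"; keep the date and HH:MM.
    awk '{ print substr($1, 2), substr($2, 1, 5) }' |
    # Log lines are chronological, so adjacent duplicates form one bucket.
    uniq -c
}

# Usage, restricted to the last hour with oc's --since flag:
# oc -n openshift-monitoring logs -c kube-rbac-proxy-web prometheus-k8s-0 --since=1h | resets_per_minute
```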
Version-Release number of selected component (if applicable):
$ oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.16.0-0.nightly-2024-04-08-024331 True False 6h47m Cluster version is 4.16.0-0.nightly-2024-04-08-024331
How reproducible:
Always
Steps to Reproduce:
1. Check the kube-rbac-proxy-web container logs.
Actual results:
A very large number of "write: connection reset by peer" messages in the kube-rbac-proxy-web container logs.
Expected results:
Far fewer of these messages.
Additional info:
These messages do not affect functionality.
Is caused by:
OCPBUGS-5916 The kube-rbac-proxy-federate container reporting TLS handshake error (Closed)