Type: Bug
Resolution: Duplicate
Priority: Normal
Fix Version/s: None
Affects Version/s: 4.16.0
Component/s: Monitoring
Labels: None
Regression: No
Blocked: False
Blocked Reason: None

Description of problem:

The kube-rbac-proxy-web container runs in the alertmanager-main, prometheus-k8s and thanos-querier pods, and its logs contain a large number of "write: connection reset by peer" TLS handshake errors. For example:

$ oc -n openshift-monitoring get pod -o wide | grep -E "alertmanager-main|prometheus-k8s|thanos-"
alertmanager-main-0                                      6/6     Running   0          6h47m   10.131.0.18   daily-0410-gl552-worker-westus-2rhvv   <none>           <none>
alertmanager-main-1                                      6/6     Running   0          6h48m   10.129.2.13   daily-0410-gl552-worker-westus-jwhfv   <none>           <none>
prometheus-k8s-0                                         6/6     Running   0          6h47m   10.128.2.14   daily-0410-gl552-worker-westus-8xq5s   <none>           <none>
prometheus-k8s-1                                         6/6     Running   0          6h48m   10.129.2.14   daily-0410-gl552-worker-westus-jwhfv   <none>           <none>
thanos-querier-64c467b649-j2rtn                          6/6     Running   0          6h49m   10.131.0.15   daily-0410-gl552-worker-westus-2rhvv   <none>           <none>
thanos-querier-64c467b649-xbxff                          6/6     Running   0          6h49m   10.128.2.11   daily-0410-gl552-worker-westus-8xq5s   <none>           <none>

$ oc -n openshift-monitoring logs -c kube-rbac-proxy-web alertmanager-main-0
I0410 01:31:42.692369       1 kube-rbac-proxy.go:578] Reading config file: /etc/kube-rbac-proxy/config.yaml
I0410 01:31:42.693912       1 kube-rbac-proxy.go:285] Valid token audiences: 
I0410 01:31:42.694220       1 kube-rbac-proxy.go:399] Reading certificate files
I0410 01:31:42.695248       1 kube-rbac-proxy.go:447] Starting TCP socket on 0.0.0.0:9095
I0410 01:31:42.695743       1 kube-rbac-proxy.go:454] Listening securely on 0.0.0.0:9095
I0410 01:32:14.791477       1 log.go:245] http: TLS handshake error from 10.129.2.10:51738: write tcp 10.131.0.18:9095->10.129.2.10:51738: write: connection reset by peer
I0410 01:32:19.798007       1 log.go:245] http: TLS handshake error from 10.129.2.10:51744: write tcp 10.131.0.18:9095->10.129.2.10:51744: write: connection reset by peer
I0410 01:32:19.806594       1 log.go:245] http: TLS handshake error from 10.128.2.8:48204: write tcp 10.131.0.18:9095->10.128.2.8:48204: write: connection reset by peer
I0410 01:32:24.808864       1 log.go:245] http: TLS handshake error from 10.129.2.10:53168: write tcp 10.131.0.18:9095->10.129.2.10:53168: write: connection reset by peer
I0410 01:32:24.814942       1 log.go:245] http: TLS handshake error from 10.128.2.8:48218: write tcp 10.131.0.18:9095->10.128.2.8:48218: write: connection reset by peer
I0410 01:32:32.218284       1 log.go:245] http: TLS handshake error from 10.129.2.10:47738: write tcp 10.131.0.18:9095->10.129.2.10:47738: write: connection reset by peer
I0410 01:32:38.150418       1 log.go:245] http: TLS handshake error from 10.129.2.10:47752: write tcp 10.131.0.18:9095->10.129.2.10:47752: write: connection reset by peer
...
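
To see which clients are resetting the connections, the log lines can be grouped by source IP. This is only a sketch: the function name count_resets_by_peer is made up for illustration, and it assumes the log line format shown above with IPv4 peer addresses.

```shell
# Hypothetical helper: read kube-rbac-proxy-web logs on stdin and print
# the number of "connection reset by peer" handshake errors per client IP,
# most frequent first. Assumes the line format shown in the excerpt above.
count_resets_by_peer() {
  sed -nE 's/.*TLS handshake error from ([0-9.]+):[0-9]+: .*write: connection reset by peer.*/\1/p' \
    | sort \
    | uniq -c \
    | sort -rn
}

# Example usage (requires cluster access):
#   oc -n openshift-monitoring logs -c kube-rbac-proxy-web alertmanager-main-0 | count_resets_by_peer
```

For the seven lines in the excerpt above, this would report 5 errors from 10.129.2.10 and 2 from 10.128.2.8.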

The total counts in each pod's logs are:

$ oc -n openshift-monitoring logs -c kube-rbac-proxy-web alertmanager-main-0 | grep "write: connection reset by peer" | wc -l
8666
$ oc -n openshift-monitoring logs -c kube-rbac-proxy-web alertmanager-main-1 | grep "write: connection reset by peer" | wc -l
6734
$ oc -n openshift-monitoring logs -c kube-rbac-proxy-web prometheus-k8s-0 | grep "write: connection reset by peer" | wc -l
14203
$ oc -n openshift-monitoring logs -c kube-rbac-proxy-web prometheus-k8s-1 | grep "write: connection reset by peer" | wc -l
13195
$ oc -n openshift-monitoring logs -c kube-rbac-proxy-web thanos-querier-64c467b649-j2rtn | grep "write: connection reset by peer" | wc -l
8704
$ oc -n openshift-monitoring logs -c kube-rbac-proxy-web thanos-querier-64c467b649-xbxff | grep "write: connection reset by peer" | wc -l
7031

The counts keep increasing over time.
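
To re-check the counts later without retyping each command, the per-pod commands above can be wrapped in a loop. A minimal sketch, assuming the same namespace and container name as above; the function names count_resets_in_pod and report_reset_counts are hypothetical.

```shell
# Hypothetical helper: count matching log lines for one pod.
# $1 is the pod name; namespace and container match the commands above.
count_resets_in_pod() {
  oc -n openshift-monitoring logs -c kube-rbac-proxy-web "$1" \
    | grep -c 'write: connection reset by peer'
}

# Hypothetical helper: print "<pod> <count>" for every pod name given.
report_reset_counts() {
  for pod in "$@"; do
    printf '%s %s\n' "$pod" "$(count_resets_in_pod "$pod")"
  done
}

# Example usage (requires cluster access):
#   report_reset_counts alertmanager-main-0 alertmanager-main-1 prometheus-k8s-0 prometheus-k8s-1
```

Running it twice a few minutes apart and comparing the output shows how fast the counts grow.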

Version-Release number of selected component (if applicable):

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.16.0-0.nightly-2024-04-08-024331   True        False         6h47m   Cluster version is 4.16.0-0.nightly-2024-04-08-024331

How reproducible:

Always

Steps to Reproduce:

1. Check the kube-rbac-proxy-web container logs.

Actual results:

Too many "write: connection reset by peer" messages in the kube-rbac-proxy-web container logs.

Expected results:

Fewer such log messages.

Additional info:

The log messages do not affect functionality.

is caused by

OCPBUGS-5916 The kube-rbac-proxy-federate container reporting TLS handshake error

Closed

Assignee: Simon Pasquier

Reporter: Junqi Zhao

QA Contact: Junqi Zhao

Votes: 0

Watchers: 3

Created: 2024/04/10 8:45 AM

Updated: 2024/04/11 12:05 PM

Resolved: 2024/04/11 12:05 PM