OpenShift Bugs: OCPBUGS-32021

too many "write: connection reset by peer" logs in kube-rbac-proxy-web container logs


    • Type: Bug
    • Resolution: Duplicate
    • Priority: Normal
    • None
    • 4.16.0
    • Monitoring
    • None
    • No
    • False

      Description of problem:

      The kube-rbac-proxy-web container runs in the alertmanager-main, prometheus-k8s, and thanos-querier pods, and its logs contain a large number of "write: connection reset by peer" messages. For example:

      $ oc -n openshift-monitoring get pod -o wide | grep -E "alertmanager-main|prometheus-k8s|thanos-"
      alertmanager-main-0                                      6/6     Running   0          6h47m   10.131.0.18   daily-0410-gl552-worker-westus-2rhvv   <none>           <none>
      alertmanager-main-1                                      6/6     Running   0          6h48m   10.129.2.13   daily-0410-gl552-worker-westus-jwhfv   <none>           <none>
      prometheus-k8s-0                                         6/6     Running   0          6h47m   10.128.2.14   daily-0410-gl552-worker-westus-8xq5s   <none>           <none>
      prometheus-k8s-1                                         6/6     Running   0          6h48m   10.129.2.14   daily-0410-gl552-worker-westus-jwhfv   <none>           <none>
      thanos-querier-64c467b649-j2rtn                          6/6     Running   0          6h49m   10.131.0.15   daily-0410-gl552-worker-westus-2rhvv   <none>           <none>
      thanos-querier-64c467b649-xbxff                          6/6     Running   0          6h49m   10.128.2.11   daily-0410-gl552-worker-westus-8xq5s   <none>           <none>
      
      $ oc -n openshift-monitoring logs -c kube-rbac-proxy-web alertmanager-main-0
      I0410 01:31:42.692369       1 kube-rbac-proxy.go:578] Reading config file: /etc/kube-rbac-proxy/config.yaml
      I0410 01:31:42.693912       1 kube-rbac-proxy.go:285] Valid token audiences: 
      I0410 01:31:42.694220       1 kube-rbac-proxy.go:399] Reading certificate files
      I0410 01:31:42.695248       1 kube-rbac-proxy.go:447] Starting TCP socket on 0.0.0.0:9095
      I0410 01:31:42.695743       1 kube-rbac-proxy.go:454] Listening securely on 0.0.0.0:9095
      I0410 01:32:14.791477       1 log.go:245] http: TLS handshake error from 10.129.2.10:51738: write tcp 10.131.0.18:9095->10.129.2.10:51738: write: connection reset by peer
      I0410 01:32:19.798007       1 log.go:245] http: TLS handshake error from 10.129.2.10:51744: write tcp 10.131.0.18:9095->10.129.2.10:51744: write: connection reset by peer
      I0410 01:32:19.806594       1 log.go:245] http: TLS handshake error from 10.128.2.8:48204: write tcp 10.131.0.18:9095->10.128.2.8:48204: write: connection reset by peer
      I0410 01:32:24.808864       1 log.go:245] http: TLS handshake error from 10.129.2.10:53168: write tcp 10.131.0.18:9095->10.129.2.10:53168: write: connection reset by peer
      I0410 01:32:24.814942       1 log.go:245] http: TLS handshake error from 10.128.2.8:48218: write tcp 10.131.0.18:9095->10.128.2.8:48218: write: connection reset by peer
      I0410 01:32:32.218284       1 log.go:245] http: TLS handshake error from 10.129.2.10:47738: write tcp 10.131.0.18:9095->10.129.2.10:47738: write: connection reset by peer
      I0410 01:32:38.150418       1 log.go:245] http: TLS handshake error from 10.129.2.10:47752: write tcp 10.131.0.18:9095->10.129.2.10:47752: write: connection reset by peer
      ...    
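To see which clients are resetting connections, the handshake errors can be grouped by source IP. The following is a minimal sketch, not part of the original report; the function name `resets_by_client` is hypothetical, and it assumes the log line format shown above.

```shell
# Hypothetical helper: tally "connection reset by peer" handshake errors by client IP.
# Reads kube-rbac-proxy log lines on stdin and prints "<count> <client-ip>",
# with the most frequent client first.
resets_by_client() {
  grep 'write: connection reset by peer' \
    | sed -n 's/.*TLS handshake error from \([0-9.]*\):[0-9]*: .*/\1/p' \
    | sort | uniq -c | sort -rn \
    | awk '{ print $1, $2 }'
}

# Example usage against a live cluster (not executed here):
#   oc -n openshift-monitoring logs -c kube-rbac-proxy-web alertmanager-main-0 | resets_by_client
```

The resulting IPs can then be matched against the output of `oc get pod -A -o wide` to find the pods that open and drop these connections.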

      The total number of these messages in each pod's logs:

      $ oc -n openshift-monitoring logs -c kube-rbac-proxy-web alertmanager-main-0 | grep "write: connection reset by peer" | wc -l
      8666
      $ oc -n openshift-monitoring logs -c kube-rbac-proxy-web alertmanager-main-1 | grep "write: connection reset by peer" | wc -l
      6734
      $ oc -n openshift-monitoring logs -c kube-rbac-proxy-web prometheus-k8s-0 | grep "write: connection reset by peer" | wc -l
      14203
      $ oc -n openshift-monitoring logs -c kube-rbac-proxy-web prometheus-k8s-1 | grep "write: connection reset by peer" | wc -l
      13195
      $ oc -n openshift-monitoring logs -c kube-rbac-proxy-web thanos-querier-64c467b649-j2rtn | grep "write: connection reset by peer" | wc -l
      8704
      $ oc -n openshift-monitoring logs -c kube-rbac-proxy-web thanos-querier-64c467b649-xbxff | grep "write: connection reset by peer" | wc -l
      7031
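The per-pod counts above can be collected in one pass. This is a sketch under assumptions: the function names `count_resets` and `count_resets_for_pods` are hypothetical, and the thanos-querier pod names are specific to this cluster and will differ elsewhere.

```shell
# Hypothetical helper: count matching lines on stdin.
# grep -c exits non-zero when the count is 0, so "|| true" keeps the result usable in scripts.
count_resets() {
  grep -c 'write: connection reset by peer' || true
}

# Hypothetical helper: print "<pod> <count>" for each pod name passed as an argument.
count_resets_for_pods() {
  for pod in "$@"; do
    printf '%s %s\n' "$pod" \
      "$(oc -n openshift-monitoring logs -c kube-rbac-proxy-web "$pod" | count_resets)"
  done
}

# Example usage against a live cluster (not executed here):
#   count_resets_for_pods alertmanager-main-0 alertmanager-main-1 \
#     prometheus-k8s-0 prometheus-k8s-1 \
#     thanos-querier-64c467b649-j2rtn thanos-querier-64c467b649-xbxff
```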
      

      The count keeps increasing over time.
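To quantify how fast the messages accumulate, they can be bucketed per minute using the timestamp in the log header (month-day followed by time of day). A minimal sketch; the function name `resets_per_minute` is hypothetical, and it assumes the header layout seen in the sample logs above.

```shell
# Hypothetical helper: count "connection reset by peer" messages per minute.
# Reads log lines on stdin and prints "<mmdd> <hh:mm> <count>" in chronological order
# (within one year, since the log header carries no year).
resets_per_minute() {
  grep 'write: connection reset by peer' \
    | awk '{ print substr($1, 2), substr($2, 1, 5) }' \
    | sort | uniq -c \
    | awk '{ print $2, $3, $1 }'
}

# Example usage against a live cluster (not executed here):
#   oc -n openshift-monitoring logs -c kube-rbac-proxy-web prometheus-k8s-0 | resets_per_minute
```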

      Version-Release number of selected component (if applicable):

      $ oc get clusterversion
      NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
      version   4.16.0-0.nightly-2024-04-08-024331   True        False         6h47m   Cluster version is 4.16.0-0.nightly-2024-04-08-024331
      

      How reproducible:

      always

      Steps to Reproduce:

      1. Check the kube-rbac-proxy-web container logs.

      Actual results:

      too many "write: connection reset by peer" logs in kube-rbac-proxy-web container logs

      Expected results:

      Fewer such log messages.

      Additional info:

      The log messages do not affect functionality.

            spasquie@redhat.com Simon Pasquier
            juzhao@redhat.com Junqi Zhao