OpenShift Bugs / OCPBUGS-928

KubePodCrashLooping with haproxy container failing with error "Starting proxy health_check_http_url: cannot bind socket [:::50936]"


Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Normal
    • Affects Version/s: 4.8.z
    • Severity: Moderate

    Description

      Description of problem:
      The haproxy pod on master-0 is crash-looping with the error "Starting proxy health_check_http_url: cannot bind socket [:::50936]"; it has restarted roughly 2,000 times so far.
      This is an OpenShift-on-OpenStack deployment; the affected pod in the openshift-openstack-infra namespace contains the haproxy container.
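
      As a first triage step, one can check which process currently holds the health-check port on the node. A minimal sketch, assuming shell access to master-0; the port number comes from the liveness probe and the pid-file path from the pod command shown below:

      # Show the owner of the health-check port on the node.
      sudo ss -tlnp 'sport = :50936'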

      [cloud-user@jumpbox-prod ansible]$ oc describe pod haproxy-mpp-nm7wq-master-0 -n openshift-openstack-infra
      Name:                 haproxy-mpp-nm7wq-master-0
      Namespace:            openshift-openstack-infra
      Priority:             2000001000
      Priority Class Name:  system-node-critical
      Node:                 mpp-nm7wq-master-0/172.40.1.28
      Start Time:           Wed, 29 Sep 2021 09:13:15 +0000
      Labels:               app=openstack-infra-api-lb
      Annotations:          kubernetes.io/config.hash: d381520d08c5b3d39c1853a191bb3e96
                            kubernetes.io/config.mirror: d381520d08c5b3d39c1853a191bb3e96
                            kubernetes.io/config.seen: 2022-07-13T09:45:17.595676878Z
                            kubernetes.io/config.source: file
      Status:               Running
      IP:                   172.40.1.28
      IPs:
        IP:           172.40.1.28
      Controlled By:  Node/mpp-nm7wq-master-0
      Init Containers:
        verify-api-int-resolvable:
          Container ID:  cri-o://0fedb09f69ca1efacf7c88c498eb0022dd5163ccb5f2cbe08bd0e41029b6ff8a
          Image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:8a85f82c666c9355d7c42781249b5f607b19f08a0f68fe4a35e70692c3e67b52
          Image ID: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:8a85f82c666c9355d7c42781249b5f607b19f08a0f68fe4a35e70692c3e67b52
          Port:          <none>
          Host Port:     <none>
          Command:
            /bin/bash
            -c
            #/bin/bash
            /host/bin/oc --kubeconfig /var/lib/kubelet/kubeconfig get nodes
      
          State:          Terminated
            Reason:       Completed
            Exit Code:    0
            Started:      Wed, 13 Jul 2022 09:45:23 +0000
            Finished:     Wed, 13 Jul 2022 09:45:24 +0000
          Ready:          True
          Restart Count:  10
          Environment:    <none>
          Mounts:
            /host from chroot-host (rw)
            /var/lib/kubelet from kubeconfigvarlib (rw)
      Containers:
        haproxy:
          Container ID:  cri-o://599fca3b3e352a91bf6a909c817ceb4d6cf8e494d6ce49c23ea4a9bcf40708e1
          Image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0258dc7269cc67d53c64965e84d558cceb6008ff2906bbfa28eb60ddd3b61376
          Image ID: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0258dc7269cc67d53c64965e84d558cceb6008ff2906bbfa28eb60ddd3b61376
          Port:          <none>
          Host Port:     <none>
          Command:
            /bin/bash
            -c
            #/bin/bash
            verify_old_haproxy_ps_being_deleted()
            {
              local prev_pids
      
              prev_pids="$1"
              sleep $OLD_HAPROXY_PS_FORCE_DEL_TIMEOUT
              cur_pids=$(pidof haproxy)
      
              for val in $prev_pids; do
                  if [[ $cur_pids =~ (^|[[:space:]])"$val"($|[[:space:]]) ]] ; then
                     kill $val
                  fi
              done
            }
      
            reload_haproxy()
            {
              old_pids=$(pidof haproxy)
              if [ -n "$old_pids" ]; then
                  /usr/sbin/haproxy -W -db -f /etc/haproxy/haproxy.cfg  -p /var/lib/haproxy/run/haproxy.pid -x /var/lib/haproxy/run/haproxy.sock -sf $old_pids &
                  #There seems to be some cases where HAProxy doesn't drain properly.
                  #To handle that case, SIGTERM signal being sent to old HAProxy processes which haven't terminated.
                  verify_old_haproxy_ps_being_deleted "$old_pids"  &
              else
                  /usr/sbin/haproxy -W -db -f /etc/haproxy/haproxy.cfg  -p /var/lib/haproxy/run/haproxy.pid &
              fi
            }
      
            msg_handler()
            {
              while read -r line; do
                echo "The client send: $line"  >&2
                # currently only 'reload' msg is supported
                if [ "$line" = reload ]; then
                    reload_haproxy
                fi
              done
            }
            set -ex
            declare -r haproxy_sock="/var/run/haproxy/haproxy-master.sock"
            declare -r haproxy_log_sock="/var/run/haproxy/haproxy-log.sock"
            export -f msg_handler
            export -f reload_haproxy
            export -f verify_old_haproxy_ps_being_deleted
            rm -f "$haproxy_sock" "$haproxy_log_sock"
            socat UNIX-RECV:${haproxy_log_sock} STDOUT &
            if [ -s "/etc/haproxy/haproxy.cfg" ]; then
                /usr/sbin/haproxy -W -db -f /etc/haproxy/haproxy.cfg  -p /var/lib/haproxy/run/haproxy.pid &
            fi
            socat UNIX-LISTEN:${haproxy_sock},fork system:'bash -c msg_handler'
      
          State:       Running
            Started:   Tue, 19 Jul 2022 21:24:14 +0000
          Last State:  Terminated
            Reason:    Error
            Message:   + declare -r haproxy_sock=/var/run/haproxy/haproxy-master.sock
                       + declare -r haproxy_log_sock=/var/run/haproxy/haproxy-log.sock
                       + export -f msg_handler
                       + export -f reload_haproxy
                       + export -f verify_old_haproxy_ps_being_deleted
                       + rm -f /var/run/haproxy/haproxy-master.sock /var/run/haproxy/haproxy-log.sock
                       + '[' -s /etc/haproxy/haproxy.cfg ']'
                       + socat UNIX-RECV:/var/run/haproxy/haproxy-log.sock STDOUT
                       + socat UNIX-LISTEN:/var/run/haproxy/haproxy-master.sock,fork 'system:bash -c msg_handler'
                       + /usr/sbin/haproxy -W -db -f /etc/haproxy/haproxy.cfg -p /var/lib/haproxy/run/haproxy.pid
                       <133>Jul 19 21:17:56 haproxy[10]: Proxy main started.
                       [NOTICE] 199/211756 (10) : haproxy version is 2.2.13-5f3eb59
                       [NOTICE] 199/211756 (10) : path to executable is /usr/sbin/haproxy
                       [ALERT] 199/211756 (10) : Starting proxy health_check_http_url: cannot bind socket [:::50936]
                       <133>Jul 19 21:17:56 haproxy[10]: Proxy stats started.
                       <133>Jul 19 21:17:56 haproxy[10]: Proxy masters started.
            Exit Code:    143
            Started:      Tue, 19 Jul 2022 21:17:56 +0000
            Finished:     Tue, 19 Jul 2022 21:19:06 +0000
          Ready:          True
          Restart Count:  2467
          Requests:
            cpu:     100m
            memory:  200Mi
          Liveness:  http-get http://:50936/haproxy_ready delay=50s timeout=1s period=10s #success=1 #failure=3
          Environment:
            OLD_HAPROXY_PS_FORCE_DEL_TIMEOUT:  120
          Mounts:
            /etc/haproxy from conf-dir (rw)
            /var/run/haproxy from run-dir (rw)
        haproxy-monitor:
          Container ID:  cri-o://8dbc55cf0736abbf627cea96395029cfdb916174a5dc7480f2aa5998c928b276
          Image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:8a85f82c666c9355d7c42781249b5f607b19f08a0f68fe4a35e70692c3e67b52
          Image ID: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:8a85f82c666c9355d7c42781249b5f607b19f08a0f68fe4a35e70692c3e67b52
          Port:          <none>
          Host Port:     <none>
          Command:
            monitor
            /var/lib/kubelet/kubeconfig
            /config/haproxy.cfg.tmpl
            /etc/haproxy/haproxy.cfg
            --api-vip
            172.40.0.5
          State:          Running
            Started:      Wed, 13 Jul 2022 09:45:26 +0000
          Ready:          True
          Restart Count:  3
          Requests:
            cpu:        100m
            memory:     200Mi
          Environment:  <none>
          Mounts:
            /config from resource-dir (rw)
            /etc/haproxy from conf-dir (rw)
            /host from chroot-host (rw)
            /var/lib/kubelet from kubeconfigvarlib (rw)
            /var/run/haproxy from run-dir (rw)
      Conditions:
        Type              Status
        Initialized       True
        Ready             True
        ContainersReady   True
        PodScheduled      True
      Volumes:
        resource-dir:
          Type:          HostPath (bare host directory volume)
          Path:          /etc/kubernetes/static-pod-resources/haproxy
          HostPathType:
        kubeconfigvarlib:
          Type:          HostPath (bare host directory volume)
          Path:          /var/lib/kubelet
          HostPathType:
        run-dir:
          Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
          Medium:
          SizeLimit:  <unset>
        conf-dir:
          Type:          HostPath (bare host directory volume)
          Path:          /etc/haproxy
          HostPathType:
        chroot-host:
          Type:          HostPath (bare host directory volume)
          Path:          /
          HostPathType:
      QoS Class:         Burstable
      Node-Selectors:    <none>
      Tolerations:       op=Exists
      Events:            <none>
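
      For context, the pod command above implements a small reload protocol: the haproxy-monitor container renders /etc/haproxy/haproxy.cfg from the template and (presumably, per the msg_handler loop) requests a reload over the master socket, after which reload_haproxy starts a new haproxy with -x/-sf so it can take over the listening sockets from the old pids. A minimal sketch of exercising that path by hand, with the socket path taken from the script itself:

      # Send the one supported message to the UNIX-LISTEN socket created by the
      # pod command; msg_handler reads it and calls reload_haproxy.
      echo reload | socat - UNIX-CONNECT:/var/run/haproxy/haproxy-master.sock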
      

      No other process is contending for port 50936; on both masters the only listener is haproxy itself:

      $ netstat -plunt | grep 50936
      On master-0:
      tcp6 0 0 :::50936 :::* LISTEN 2354586/haproxy 
      tcp6 0 0 :::50936 :::* LISTEN 2354586/haproxy 
      tcp6 0 0 :::50936 :::* LISTEN 2354586/haproxy 
      tcp6 0 0 :::50936 :::* LISTEN 2354586/haproxy 
      tcp6 0 0 :::50936 :::* LISTEN 2354586/haproxy 
      tcp6 0 0 :::50936 :::* LISTEN 2354586/haproxy 
      tcp6 0 0 :::50936 :::* LISTEN 2354586/haproxy 
      On master-1:
      tcp6 0 0 :::50936 :::* LISTEN 13795/haproxy 
      tcp6 0 0 :::50936 :::* LISTEN 13795/haproxy 
      tcp6 0 0 :::50936 :::* LISTEN 13795/haproxy 
      tcp6 0 0 :::50936 :::* LISTEN 13795/haproxy 
      tcp6 0 0 :::50936 :::* LISTEN 13795/haproxy 
      tcp6 0 0 :::50936 :::* LISTEN 13795/haproxy 
      tcp6 0 0 :::50936 :::* LISTEN 13795/haproxy 
      tcp6 0 0 :::50936 :::* LISTEN 13795/haproxy 
      tcp6 0 0 :::50936 :::* LISTEN 13795/haproxy
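
      One plausible reading of the output above (an interpretation, not confirmed in this report): an older haproxy process is still bound to :::50936, so the freshly started instance cannot bind, and the failing liveness probe then terminates the container (exit code 143 = SIGTERM). A hedged way to check this on master-0, using the pid and container ID from the output above:

      # How long has the process holding the port been alive?
      sudo ps -o pid,etime,cmd -p 2354586
      # Compare with the pid file written by the current container's haproxy.
      sudo crictl exec 599fca3b3e35 cat /var/lib/haproxy/run/haproxy.pid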
      

      Version-Release number of selected component (if applicable):

      How reproducible:
      Not reproducible on demand; the crash loop is ongoing on this one node.

      Steps to Reproduce:
      None known.

      Actual results:
      The haproxy pods on the other master nodes run properly, but the pod on master-0 keeps crash-looping.

      Expected results:
      The haproxy pod runs on master-0 without restarting, as it does on the other master nodes.
      Additional info:
      Links consulted, but not helpful:
      https://access.redhat.com/solutions/6746671 (did not resolve the issue)
      https://access.redhat.com/support/cases/#/case/03126522
      https://bugzilla.redhat.com/show_bug.cgi?id=2012780

      It is also not possible to run a debug pod on that node; the attempt fails with a TLS error, which may be related to this issue.
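
      For completeness, the usual way to open a debug shell on the node, shown as an illustration (the exact TLS error text was not captured in this report):

      # Typically fails here with a TLS error, per the reporter.
      oc debug node/mpp-nm7wq-master-0 -- chroot /host /bin/bash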

    People

      Assignee: Benjamin Nemec (bnemec@redhat.com)
      Reporter: Mahesh Nyayadhish (rhn-support-manyayad)
