OpenShift Bugs / OCPBUGS-928

KubePodCrashLooping with haproxy container failing with error "Starting proxy health_check_http_url: cannot bind socket [:::50936]"


Details

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Normal
    • Affects Version/s: 4.8.z
    • Severity: Moderate

    Description

      Description of problem:
      The haproxy pod on master-0 is crash-looping with the error "Starting proxy health_check_http_url: cannot bind socket [:::50936]"; it has restarted roughly 2,000 times so far.
      This is an OpenShift-on-OpenStack deployment; the affected pod in the openshift-openstack-infra namespace contains the haproxy container.
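
      As a first triage step, one can check which process currently holds the health-check port on the node. A minimal sketch, assuming shell access to master-0; the port number comes from the liveness probe and the pid-file path from the pod command shown below:

      # Show the owner of the health-check port on the node.
      sudo ss -tlnp 'sport = :50936'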

      [cloud-user@jumpbox-prod ansible]$ oc describe pod haproxy-mpp-nm7wq-master-0 -n openshift-openstack-infra
      Name:                 haproxy-mpp-nm7wq-master-0
      Namespace:            openshift-openstack-infra
      Priority:             2000001000
      Priority Class Name:  system-node-critical
      Node:                 mpp-nm7wq-master-0/172.40.1.28
      Start Time:           Wed, 29 Sep 2021 09:13:15 +0000
      Labels:               app=openstack-infra-api-lb
      Annotations:          kubernetes.io/config.hash: d381520d08c5b3d39c1853a191bb3e96
                            kubernetes.io/config.mirror: d381520d08c5b3d39c1853a191bb3e96
                            kubernetes.io/config.seen: 2022-07-13T09:45:17.595676878Z
                            kubernetes.io/config.source: file
      Status:               Running
      IP:                   172.40.1.28
      IPs:
        IP:           172.40.1.28
      Controlled By:  Node/mpp-nm7wq-master-0
      Init Containers:
        verify-api-int-resolvable:
          Container ID:  cri-o://0fedb09f69ca1efacf7c88c498eb0022dd5163ccb5f2cbe08bd0e41029b6ff8a
          Image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:8a85f82c666c9355d7c42781249b5f607b19f08a0f68fe4a35e70692c3e67b52
          Image ID: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:8a85f82c666c9355d7c42781249b5f607b19f08a0f68fe4a35e70692c3e67b52
          Port:          <none>
          Host Port:     <none>
          Command:
            /bin/bash
            -c
            #/bin/bash
            /host/bin/oc --kubeconfig /var/lib/kubelet/kubeconfig get nodes
      
          State:          Terminated
            Reason:       Completed
            Exit Code:    0
            Started:      Wed, 13 Jul 2022 09:45:23 +0000
            Finished:     Wed, 13 Jul 2022 09:45:24 +0000
          Ready:          True
          Restart Count:  10
          Environment:    <none>
          Mounts:
            /host from chroot-host (rw)
            /var/lib/kubelet from kubeconfigvarlib (rw)
      Containers:
        haproxy:
          Container ID:  cri-o://599fca3b3e352a91bf6a909c817ceb4d6cf8e494d6ce49c23ea4a9bcf40708e1
          Image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0258dc7269cc67d53c64965e84d558cceb6008ff2906bbfa28eb60ddd3b61376
          Image ID: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0258dc7269cc67d53c64965e84d558cceb6008ff2906bbfa28eb60ddd3b61376
          Port:          <none>
          Host Port:     <none>
          Command:
            /bin/bash
            -c
            #/bin/bash
            verify_old_haproxy_ps_being_deleted()
            {
              local prev_pids
      
              prev_pids="$1"
              sleep $OLD_HAPROXY_PS_FORCE_DEL_TIMEOUT
              cur_pids=$(pidof haproxy)
      
              for val in $prev_pids; do
                  if [[ $cur_pids =~ (^|[[:space:]])"$val"($|[[:space:]]) ]] ; then
                     kill $val
                  fi
              done
            }
      
            reload_haproxy()
            {
              old_pids=$(pidof haproxy)
              if [ -n "$old_pids" ]; then
                  /usr/sbin/haproxy -W -db -f /etc/haproxy/haproxy.cfg  -p /var/lib/haproxy/run/haproxy.pid -x /var/lib/haproxy/run/haproxy.sock -sf $old_pids &
                  #There seems to be some cases where HAProxy doesn't drain properly.
                  #To handle that case, SIGTERM signal being sent to old HAProxy processes which haven't terminated.
                  verify_old_haproxy_ps_being_deleted "$old_pids"  &
              else
                  /usr/sbin/haproxy -W -db -f /etc/haproxy/haproxy.cfg  -p /var/lib/haproxy/run/haproxy.pid &
              fi
            }
      
            msg_handler()
            {
              while read -r line; do
                echo "The client send: $line"  >&2
                # currently only 'reload' msg is supported
                if [ "$line" = reload ]; then
                    reload_haproxy
                fi
              done
            }
            set -ex
            declare -r haproxy_sock="/var/run/haproxy/haproxy-master.sock"
            declare -r haproxy_log_sock="/var/run/haproxy/haproxy-log.sock"
            export -f msg_handler
            export -f reload_haproxy
            export -f verify_old_haproxy_ps_being_deleted
            rm -f "$haproxy_sock" "$haproxy_log_sock"
            socat UNIX-RECV:${haproxy_log_sock} STDOUT &
            if [ -s "/etc/haproxy/haproxy.cfg" ]; then
                /usr/sbin/haproxy -W -db -f /etc/haproxy/haproxy.cfg  -p /var/lib/haproxy/run/haproxy.pid &
            fi
            socat UNIX-LISTEN:${haproxy_sock},fork system:'bash -c msg_handler'
      
          State:       Running
            Started:   Tue, 19 Jul 2022 21:24:14 +0000
          Last State:  Terminated
            Reason:    Error
            Message:   + declare -r haproxy_sock=/var/run/haproxy/haproxy-master.sock
                       + declare -r haproxy_log_sock=/var/run/haproxy/haproxy-log.sock
                       + export -f msg_handler
                       + export -f reload_haproxy
                       + export -f verify_old_haproxy_ps_being_deleted
                       + rm -f /var/run/haproxy/haproxy-master.sock /var/run/haproxy/haproxy-log.sock
                       + '[' -s /etc/haproxy/haproxy.cfg ']'
                       + socat UNIX-RECV:/var/run/haproxy/haproxy-log.sock STDOUT
                       + socat UNIX-LISTEN:/var/run/haproxy/haproxy-master.sock,fork 'system:bash -c msg_handler'
                       + /usr/sbin/haproxy -W -db -f /etc/haproxy/haproxy.cfg -p /var/lib/haproxy/run/haproxy.pid
                       <133>Jul 19 21:17:56 haproxy[10]: Proxy main started.
                       [NOTICE] 199/211756 (10) : haproxy version is 2.2.13-5f3eb59
                       [NOTICE] 199/211756 (10) : path to executable is /usr/sbin/haproxy
                       [ALERT] 199/211756 (10) : Starting proxy health_check_http_url: cannot bind socket [:::50936]
                       <133>Jul 19 21:17:56 haproxy[10]: Proxy stats started.
                       <133>Jul 19 21:17:56 haproxy[10]: Proxy masters started.
            Exit Code:    143
            Started:      Tue, 19 Jul 2022 21:17:56 +0000
            Finished:     Tue, 19 Jul 2022 21:19:06 +0000
          Ready:          True
          Restart Count:  2467
          Requests:
            cpu:     100m
            memory:  200Mi
          Liveness:  http-get http://:50936/haproxy_ready delay=50s timeout=1s period=10s #success=1 #failure=3
          Environment:
            OLD_HAPROXY_PS_FORCE_DEL_TIMEOUT:  120
          Mounts:
            /etc/haproxy from conf-dir (rw)
            /var/run/haproxy from run-dir (rw)
        haproxy-monitor:
          Container ID:  cri-o://8dbc55cf0736abbf627cea96395029cfdb916174a5dc7480f2aa5998c928b276
          Image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:8a85f82c666c9355d7c42781249b5f607b19f08a0f68fe4a35e70692c3e67b52
          Image ID: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:8a85f82c666c9355d7c42781249b5f607b19f08a0f68fe4a35e70692c3e67b52
          Port:          <none>
          Host Port:     <none>
          Command:
            monitor
            /var/lib/kubelet/kubeconfig
            /config/haproxy.cfg.tmpl
            /etc/haproxy/haproxy.cfg
            --api-vip
            172.40.0.5
          State:          Running
            Started:      Wed, 13 Jul 2022 09:45:26 +0000
          Ready:          True
          Restart Count:  3
          Requests:
            cpu:        100m
            memory:     200Mi
          Environment:  <none>
          Mounts:
            /config from resource-dir (rw)
            /etc/haproxy from conf-dir (rw)
            /host from chroot-host (rw)
            /var/lib/kubelet from kubeconfigvarlib (rw)
            /var/run/haproxy from run-dir (rw)
      Conditions:
        Type              Status
        Initialized       True
        Ready             True
        ContainersReady   True
        PodScheduled      True
      Volumes:
        resource-dir:
          Type:          HostPath (bare host directory volume)
          Path:          /etc/kubernetes/static-pod-resources/haproxy
          HostPathType:
        kubeconfigvarlib:
          Type:          HostPath (bare host directory volume)
          Path:          /var/lib/kubelet
          HostPathType:
        run-dir:
          Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
          Medium:
          SizeLimit:  <unset>
        conf-dir:
          Type:          HostPath (bare host directory volume)
          Path:          /etc/haproxy
          HostPathType:
        chroot-host:
          Type:          HostPath (bare host directory volume)
          Path:          /
          HostPathType:
      QoS Class:         Burstable
      Node-Selectors:    <none>
      Tolerations:       op=Exists
      Events:            <none>
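
      For context, the pod command above implements a small reload protocol: the haproxy-monitor container renders /etc/haproxy/haproxy.cfg from the template and (presumably, per the msg_handler loop) requests a reload over the master socket, after which reload_haproxy starts a new haproxy with -x/-sf so it can take over the listening sockets from the old pids. A minimal sketch of exercising that path by hand, with the socket path taken from the script itself:

      # Send the one supported message to the UNIX-LISTEN socket created by the
      # pod command; msg_handler reads it and calls reload_haproxy.
      echo reload | socat - UNIX-CONNECT:/var/run/haproxy/haproxy-master.sock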
      

      No other process is contending for port 50936; on both masters the only listener is haproxy itself:

      $ netstat -plunt | grep 50936
      On master-0:
      tcp6 0 0 :::50936 :::* LISTEN 2354586/haproxy 
      tcp6 0 0 :::50936 :::* LISTEN 2354586/haproxy 
      tcp6 0 0 :::50936 :::* LISTEN 2354586/haproxy 
      tcp6 0 0 :::50936 :::* LISTEN 2354586/haproxy 
      tcp6 0 0 :::50936 :::* LISTEN 2354586/haproxy 
      tcp6 0 0 :::50936 :::* LISTEN 2354586/haproxy 
      tcp6 0 0 :::50936 :::* LISTEN 2354586/haproxy 
      On master-1:
      tcp6 0 0 :::50936 :::* LISTEN 13795/haproxy 
      tcp6 0 0 :::50936 :::* LISTEN 13795/haproxy 
      tcp6 0 0 :::50936 :::* LISTEN 13795/haproxy 
      tcp6 0 0 :::50936 :::* LISTEN 13795/haproxy 
      tcp6 0 0 :::50936 :::* LISTEN 13795/haproxy 
      tcp6 0 0 :::50936 :::* LISTEN 13795/haproxy 
      tcp6 0 0 :::50936 :::* LISTEN 13795/haproxy 
      tcp6 0 0 :::50936 :::* LISTEN 13795/haproxy 
      tcp6 0 0 :::50936 :::* LISTEN 13795/haproxy
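
      One plausible reading of the output above (an interpretation, not confirmed in this report): an older haproxy process is still bound to :::50936, so the freshly started instance cannot bind, and the failing liveness probe then terminates the container (exit code 143 = SIGTERM). A hedged way to check this on master-0, using the pid and container ID from the output above:

      # How long has the process holding the port been alive?
      sudo ps -o pid,etime,cmd -p 2354586
      # Compare with the pid file written by the current container's haproxy.
      sudo crictl exec 599fca3b3e35 cat /var/lib/haproxy/run/haproxy.pid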
      

      Version-Release number of selected component (if applicable):

      How reproducible:
      Not reproducible on demand; the crash loop is ongoing on this one node.

      Steps to Reproduce:
      None known.

      Actual results:
      The haproxy pods on the other master nodes run properly, but the pod on master-0 keeps crash-looping.

      Expected results:
      The haproxy pod runs on master-0 without restarting, as it does on the other master nodes.
      Additional info:
      Links consulted, but not helpful:
      https://access.redhat.com/solutions/6746671 (did not resolve the issue)
      https://access.redhat.com/support/cases/#/case/03126522
      https://bugzilla.redhat.com/show_bug.cgi?id=2012780

      It is also not possible to run a debug pod on that node; the attempt fails with a TLS error, which may be related to this issue.
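
      For completeness, the usual way to open a debug shell on the node, shown as an illustration (the exact TLS error text was not captured in this report):

      # Typically fails here with a TLS error, per the reporter.
      oc debug node/mpp-nm7wq-master-0 -- chroot /host /bin/bash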

    People

      Assignee: Benjamin Nemec (bnemec@redhat.com)
      Reporter: Mahesh Nyayadhish (rhn-support-manyayad)
