Bug
Resolution: Duplicate
Normal
4.8.z
Quality / Stability / Reliability
Moderate
Description of problem:
The haproxy pod on master-0 keeps restarting with the error "Starting proxy health_check_http_url: cannot bind socket [:::50936]"; it has restarted about 2,000 times so far (the describe output below shows Restart Count: 2467).
This is an OpenShift on OpenStack setup; the static pod in the openshift-openstack-infra namespace runs the haproxy container.
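For reference, before the full pod description below: this bind failure usually means something already holds port 50936 on the node when the new haproxy process starts (for example a leftover haproxy worker from a previous container instance). A minimal set of checks that can be run on the node, assuming shell access to master-0 and the usual iproute2/procps tools on RHCOS:

# Which process currently owns port 50936, and are there haproxy
# processes on the host older than the current container start time?
sudo ss -ltnp | grep 50936
sudo ps -ef | grep '[h]aproxy'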
[cloud-user@jumpbox-prod ansible]$ oc describe pod haproxy-mpp-nm7wq-master-0 -n openshift-openstack-infra
Name:                 haproxy-mpp-nm7wq-master-0
Namespace:            openshift-openstack-infra
Priority:             2000001000
Priority Class Name:  system-node-critical
Node:                 mpp-nm7wq-master-0/172.40.1.28
Start Time:           Wed, 29 Sep 2021 09:13:15 +0000
Labels:               app=openstack-infra-api-lb
Annotations:          kubernetes.io/config.hash: d381520d08c5b3d39c1853a191bb3e96
                      kubernetes.io/config.mirror: d381520d08c5b3d39c1853a191bb3e96
                      kubernetes.io/config.seen: 2022-07-13T09:45:17.595676878Z
                      kubernetes.io/config.source: file
Status:               Running
IP:                   172.40.1.28
IPs:
  IP:  172.40.1.28
Controlled By:  Node/mpp-nm7wq-master-0
Init Containers:
  verify-api-int-resolvable:
    Container ID:  cri-o://0fedb09f69ca1efacf7c88c498eb0022dd5163ccb5f2cbe08bd0e41029b6ff8a
    Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:8a85f82c666c9355d7c42781249b5f607b19f08a0f68fe4a35e70692c3e67b52
    Image ID:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:8a85f82c666c9355d7c42781249b5f607b19f08a0f68fe4a35e70692c3e67b52
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/bash
      -c
      #/bin/bash
      /host/bin/oc --kubeconfig /var/lib/kubelet/kubeconfig get nodes
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Wed, 13 Jul 2022 09:45:23 +0000
      Finished:     Wed, 13 Jul 2022 09:45:24 +0000
    Ready:          True
    Restart Count:  10
    Environment:    <none>
    Mounts:
      /host from chroot-host (rw)
      /var/lib/kubelet from kubeconfigvarlib (rw)
Containers:
  haproxy:
    Container ID:  cri-o://599fca3b3e352a91bf6a909c817ceb4d6cf8e494d6ce49c23ea4a9bcf40708e1
    Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0258dc7269cc67d53c64965e84d558cceb6008ff2906bbfa28eb60ddd3b61376
    Image ID:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0258dc7269cc67d53c64965e84d558cceb6008ff2906bbfa28eb60ddd3b61376
    Port:          <none>
    Host Port:     <none>
    Command:
      /bin/bash
      -c
      #/bin/bash
      verify_old_haproxy_ps_being_deleted()
      {
          local prev_pids

          prev_pids="$1"
          sleep $OLD_HAPROXY_PS_FORCE_DEL_TIMEOUT
          cur_pids=$(pidof haproxy)

          for val in $prev_pids; do
              if [[ $cur_pids =~ (^|[[:space:]])"$val"($|[[:space:]]) ]] ; then
                  kill $val
              fi
          done
      }

      reload_haproxy()
      {
          old_pids=$(pidof haproxy)
          if [ -n "$old_pids" ]; then
              /usr/sbin/haproxy -W -db -f /etc/haproxy/haproxy.cfg -p /var/lib/haproxy/run/haproxy.pid -x /var/lib/haproxy/run/haproxy.sock -sf $old_pids &
              #There seems to be some cases where HAProxy doesn't drain properly.
              #To handle that case, SIGTERM signal being sent to old HAProxy processes which haven't terminated.
              verify_old_haproxy_ps_being_deleted "$old_pids" &
          else
              /usr/sbin/haproxy -W -db -f /etc/haproxy/haproxy.cfg -p /var/lib/haproxy/run/haproxy.pid &
          fi
      }

      msg_handler()
      {
          while read -r line; do
              echo "The client send: $line" >&2
              # currently only 'reload' msg is supported
              if [ "$line" = reload ]; then
                  reload_haproxy
              fi
          done
      }

      set -ex
      declare -r haproxy_sock="/var/run/haproxy/haproxy-master.sock"
      declare -r haproxy_log_sock="/var/run/haproxy/haproxy-log.sock"
      export -f msg_handler
      export -f reload_haproxy
      export -f verify_old_haproxy_ps_being_deleted
      rm -f "$haproxy_sock" "$haproxy_log_sock"
      socat UNIX-RECV:${haproxy_log_sock} STDOUT &
      if [ -s "/etc/haproxy/haproxy.cfg" ]; then
          /usr/sbin/haproxy -W -db -f /etc/haproxy/haproxy.cfg -p /var/lib/haproxy/run/haproxy.pid &
      fi
      socat UNIX-LISTEN:${haproxy_sock},fork system:'bash -c msg_handler'

    State:          Running
      Started:      Tue, 19 Jul 2022 21:24:14 +0000
    Last State:     Terminated
      Reason:       Error
      Message:      + declare -r haproxy_sock=/var/run/haproxy/haproxy-master.sock
                    + declare -r haproxy_log_sock=/var/run/haproxy/haproxy-log.sock
                    + export -f msg_handler
                    + export -f reload_haproxy
                    + export -f verify_old_haproxy_ps_being_deleted
                    + rm -f /var/run/haproxy/haproxy-master.sock /var/run/haproxy/haproxy-log.sock
                    + '[' -s /etc/haproxy/haproxy.cfg ']'
                    + socat UNIX-RECV:/var/run/haproxy/haproxy-log.sock STDOUT
                    + socat UNIX-LISTEN:/var/run/haproxy/haproxy-master.sock,fork 'system:bash -c msg_handler'
                    + /usr/sbin/haproxy -W -db -f /etc/haproxy/haproxy.cfg -p /var/lib/haproxy/run/haproxy.pid
                    <133>Jul 19 21:17:56 haproxy[10]: Proxy main started.
                    [NOTICE] 199/211756 (10) : haproxy version is 2.2.13-5f3eb59
                    [NOTICE] 199/211756 (10) : path to executable is /usr/sbin/haproxy
                    [ALERT] 199/211756 (10) : Starting proxy health_check_http_url: cannot bind socket [:::50936]
                    <133>Jul 19 21:17:56 haproxy[10]: Proxy stats started.
                    <133>Jul 19 21:17:56 haproxy[10]: Proxy masters started.
      Exit Code:    143
      Started:      Tue, 19 Jul 2022 21:17:56 +0000
      Finished:     Tue, 19 Jul 2022 21:19:06 +0000
    Ready:          True
    Restart Count:  2467
    Requests:
      cpu:      100m
      memory:   200Mi
    Liveness:   http-get http://:50936/haproxy_ready delay=50s timeout=1s period=10s #success=1 #failure=3
    Environment:
      OLD_HAPROXY_PS_FORCE_DEL_TIMEOUT:  120
    Mounts:
      /etc/haproxy from conf-dir (rw)
      /var/run/haproxy from run-dir (rw)
  haproxy-monitor:
    Container ID:  cri-o://8dbc55cf0736abbf627cea96395029cfdb916174a5dc7480f2aa5998c928b276
    Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:8a85f82c666c9355d7c42781249b5f607b19f08a0f68fe4a35e70692c3e67b52
    Image ID:      quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:8a85f82c666c9355d7c42781249b5f607b19f08a0f68fe4a35e70692c3e67b52
    Port:          <none>
    Host Port:     <none>
    Command:
      monitor
      /var/lib/kubelet/kubeconfig
      /config/haproxy.cfg.tmpl
      /etc/haproxy/haproxy.cfg
      --api-vip
      172.40.0.5
    State:          Running
      Started:      Wed, 13 Jul 2022 09:45:26 +0000
    Ready:          True
    Restart Count:  3
    Requests:
      cpu:      100m
      memory:   200Mi
    Environment:  <none>
    Mounts:
      /config from resource-dir (rw)
      /etc/haproxy from conf-dir (rw)
      /host from chroot-host (rw)
      /var/lib/kubelet from kubeconfigvarlib (rw)
      /var/run/haproxy from run-dir (rw)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  resource-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/kubernetes/static-pod-resources/haproxy
    HostPathType:
  kubeconfigvarlib:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/kubelet
    HostPathType:
  run-dir:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:
    SizeLimit:  <unset>
  conf-dir:
    Type:          HostPath (bare host directory volume)
    Path:          /etc/haproxy
    HostPathType:
  chroot-host:
    Type:          HostPath (bare host directory volume)
    Path:          /
    HostPathType:
QoS Class:       Burstable
Node-Selectors:  <none>
Tolerations:     op=Exists
Events:          <none>
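The liveness probe in the spec above (http-get http://:50936/haproxy_ready, failure threshold 3) would explain the restart loop: when the bind fails, the health-check listener never comes up, the probe fails, and the kubelet terminates the container (exit code 143 is SIGTERM). The endpoint can also be exercised by hand from the node; a sketch, assuming curl is available on master-0:

# Hit the same endpoint the kubelet probes (port and path taken from the pod spec above).
# "connection refused" or a non-200 response here matches the failed probes that lead to SIGTERM.
curl -v http://localhost:50936/haproxy_ready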
No other process is using or trying to bind port 50936 on master 0; only haproxy itself is listening on it:
$ netstat -plunt | grep 50936

on master 0
tcp6       0      0 :::50936      :::*      LISTEN      2354586/haproxy
tcp6       0      0 :::50936      :::*      LISTEN      2354586/haproxy
tcp6       0      0 :::50936      :::*      LISTEN      2354586/haproxy
tcp6       0      0 :::50936      :::*      LISTEN      2354586/haproxy
tcp6       0      0 :::50936      :::*      LISTEN      2354586/haproxy
tcp6       0      0 :::50936      :::*      LISTEN      2354586/haproxy
tcp6       0      0 :::50936      :::*      LISTEN      2354586/haproxy

on master 1
tcp6       0      0 :::50936      :::*      LISTEN      13795/haproxy
tcp6       0      0 :::50936      :::*      LISTEN      13795/haproxy
tcp6       0      0 :::50936      :::*      LISTEN      13795/haproxy
tcp6       0      0 :::50936      :::*      LISTEN      13795/haproxy
tcp6       0      0 :::50936      :::*      LISTEN      13795/haproxy
tcp6       0      0 :::50936      :::*      LISTEN      13795/haproxy
tcp6       0      0 :::50936      :::*      LISTEN      13795/haproxy
tcp6       0      0 :::50936      :::*      LISTEN      13795/haproxy
tcp6       0      0 :::50936      :::*      LISTEN      13795/haproxy
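The listener on master 0 belongs to PID 2354586. It may be worth confirming whether that PID is the haproxy inside the currently running container or a stray worker left over from an earlier container instance that is still holding the port (the reload path in the pod script above hands sockets over with -x/-sf, so an orphaned old worker is a plausible culprit). A sketch of that check, assuming crictl is available on the node; the container ID placeholder below has to be filled in from the crictl ps output:

# Which pod/container cgroup does the listening PID belong to?
sudo cat /proc/2354586/cgroup
# Compare against the currently running haproxy container and its main PID:
sudo crictl ps --name haproxy
sudo crictl inspect <container-id> | grep -i '"pid"'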
Version-Release number of selected component (if applicable):
OpenShift Container Platform 4.8.z; haproxy 2.2.13-5f3eb59 (from the container log above).
How reproducible:
Not reproducible on demand; it happens continuously on master 0 in this environment.
Steps to Reproduce:
None known.
Actual results:
The haproxy pod runs properly on the other master nodes, but on master 0 the haproxy container keeps restarting with the bind error above.
Expected results:
The haproxy pod runs on master 0 without restarting, as it does on the other masters.
Additional info:
Links consulted but not helpful:
https://access.redhat.com/solutions/6746671 << did not resolve the issue.
https://access.redhat.com/support/cases/#/case/03126522
https://bugzilla.redhat.com/show_bug.cgi?id=2012780
We are also unable to run a debug pod on that node; the attempt fails with a TLS error, which I suspect is related to the same problem.
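For completeness, the failing debug attempt and a possible workaround for gathering node-level data; a sketch, assuming SSH access to the node from the jumpbox as the core user (node name and IP taken from the describe output above):

# Fails with a TLS error on this node:
oc debug node/mpp-nm7wq-master-0
# Possible fallback for collecting netstat/ps output directly on the node:
ssh core@172.40.1.28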