-
Bug
-
Resolution: Done-Errata
-
Major
-
4.13
Description of problem:
The customer reported that keepalived pods crash and fail to start on worker nodes (Ingress VIP). The expected behavior is that the keepalived pod (labeled app=kni-infra-vrrp) starts. This affects everyone using OCP v4.13 together with an Ingress VIP and could be a bug in the nodeip-configuration service in v4.13.
More details below:
-> There are 2 problems in OCP v4.13: the regular expression does not match, and the chroot command fails because of missing ldd libraries inside the container. This has been fixed in 4.14, but not in 4.13.
-> The nodeip-configuration service creates the /run/nodeip-configuration/remote-worker file based on onPremPlatformAPIServerInternalIPs (apiVIP) and ignores onPremPlatformIngressIPs (ingressVIP), as can be seen in the source code.
-> The keepalived process then won't start because the remote-worker file exists.
-> The liveness probes then fail because the keepalived process does not exist.
The fix is quite simple (as highlighted by the customer): the nodeip-configuration.service template needs to be extended to consider the Ingress VIPs as well. This is the source code where the changes need to be made.
As the following code snippet shows, the node-ip invocation ranges only over onPremPlatformAPIServerInternalIPs and ignores onPremPlatformIngressIPs.
    node-ip \
    set \
    --platform {{ .Infra.Status.PlatformStatus.Type }} \
    {{if not (isOpenShiftManagedDefaultLB .) -}}
    --user-managed-lb \
    {{end -}}
    {{if or (eq .IPFamilies "IPv6") (eq .IPFamilies "DualStackIPv6Primary") -}}
    --prefer-ipv6 \
    {{end -}}
    --retry-on-failure \
    {{ range onPremPlatformAPIServerInternalIPs . }}{{.}} {{end}}; \
    do \
    sleep 5; \
    done"
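To illustrate the change proposed above, here is a minimal, self-contained Go sketch that renders the VIP arguments with a second range over the Ingress VIPs appended. It is not the actual machine-config-operator code or the fix shipped in 4.14: the `renderConfig` type, the `renderVIPArgs` helper, the stubbed template functions, and the documentation-range addresses are all hypothetical and exist only so the fragment can be run in isolation. Only the two template function names are taken from this report.

```go
package main

import (
	"fmt"
	"os"
	"strings"
	"text/template"
)

// renderConfig is a hypothetical stand-in for the data the real template receives.
type renderConfig struct {
	APIVIPs     []string
	IngressVIPs []string
}

// vipArgs mirrors the last argument line of the snippet in this report,
// with an additional range over the Ingress VIPs (the proposed change).
const vipArgs = `{{ range onPremPlatformAPIServerInternalIPs . }}{{.}} {{end}}` +
	`{{ range onPremPlatformIngressIPs . }}{{.}} {{end}}`

// renderVIPArgs renders the VIP argument list for the given config.
func renderVIPArgs(cfg renderConfig) (string, error) {
	// Stub implementations (assumptions): the real functions derive these
	// values from the cluster's infrastructure status.
	funcs := template.FuncMap{
		"onPremPlatformAPIServerInternalIPs": func(c renderConfig) []string { return c.APIVIPs },
		"onPremPlatformIngressIPs":           func(c renderConfig) []string { return c.IngressVIPs },
	}
	tmpl, err := template.New("vips").Funcs(funcs).Parse(vipArgs)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := tmpl.Execute(&sb, cfg); err != nil {
		return "", err
	}
	// Each range body emits a trailing space; trim the final one.
	return strings.TrimSpace(sb.String()), nil
}

func main() {
	cfg := renderConfig{
		APIVIPs:     []string{"192.0.2.10"}, // hypothetical API VIP
		IngressVIPs: []string{"192.0.2.11"}, // hypothetical Ingress VIP
	}
	args, err := renderVIPArgs(cfg)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(args)
}
```

With the second range in place, both the API VIP and the Ingress VIP end up in the argument list. Whether that alone is sufficient for 4.13 would need to be confirmed against the actual template and the 4.14 fix.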
The difference between OCP v4.12 and v4.13 related to the keepalived pod is also shown in the attached image.
Version-Release number of selected component (if applicable):
v4.13
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
The keepalived pods crash and fail to start on worker nodes (Ingress VIP).
Expected results:
The keepalived pod (labeled app=kni-infra-vrrp) should start.
Additional info:
- blocks
-
OCPBUGS-20025 Keepalived pods crashes and fail to start on worker node (Ingress VIP)
- Closed
- is cloned by
-
OCPBUGS-20025 Keepalived pods crashes and fail to start on worker node (Ingress VIP)
- Closed
- links to
-
RHEA-2023:7198 rpm