OCPBUGS-23417

New <interface-name>:vip label for keepalived VIPs is causing installations and upgrades to fail

      Description of problem:

      Installing or upgrading to OCP 4.14 on platforms that allow keepalived-managed VIPs for API and Ingress can fail because of the new "<interface-name>:vip" label applied by /etc/kubernetes/static-pod-resources/keepalived/keepalived.conf.tmpl. The label was added by https://issues.redhat.com/browse/OCPBUGS-4370
      
      The ip-address(8) man page states that the total length of a label is limited to 15 characters - https://man7.org/linux/man-pages/man8/ip-address.8.html
      
      Nothing checks that a long interface name such as enp33s0f0np0 stays within this limit once the :vip suffix is appended (enp33s0f0np0:vip is 16 characters).
      
      keepalived will fail to apply the VIP and report this error:
      Netlink: error: Numerical result out of range(34), type=RTM_NEWADDR(20), seq=1700238417, pid=0
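
      The length problem described above can be illustrated with a minimal sketch. The function name vip_label_fits and the variable MAX_LABEL_LEN are hypothetical and exist only for this example; the 15-character limit and the ":vip" suffix come from the description above. This is not code from any OpenShift component.

```shell
#!/bin/sh
# Maximum label length, as documented in ip-address(8).
MAX_LABEL_LEN=15

# vip_label_fits IFACE (hypothetical helper)
# Succeeds when "IFACE:vip" is at most MAX_LABEL_LEN characters long.
# The body runs in a subshell so that "label" does not leak to the caller.
vip_label_fits() (
    label="$1:vip"
    [ "${#label}" -le "$MAX_LABEL_LEN" ]
)

# Interface names taken from this report, plus a short one for comparison.
for iface in enp33s0f0np0 bridge-internal eth0; do
    if vip_label_fits "$iface"; then
        echo "$iface: label fits"
    else
        echo "$iface: label too long"
    fi
done
```

      With these inputs, only eth0 yields a label that fits; enp33s0f0np0:vip is 16 characters and bridge-internal:vip is 19.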

      Version-Release number of selected component (if applicable):

      4.14

      How reproducible:

      Install or upgrade to OCP 4.14 on a baremetal host that has a long interface name, or use a custom interface name like "bridge-internal". Several customers of mine have created bridge interfaces like this in order to allow VMs to share the OpenShift primary interface.

      Steps to Reproduce:

      [laptop]$ ssh core@ocp-node-1
      Red Hat Enterprise Linux CoreOS 414.92.202311061957-0
      
      [core@ocp-node-1 ~]$ ip -brief -4 a s
      lo               UNKNOWN        127.0.0.1/8  
      tun0             UNKNOWN        10.130.0.1/23 
      enp33s0f0np0     UP             10.15.168.23/24  ### API & Ingress VIPs are missing here
      
      [core@ocp-node-1 ~]$ echo -n "enp33s0f0np0:vip" | wc -c
      16
      
      [core@ocp-node-1 ~]$ sudo crictl logs $(sudo crictl ps --quiet --label io.kubernetes.container.name=keepalived) 2>&1 | grep error
      Fri Nov 17 16:39:52 2023: Netlink: error: Numerical result out of range(34), type=RTM_NEWADDR(20), seq=1700238417, pid=0
      
      [core@ocp-node-1 ~]$ grep label /etc/kubernetes/static-pod-resources/keepalived/keepalived.conf.tmpl 
              {{ .Cluster.APIVIP }}/{{ .Cluster.VIPNetmask }} label {{ .VRRPInterface }}:vip
              {{ .Cluster.IngressVIP }}/{{ .Cluster.VIPNetmask }} label {{ .VRRPInterface }}:vip
      
      [core@ocp-node-1 ~]$ grep label /etc/keepalived/keepalived.conf 
              10.15.168.68/32 label enp33s0f0np0:vip
              10.15.168.69/32 label enp33s0f0np0:vip 
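
      The manual check in the steps above could be generalized to every interface on a node. The following is a sketch only: the function name long_vip_labels is hypothetical, and it assumes that interface names can be read as directory entries under /sys/class/net (or under another directory passed as the first argument).

```shell
#!/bin/sh
# Maximum label length, as documented in ip-address(8).
MAX_LABEL_LEN=15

# long_vip_labels [DIR] (hypothetical helper)
# Prints every "<name>:vip" label that would exceed MAX_LABEL_LEN, taking
# interface names from the entries of DIR (default: /sys/class/net).
long_vip_labels() {
    netdir="${1:-/sys/class/net}"
    for path in "$netdir"/*; do
        # Skip plain files and the unexpanded glob of an empty directory.
        [ -d "$path" ] || continue
        iface="${path##*/}"
        label="${iface}:vip"
        if [ "${#label}" -gt "$MAX_LABEL_LEN" ]; then
            printf '%s (%d characters)\n' "$label" "${#label}"
        fi
    done
}

long_vip_labels
```

      Running this on a node before an install or upgrade would list the interfaces that are at risk; empty output means every label fits.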

      Actual results:

      keepalived fails to assign the VIP to the interface, and the installation or upgrade is halted.

      Expected results:

      The openshift-install, baremetal-runtimecfg, and/or machine-config-operator should check whether the label will exceed 15 characters and reduce its length if required.
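
      One way to picture such a guard is sketched below. It is an assumption for illustration, not the fix adopted in any of the components named above: instead of shortening the label, it takes the simpler route of omitting the label clause when "<interface>:vip" would be too long. The function name render_vip_entry is hypothetical; the output format mirrors the rendered keepalived.conf lines shown in the steps to reproduce.

```shell
#!/bin/sh
# Maximum label length, as documented in ip-address(8).
MAX_LABEL_LEN=15

# render_vip_entry VIP NETMASK IFACE (hypothetical helper)
# Prints a keepalived virtual_ipaddress entry. The "label" clause is
# emitted only when "IFACE:vip" fits within MAX_LABEL_LEN; otherwise the
# entry is printed without a label (an assumed fallback, see lead-in).
render_vip_entry() (
    vip="$1"
    netmask="$2"
    label="$3:vip"
    if [ "${#label}" -le "$MAX_LABEL_LEN" ]; then
        printf '%s/%s label %s\n' "$vip" "$netmask" "$label"
    else
        printf '%s/%s\n' "$vip" "$netmask"
    fi
)

render_vip_entry 10.15.168.68 32 eth0
render_vip_entry 10.15.168.68 32 enp33s0f0np0
```

      Dropping the label changes behavior for anything that relies on the ":vip" suffix to identify VIPs, so a real fix would need to weigh that against shortening the label.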

      Additional info:

      https://issues.redhat.com/browse/OCPBUGS-4370

      https://github.com/ovn-org/ovn-kubernetes/pull/3552/files

      https://github.com/openshift/baremetal-runtimecfg/pull/236

      https://github.com/openshift/machine-config-operator/pull/3683

      https://github.com/openshift/ovn-kubernetes/pull/1697

       

            mkowalsk@redhat.com Mat Kowalski
            jcallrht John Call
            Anurag Saxena Anurag Saxena
            Jason Kincl