  OpenShift Bugs / OCPBUGS-22455

IPI - LACP bond with VLAN has no IP addresses after reboot


      Description of problem:

      A non-br-ex LACP bond with a VLAN has no IP addresses after the ovs-configuration service runs on the most recent reboot, when the network is configured with NMState at day 1.
      

      Version-Release number of selected component (if applicable):

      Any 4.14 nightly version after 4.14.0-rc.6
      

      How reproducible:

      Always
      

      Steps to Reproduce:

      1. Prepare NMState settings to add to the install-config.yaml, using dual-stack through DHCP for the LACP bond0 (br-ex), bond0.vlanX (storage), and bond1.vlanY (secondary bridge br-ex1)
      2. Deploy OCP 4.14 with the latest nightly on a baremetal cluster with IPI and OVN-K
      3. After deployment, confirm all the interfaces have dual-stack addresses (br-ex, bond0.vlanX, br-ex1)
      4. Reboot a node; after the ovs-configuration service finishes, br-ex and br-ex1 have dual-stack addresses, but bond0.vlanX has nothing.
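The step-1 NMState settings can be sketched as follows. This is a hypothetical fragment only: the port names (eno1/eno2), the VLAN ID, and the DHCP options are assumptions for illustration, not the exact configuration used in this report.

```yaml
# Hypothetical day-1 NMState fragment for one host (e.g. under
# networkConfig in install-config.yaml). Names and IDs are placeholders.
interfaces:
  - name: bond0
    type: bond
    state: up
    link-aggregation:
      mode: 802.3ad        # LACP
      port:
        - eno1
        - eno2
    ipv4:
      enabled: true
      dhcp: true
    ipv6:
      enabled: true
      dhcp: true
      autoconf: true
  - name: bond0.360        # storage VLAN on top of the bond
    type: vlan
    state: up
    vlan:
      base-iface: bond0
      id: 360
    ipv4:
      enabled: true
      dhcp: true
    ipv6:
      enabled: true
      dhcp: true
      autoconf: true
```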
      

      Actual results:

      Any bond.vlanX connection other than br-ex or br-ex1 is not active after a reboot
      

      Expected results:

      Any bond.vlanX connection other than br-ex or br-ex1 starts correctly after a reboot
      

      Additional info:

      - This works fine using 4.14.0-rc.6 and previous releases
      - This also works fine when setting up the network with MachineConfig manifests at day 1
      

      More details:

      We have observed that bond0.vlanX (which carries our storage network) does not get any IP address after the reboot triggered by applying a PAO profile, but again only with releases after OCP 4.14.0-rc.6.

      This is a working cluster using 4.14.0-rc.6

      [kni@provisioner.cluster1.dfwt5g.lab ~]$ oc version
      Client Version: 4.14.0-rc.6
      Kustomize Version: v5.0.1
      Server Version: 4.14.0-rc.6
      Kubernetes Version: v1.27.6+98158f9
      [kni@provisioner.cluster1.dfwt5g.lab ~]$ for x in 0 1 2 3 ; do echo "===== worker-$x =====" ; ssh core@worker-$x "ip a s bond0.300" 2>/dev/null ; done
      ===== worker-0 =====
      21: bond0.300@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP group default qlen 1000
          link/ether b8:83:03:8e:1e:10 brd ff:ff:ff:ff:ff:ff
          inet 192.168.13.11/24 brd 192.168.13.255 scope global dynamic noprefixroute bond0.300
             valid_lft 5370sec preferred_lft 5370sec
          inet6 fd44:3fc2:3475:13::14/128 scope global dynamic noprefixroute
             valid_lft 5775sec preferred_lft 5775sec
          inet6 fe80::ba83:3ff:fe8e:1e10/64 scope link noprefixroute
             valid_lft forever preferred_lft forever
      ===== worker-1 =====
      21: bond0.300@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP group default qlen 1000
          link/ether b8:83:03:91:c5:20 brd ff:ff:ff:ff:ff:ff
          inet 192.168.13.39/24 brd 192.168.13.255 scope global dynamic noprefixroute bond0.300
             valid_lft 6824sec preferred_lft 6824sec
          inet6 fd44:3fc2:3475:13::1d/128 scope global dynamic noprefixroute
             valid_lft 4818sec preferred_lft 4818sec
          inet6 fe80::ba83:3ff:fe91:c520/64 scope link noprefixroute
             valid_lft forever preferred_lft forever
      ===== worker-2 =====
      21: bond0.300@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP group default qlen 1000
          link/ether b8:83:03:8e:0e:dc brd ff:ff:ff:ff:ff:ff
          inet 192.168.13.26/24 brd 192.168.13.255 scope global dynamic noprefixroute bond0.300
             valid_lft 6223sec preferred_lft 6223sec
          inet6 fd44:3fc2:3475:13::2c/128 scope global dynamic noprefixroute
             valid_lft 6917sec preferred_lft 6917sec
          inet6 fe80::ba83:3ff:fe8e:edc/64 scope link noprefixroute
             valid_lft forever preferred_lft forever
      ===== worker-3 =====
      21: bond0.300@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP group default qlen 1000
          link/ether b8:83:03:92:c0:48 brd ff:ff:ff:ff:ff:ff
          inet 192.168.13.48/24 brd 192.168.13.255 scope global dynamic noprefixroute bond0.300
             valid_lft 4574sec preferred_lft 4574sec
          inet6 fd44:3fc2:3475:13::13/128 scope global dynamic noprefixroute
             valid_lft 5921sec preferred_lft 5921sec
          inet6 fe80::ba83:3ff:fe92:c048/64 scope link noprefixroute
             valid_lft forever preferred_lft forever
      

      This is a cluster with the failure, using the latest available nightly, 4.14.0-0.nightly-2023-10-25-223202

      [kni@provisioner.cluster6.dfwt5g.lab ~]$ oc version
      Client Version: 4.14.0-0.nightly-2023-10-25-223202
      Kustomize Version: v5.0.1
      Server Version: 4.14.0-0.nightly-2023-10-25-223202
      Kubernetes Version: v1.27.6+f67aeb3
      [kni@provisioner.cluster6.dfwt5g.lab ~]$ for x in 0 1 2 3 ; do echo "===== worker-$x =====" ; ssh core@worker-$x "ip a s bond0.360" 2>/dev/null ; done
      ===== worker-0 =====
      21: bond0.360@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
          link/ether b8:83:03:91:c5:2c brd ff:ff:ff:ff:ff:ff
      ===== worker-1 =====
      21: bond0.360@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
          link/ether b8:83:03:91:c5:e8 brd ff:ff:ff:ff:ff:ff
      ===== worker-2 =====
      21: bond0.360@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
          link/ether b8:83:03:91:c5:a4 brd ff:ff:ff:ff:ff:ff
      ===== worker-3 =====
      21: bond0.360@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
          link/ether b8:83:03:91:c5:30 brd ff:ff:ff:ff:ff:ff
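The per-worker loop above can be turned into a quick pass/fail check. A minimal sketch; the interface name and worker hostnames in the usage comment are taken from this report's setup and would need adjusting for another cluster:

```shell
#!/bin/sh
# Minimal sketch: flag an interface that reports no global IPv4/IPv6 address.
# Reads `ip a s <iface>` output on stdin; a healthy dual-stack interface has
# at least one "scope global" address, the broken ones here have none.
check_addrs() {
  if grep -q 'scope global'; then
    echo "OK: global address present"
  else
    echo "FAIL: no global address"
    return 1
  fi
}

# Usage against live nodes (hostnames/interface as in this report):
#   for x in 0 1 2 3; do
#     echo "===== worker-$x ====="
#     ssh core@worker-$x "ip a s bond0.360" 2>/dev/null | check_addrs
#   done
```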
      
      

      Also interesting: there are other connections (bond0 for br-ex, and bond0.vlanY for br-ex1) that work fine. They have the same configuration as bond0.vlanX, yet only bond0.vlanX is not active.

      For example, this is from the cluster with the failure, comparing bond0.360 against a working profile:

      [core@worker-0 ~]$ sudo diff /etc/NetworkManager/system-connections/bond0.360.nmconnection /etc/NetworkManager/system-connections/bond0.662.nmconnection 
      4,5c4,5
      < id=bond0.360
      < interface-name=bond0.360
      ---
      > id=bond0.662
      > interface-name=bond0.662
      7c7
      < uuid=67bd5745-a6c1-5c61-b27c-2a6958b3a777
      ---
      > uuid=40469483-a8e9-5fd2-9f0d-fe04a3d52e3a
      36c36
      < id=360
      ---
      > id=662
      
      [core@worker-0 ~]$ nmcli con show --active | grep bond0.360
      [core@worker-0 ~]$
      

      Finally, in the ovs-configuration service logs from the last reboot, we can see that when the service starts, bond0.360 has both IPv4 and IPv6 addresses, but when it finishes, it has no IP addresses at all.

      Oct 26 18:11:38 worker-0 configure-ovs.sh[7116]: 21: bond0.360@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP group default qlen 1000
      Oct 26 18:11:38 worker-0 configure-ovs.sh[7116]:     link/ether b8:83:03:91:c5:2c brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 0 maxmtu 65535
      Oct 26 18:11:38 worker-0 configure-ovs.sh[7116]:     vlan protocol 802.1Q id 360 <REORDER_HDR> numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 tso_max_size 524280 tso_max_segs 65535 gro_max_size 65536
      Oct 26 18:11:38 worker-0 configure-ovs.sh[7116]:     inet 192.168.63.30/24 brd 192.168.63.255 scope global dynamic noprefixroute bond0.360
      Oct 26 18:11:38 worker-0 configure-ovs.sh[7116]:        valid_lft 7185sec preferred_lft 7185sec
      Oct 26 18:11:38 worker-0 configure-ovs.sh[7116]:     inet6 fd44:3fc2:3475:63::2b/128 scope global dynamic noprefixroute
      Oct 26 18:11:38 worker-0 configure-ovs.sh[7116]:        valid_lft 7180sec preferred_lft 7180sec
      Oct 26 18:11:38 worker-0 configure-ovs.sh[7116]:     inet6 fe80::ba83:3ff:fe91:c52c/64 scope link noprefixroute
      Oct 26 18:11:38 worker-0 configure-ovs.sh[7116]:        valid_lft forever preferred_lft forever
      ...
      Oct 26 18:19:26 worker-0 configure-ovs.sh[12640]: 21: bond0.360@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
      Oct 26 18:19:26 worker-0 configure-ovs.sh[12640]:     link/ether b8:83:03:91:c5:2c brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 0 maxmtu 65535
      Oct 26 18:19:26 worker-0 configure-ovs.sh[12640]:     vlan protocol 802.1Q id 360 <REORDER_HDR> numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 tso_max_size 524280 tso_max_segs 65535 gro_max_size 65536
      

            People: Benjamin Nemec (bnemec@redhat.com), Manuel Rodriguez (rhn-gps-manrodri), Zhanqi Zhao, Tim Rozet
            Votes: 0
            Watchers: 7