Type: Bug
Resolution: Unresolved
Affects Version: 4.14
Description of problem:
A VLAN interface on a LACP bond, when not attached to br-ex (or br-ex1), has no IP addresses after the ovs-configuration service runs following a reboot, when the network is configured through NMState at day 1.
Version-Release number of selected component (if applicable):
Any 4.14 nightly version after 4.14.0-rc.6
How reproducible:
Always
Steps to Reproduce:
1. Prepare NMState settings to add to install-config.yaml, using dual-stack through DHCP, for the LACP bond0 (br-ex), bond0.vlanX (storage), and bond1.vlanY (secondary bridge br-ex1); see the sketch after these steps.
2. Deploy OCP 4.14 with the latest nightly on a baremetal cluster with IPI and OVN-K.
3. After deployment, confirm all the interfaces (br-ex, bond0.vlanX, br-ex1) have dual-stack addresses.
4. Reboot a node. After the ovs-configuration service finishes, br-ex and br-ex1 have dual-stack addresses, but bond0.vlanX has nothing.
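For context, this is a minimal sketch of the shape of the day-1 NMState fragment involved; the NIC names, VLAN ID, and bond options are hypothetical placeholders, not the lab's actual values:

# Hypothetical sketch of the NMState fragment each host carries as
# networkConfig in install-config.yaml; eno1/eno2 and VLAN 300 are placeholders.
cat <<'EOF' > networkconfig-sketch.yaml
interfaces:
  - name: bond0                  # LACP bond, later consumed by br-ex
    type: bond
    state: up
    link-aggregation:
      mode: 802.3ad
      port:
        - eno1
        - eno2
    ipv4:
      enabled: true
      dhcp: true
    ipv6:
      enabled: true
      dhcp: true
      autoconf: true
  - name: bond0.300              # storage VLAN, the interface that loses its addresses
    type: vlan
    state: up
    vlan:
      base-iface: bond0
      id: 300
    ipv4:
      enabled: true
      dhcp: true
    ipv6:
      enabled: true
      dhcp: true
      autoconf: true
EOF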
Actual results:
Any bond0.vlanX connection (one not used by br-ex or br-ex1) is not active after a reboot.
Expected results:
Any bond0.vlanX connection (one not used by br-ex or br-ex1) starts correctly after a reboot.
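A quick way to check actual vs. expected state across the nodes, reusing the same loop style as the captures below (VLAN ID 360 as on the failing cluster; adjust to the local vlanX):

# Check both address assignment and NetworkManager activation state for the
# storage VLAN on every worker after a reboot.
for x in 0 1 2 3 ; do
  echo "===== worker-$x ====="
  ssh core@worker-$x "ip a s bond0.360 ; nmcli con show --active | grep bond0.360" 2>/dev/null
done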
Additional info:
- This works fine with 4.14.0-rc.6 and previous releases.
- This also works fine when the network is set up with MachineConfig manifests at day 1.
More details:
We have observed that bond0.vlanX (which carries our storage network) does not get any IP address after the reboot triggered by applying a PAO profile, but again only with releases after OCP 4.14.0-rc.6.
This is a working cluster using 4.14.0-rc.6:
[kni@provisioner.cluster1.dfwt5g.lab ~]$ oc version
Client Version: 4.14.0-rc.6
Kustomize Version: v5.0.1
Server Version: 4.14.0-rc.6
Kubernetes Version: v1.27.6+98158f9
[kni@provisioner.cluster1.dfwt5g.lab ~]$ for x in 0 1 2 3 ; do echo "===== worker-$x =====" ; ssh core@worker-$x "ip a s bond0.300" 2>/dev/null ; done
===== worker-0 =====
21: bond0.300@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP group default qlen 1000
    link/ether b8:83:03:8e:1e:10 brd ff:ff:ff:ff:ff:ff
    inet 192.168.13.11/24 brd 192.168.13.255 scope global dynamic noprefixroute bond0.300
       valid_lft 5370sec preferred_lft 5370sec
    inet6 fd44:3fc2:3475:13::14/128 scope global dynamic noprefixroute
       valid_lft 5775sec preferred_lft 5775sec
    inet6 fe80::ba83:3ff:fe8e:1e10/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
===== worker-1 =====
21: bond0.300@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP group default qlen 1000
    link/ether b8:83:03:91:c5:20 brd ff:ff:ff:ff:ff:ff
    inet 192.168.13.39/24 brd 192.168.13.255 scope global dynamic noprefixroute bond0.300
       valid_lft 6824sec preferred_lft 6824sec
    inet6 fd44:3fc2:3475:13::1d/128 scope global dynamic noprefixroute
       valid_lft 4818sec preferred_lft 4818sec
    inet6 fe80::ba83:3ff:fe91:c520/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
===== worker-2 =====
21: bond0.300@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP group default qlen 1000
    link/ether b8:83:03:8e:0e:dc brd ff:ff:ff:ff:ff:ff
    inet 192.168.13.26/24 brd 192.168.13.255 scope global dynamic noprefixroute bond0.300
       valid_lft 6223sec preferred_lft 6223sec
    inet6 fd44:3fc2:3475:13::2c/128 scope global dynamic noprefixroute
       valid_lft 6917sec preferred_lft 6917sec
    inet6 fe80::ba83:3ff:fe8e:edc/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
===== worker-3 =====
21: bond0.300@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP group default qlen 1000
    link/ether b8:83:03:92:c0:48 brd ff:ff:ff:ff:ff:ff
    inet 192.168.13.48/24 brd 192.168.13.255 scope global dynamic noprefixroute bond0.300
       valid_lft 4574sec preferred_lft 4574sec
    inet6 fd44:3fc2:3475:13::13/128 scope global dynamic noprefixroute
       valid_lft 5921sec preferred_lft 5921sec
    inet6 fe80::ba83:3ff:fe92:c048/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
This is a cluster with the failure, using the latest available nightly, 4.14.0-0.nightly-2023-10-25-223202:
[kni@provisioner.cluster6.dfwt5g.lab ~]$ oc version
Client Version: 4.14.0-0.nightly-2023-10-25-223202
Kustomize Version: v5.0.1
Server Version: 4.14.0-0.nightly-2023-10-25-223202
Kubernetes Version: v1.27.6+f67aeb3
[kni@provisioner.cluster6.dfwt5g.lab ~]$ for x in 0 1 2 3 ; do echo "===== worker-$x =====" ; ssh core@worker-$x "ip a s bond0.360" 2>/dev/null ; done
===== worker-0 =====
21: bond0.360@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether b8:83:03:91:c5:2c brd ff:ff:ff:ff:ff:ff
===== worker-1 =====
21: bond0.360@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether b8:83:03:91:c5:e8 brd ff:ff:ff:ff:ff:ff
===== worker-2 =====
21: bond0.360@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether b8:83:03:91:c5:a4 brd ff:ff:ff:ff:ff:ff
===== worker-3 =====
21: bond0.360@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether b8:83:03:91:c5:30 brd ff:ff:ff:ff:ff:ff
The interesting part is that the other connections, bond0 for br-ex and bond0.vlanY for br-ex1, work fine even though they have the same configuration as bond0.vlanX, which is the only one left inactive.
For example, on the cluster with the failure, comparing the inactive bond0.360 profile with the working bond0.662 profile:
[core@worker-0 ~]$ sudo diff /etc/NetworkManager/system-connections/bond0.360.nmconnection /etc/NetworkManager/system-connections/bond0.662.nmconnection
4,5c4,5
< id=bond0.360
< interface-name=bond0.360
---
> id=bond0.662
> interface-name=bond0.662
7c7
< uuid=67bd5745-a6c1-5c61-b27c-2a6958b3a777
---
> uuid=40469483-a8e9-5fd2-9f0d-fe04a3d52e3a
36c36
< id=360
---
> id=662
[core@worker-0 ~]$ nmcli con show --active | grep bond0.360
[core@worker-0 ~]$
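As a diagnostic only (not a fix), the inactive profile can be brought up by hand; the assumption here is that if manual activation succeeds, the keyfile itself is valid and the problem lies in how the connection is (de)activated during boot:

# Diagnostic sketch: activate the bond0.360 profile manually, then re-check
# activation state and addresses.
sudo nmcli con up bond0.360
nmcli con show --active | grep bond0.360
ip a s bond0.360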
Finally, in the ovs-configuration service log from the last reboot, we can see that when the service starts, bond0.360 has both IPv4 and IPv6 addresses, but by the time it finishes, the interface has no IP addresses at all (note the MTU also drops from 9000 to 1500):
Oct 26 18:11:38 worker-0 configure-ovs.sh[7116]: 21: bond0.360@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP group default qlen 1000
Oct 26 18:11:38 worker-0 configure-ovs.sh[7116]:     link/ether b8:83:03:91:c5:2c brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 0 maxmtu 65535
Oct 26 18:11:38 worker-0 configure-ovs.sh[7116]:     vlan protocol 802.1Q id 360 <REORDER_HDR> numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 tso_max_size 524280 tso_max_segs 65535 gro_max_size 65536
Oct 26 18:11:38 worker-0 configure-ovs.sh[7116]:     inet 192.168.63.30/24 brd 192.168.63.255 scope global dynamic noprefixroute bond0.360
Oct 26 18:11:38 worker-0 configure-ovs.sh[7116]:        valid_lft 7185sec preferred_lft 7185sec
Oct 26 18:11:38 worker-0 configure-ovs.sh[7116]:     inet6 fd44:3fc2:3475:63::2b/128 scope global dynamic noprefixroute
Oct 26 18:11:38 worker-0 configure-ovs.sh[7116]:        valid_lft 7180sec preferred_lft 7180sec
Oct 26 18:11:38 worker-0 configure-ovs.sh[7116]:     inet6 fe80::ba83:3ff:fe91:c52c/64 scope link noprefixroute
Oct 26 18:11:38 worker-0 configure-ovs.sh[7116]:        valid_lft forever preferred_lft forever
...
Oct 26 18:19:26 worker-0 configure-ovs.sh[12640]: 21: bond0.360@bond0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
Oct 26 18:19:26 worker-0 configure-ovs.sh[12640]:     link/ether b8:83:03:91:c5:2c brd ff:ff:ff:ff:ff:ff promiscuity 0 minmtu 0 maxmtu 65535
Oct 26 18:19:26 worker-0 configure-ovs.sh[12640]:     vlan protocol 802.1Q id 360 <REORDER_HDR> numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 tso_max_size 524280 tso_max_segs 65535 gro_max_size 65536
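For reference, the snapshots above come from the ovs-configuration unit's journal for the current boot; something like the following pulls them out:

# Extract the bond0.360 before/after snapshots from the ovs-configuration
# service log for the current boot (the unit runs configure-ovs.sh, whose
# output is quoted above).
sudo journalctl -b -u ovs-configuration | grep -A 9 'bond0.360@bond0'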