Bug
Resolution: Unresolved
Major
None
4.13.z
None
False
ovs-configuration.service fails every time the node reboots, with the following errors in the NetworkManager logs.
Aug 24 15:08:33 example-node NetworkManager[1255]: <info> [1724512113.7749] dhcp4 (br-ex): activation: beginning transaction (timeout in 45 seconds)
Aug 24 15:08:33 example-node NetworkManager[1255]: <info> [1724512113.7953] dhcp4 (br-ex): state changed no lease
Aug 24 15:08:33 example-node NetworkManager[1255]: <info> [1724512113.8228] dhcp4 (br-ex): state changed no lease
Aug 24 15:08:35 example-node NetworkManager[1255]: <info> [1724512115.8415] dhcp4 (br-ex): state changed no lease
Aug 24 15:08:46 example-node NetworkManager[1255]: <info> [1724512126.4140] dhcp4 (br-ex): state changed no lease
Aug 24 15:08:54 example-node NetworkManager[1255]: <info> [1724512134.4321] dhcp4 (br-ex): state changed no lease
Aug 24 15:09:03 example-node NetworkManager[1255]: <warn> [1724512143.7665] dispatcher: (54) /etc/NetworkManager/dispatcher.d/30-resolv-prepender failed (failed): Script '/etc/NetworkManager/dispatcher.d/30-resolv-prepender' exited with status 1.
Aug 24 15:09:10 example-node NetworkManager[1255]: <info> [1724512150.4514] dhcp4 (br-ex): state changed no lease
Aug 24 15:09:18 example-node NetworkManager[1255]: <info> [1724512158.9688] device (br-ex): state change: ip-config -> failed (reason 'ip-config-unavailable', sys-iface-state: 'managed')
Aug 24 15:09:18 example-node NetworkManager[1255]: <info> [1724512158.9692] manager: NetworkManager state is now CONNECTED_LOCAL
Aug 24 15:09:18 example-node NetworkManager[1255]: <info> [1724512158.9695] device (br-ex): detaching ovs interface br-ex
Aug 24 15:09:18 example-node NetworkManager[1255]: <info> [1724512158.9768] dhcp4 (br-ex): canceled DHCP transaction
Aug 24 15:09:18 example-node NetworkManager[1255]: <info> [1724512158.9768] dhcp4 (br-ex): activation: beginning transaction (timeout in 45 seconds)
Aug 24 15:09:18 example-node NetworkManager[1255]: <info> [1724512158.9768] dhcp4 (br-ex): state changed no lease
Aug 24 15:09:18 example-node NetworkManager[1255]: <info> [1724512158.9770] device (br-ex): released from master device br-ex
Aug 24 15:09:18 example-node NetworkManager[1255]: <warn> [1724512158.9772] device (br-ex): Activation: failed for connection 'ovs-if-br-ex'
This suggests that br-ex cannot obtain an IP address from DHCP, which looks like a DHCP issue. That is not the case, however: the primary NIC always gets its IP address from DHCP, and only br-ex fails whenever ovs-configuration.service starts.
The ovs-configuration.service logs show the following errors.
Aug 24 15:17:04 example-node configure-ovs.sh[663779]: 6: br-int: <BROADCAST,MULTICAST> mtu 1400 qdisc noop state DOWN group default qlen 1000
Aug 24 15:17:04 example-node configure-ovs.sh[663779]: link/ether x:x:x:x:x:x brd ff:ff:ff:ff:ff:ff promiscuity 1 minmtu 68 maxmtu 65535
Aug 24 15:17:04 example-node configure-ovs.sh[663779]: openvswitch numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 tso_max_size 65536 tso_max_segs 65535 gro_max_size 65536
Aug 24 15:17:04 example-node systemd[1]: ovs-configuration.service: Main process exited, code=exited, status=4/NOPERMISSION
Aug 24 15:17:04 example-node configure-ovs.sh[627756]: + ip route show
Aug 24 15:17:04 example-node systemd[1]: ovs-configuration.service: Failed with result 'exit-code'.
Aug 24 15:17:04 example-node configure-ovs.sh[663780]: default via 10.x.x.x dev ens192 proto dhcp src 10.x.x.x metric 100
Aug 24 15:17:04 example-node configure-ovs.sh[663780]: 10.x.0.0/14 via 10.129.0.1 dev ovn-k8s-mp0
Aug 24 15:17:04 example-node configure-ovs.sh[663780]: 10.x.0.0/23 dev ovn-k8s-mp0 proto kernel scope link src 10.x.0.x
Aug 24 15:17:04 example-node configure-ovs.sh[663780]: 10.x.x.64/26 dev ens192 proto kernel scope link src 10.x.x.x metric 100
Aug 24 15:17:04 example-node configure-ovs.sh[663780]: 169.x.x.3 via 10.129.0.1 dev ovn-k8s-mp0
Aug 24 15:17:04 example-node systemd[1]: Failed to start Configures OVS with proper host networking configuration.
Aug 24 15:17:04 example-node configure-ovs.sh[627756]: + ip -6 route show
Aug 24 15:17:04 example-node systemd[1]: ovs-configuration.service: Consumed 1.302s CPU time.
Aug 24 15:17:04 example-node configure-ovs.sh[663781]: ::1 dev lo proto kernel metric 256 pref medium
Aug 24 15:17:04 example-node configure-ovs.sh[663781]: fe80::/64 dev genev_sys_6081 proto kernel metric 256 pref medium
Aug 24 15:17:04 example-node configure-ovs.sh[663781]: fe80::/64 dev ens192 proto kernel metric 1024 pref medium
Aug 24 15:17:04 example-node configure-ovs.sh[627756]: + exit 4
Rebooting the node and restarting NetworkManager and ovs-configuration.service did not help.
This issue looks identical to the bug below, which was reported against a much older version.
--> https://bugzilla.redhat.com/show_bug.cgi?id=2048352
The issue was resolved by restarting the openvswitch service first and then ovs-configuration.service.
--> https://bugzilla.redhat.com/show_bug.cgi?id=2048352#c9
--> $ sudo systemctl restart openvswitch
--> $ sudo systemctl restart ovs-configuration.service
This is only a temporary workaround, however: the same issue recurs on every node reboot, and the workaround has to be applied again. In some cases both services even need to be restarted multiple times before br-ex comes up.
I will provide a sosreport from one of the nodes.
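The repeated restarts described above can be sketched as a small retry loop. This is only an illustration of the manual workaround, not a fix. The function names, the attempt count, the delay, and the use of "br-ex has an IPv4 address" as the success check are all assumptions made for this sketch, and SYSTEMCTL and IP_CMD are hypothetical override variables that exist only so the logic can be tried without touching a real node.

```shell
#!/usr/bin/env bash
# Sketch of the manual workaround as a retry loop (assumptions: names, counts,
# delay, and the "IPv4 address on br-ex" success check are illustrative only).

# Hypothetical overrides so the loop can be exercised with stub commands.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"
IP_CMD="${IP_CMD:-ip}"

# Succeeds when br-ex currently has at least one IPv4 address assigned.
brex_has_ipv4() {
    "$IP_CMD" -4 -o addr show dev br-ex 2>/dev/null | grep -q 'inet '
}

# Usage: restart_until_brex_up [max_attempts] [delay_seconds]
# Restarts openvswitch first, then ovs-configuration.service, and re-checks
# br-ex after each round. Returns 0 once br-ex has an IPv4 address.
restart_until_brex_up() {
    local max_attempts="${1:-5}" delay="${2:-30}" attempt
    for ((attempt = 1; attempt <= max_attempts; attempt++)); do
        if brex_has_ipv4; then
            echo "br-ex has an IPv4 address after $((attempt - 1)) restart(s)"
            return 0
        fi
        echo "attempt ${attempt}/${max_attempts}: restarting openvswitch, then ovs-configuration.service"
        "$SYSTEMCTL" restart openvswitch
        # ovs-configuration.service may fail again; the loop simply retries.
        "$SYSTEMCTL" restart ovs-configuration.service
        sleep "$delay"
    done
    # Final check after the last restart round.
    brex_has_ipv4
}
```

On an affected node this could be run as root with, for example, `restart_until_brex_up 5 30`. A non-zero exit status means br-ex still had no IPv4 address after the last round.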