Bug
Resolution: Won't Do
4.10.z
Description of problem:
On Azure VMs with Accelerated Networking enabled, there is an additional network interface that corresponds to the Mellanox SR-IOV virtual function (VF). This is a slave interface and should not be managed by NetworkManager. To accomplish that, we ship a udev rule, /usr/lib/udev/rules.d/68-azure-sriov-nm-unmanaged.rules. On RHCOS 8.x nodes (OCP 4.10.x) the rule has no effect: NetworkManager repeatedly tries to bring up DHCP on the VF interface. On RHCOS 9.x nodes (OCP 4.13.x) the rule is effective. The udev rule is identical in both versions. We are seeing this on multiple ARO clusters, and presumably non-ARO Azure clusters are affected as well.
Version-Release number of selected component (if applicable):
ARO OCP 4.10.63
How reproducible:
Always
Steps to Reproduce:
On an ARO cluster, configure a machineset with `acceleratedNetworking: true`. Then open a debug shell on a node from that machineset, run nmcli, and observe that the enP* interface is not set to unmanaged. The nmcli and `udevadm info` output, along with the udev rule, is provided for both versions for comparison.
Actual results:
### this is 4.10.63
sh-4.4# nmcli
enP64657s1: connecting (getting IP configuration) to Wired Connection
        "Mellanox MT27500/MT27520"
        ethernet (mlx4_core), 00:0D:3A:1C:B6:7E, hw, mtu 1500

sh-4.4# udevadm info /sys/class/net/enP64657s1
P: /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A03:00/device:07/VMBUS:01/f08e4542-fc91-4540-b468-241618eeb6f1/pcifc91:00/fc91:00:02.0/net/enP64657s1
E: DEVPATH=/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A03:00/device:07/VMBUS:01/f08e4542-fc91-4540-b468-241618eeb6f1/pcifc91:00/fc91:00:02.0/net/enP64657s1
E: ID_BUS=pci
E: ID_MODEL_FROM_DATABASE=MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
E: ID_MODEL_ID=0x1004
E: ID_NET_DRIVER=mlx4_en
E: ID_NET_LINK_FILE=/usr/lib/systemd/network/99-default.link
E: ID_NET_NAME=enP64657s1
E: ID_NET_NAME_MAC=enx000d3a1cb67e
E: ID_NET_NAME_PATH=enP64657p0s2
E: ID_NET_NAME_SLOT=enP64657s1
E: ID_NET_NAMING_SCHEME=rhel-8.0
E: ID_OUI_FROM_DATABASE=Microsoft Corp.
E: ID_PATH=acpi-VMBUS:01-pci-fc91:00:02.0
E: ID_PATH_TAG=acpi-VMBUS_01-pci-fc91_00_02_0
E: ID_PCI_CLASS_FROM_DATABASE=Network controller
E: ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller
E: ID_VENDOR_FROM_DATABASE=Mellanox Technologies
E: ID_VENDOR_ID=0x15b3
E: IFINDEX=3
E: INTERFACE=enP64657s1
E: NM_UNMANAGED=1
E: SUBSYSTEM=net
E: SYSTEMD_ALIAS=/sys/subsystem/net/devices/enP64657s1
E: TAGS=:systemd:
E: USEC_INITIALIZED=106604677

sh-4.4# cat /host/usr/lib/udev/rules.d/68-azure-sriov-nm-unmanaged.rules
# Accelerated Networking on Azure exposes a new SRIOV interface to the VM.
# This interface is transparently bonded to the synthetic interface,
# so NetworkManager should just ignore any SRIOV interfaces.
SUBSYSTEM=="net", DRIVERS=="hv_pci", ACTION=="add|change|move", ENV{NM_UNMANAGED}="1"
Expected results:
enP15620s1: unmanaged
        "Mellanox MT27500/MT27520"
        ethernet (mlx4_core), 00:0D:3A:9B:C8:4F, hw, mtu 1500

sh-5.1# udevadm info /sys/class/net/enP15620s1
P: /devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A03:00/device:07/VMBUS:01/eedca831-3d04-4e81-ab11-54d44b14a726/pci3d04:00/3d04:00:02.0/net/enP15620s1
M: enP15620s1
R: 1
U: net
I: 3
E: DEVPATH=/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A03:00/device:07/VMBUS:01/eedca831-3d04-4e81-ab11-54d44b14a726/pci3d04:00/3d04:00:02.0/net/enP15620s1
E: SUBSYSTEM=net
E: INTERFACE=enP15620s1
E: IFINDEX=3
E: USEC_INITIALIZED=9986811
E: NM_UNMANAGED=1
E: ID_NET_NAMING_SCHEME=rhel-9.0
E: ID_NET_NAME_MAC=enx000d3a9bc84f
E: ID_OUI_FROM_DATABASE=Microsoft Corp.
E: ID_NET_NAME_PATH=enP15620p0s2
E: ID_NET_NAME_SLOT=enP15620s1
E: ID_BUS=pci
E: ID_VENDOR_ID=0x15b3
E: ID_MODEL_ID=0x1004
E: ID_PCI_CLASS_FROM_DATABASE=Network controller
E: ID_PCI_SUBCLASS_FROM_DATABASE=Ethernet controller
E: ID_VENDOR_FROM_DATABASE=Mellanox Technologies
E: ID_MODEL_FROM_DATABASE=MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
E: ID_PATH=acpi-VMBUS:01-pci-3d04:00:02.0
E: ID_PATH_TAG=acpi-VMBUS_01-pci-3d04_00_02_0
E: ID_NET_DRIVER=mlx4_en
E: ID_NET_LINK_FILE=/etc/systemd/network/98-nmstate-enP15620s1.link
E: ID_NET_NAME=enP15620s1
E: SYSTEMD_ALIAS=/sys/subsystem/net/devices/enP15620s1
E: TAGS=:systemd:
E: CURRENT_TAGS=:systemd:

sh-5.1# cat /usr/lib/udev/rules.d/68-azure-sriov-nm-unmanaged.rules
# Accelerated Networking on Azure exposes a new SRIOV interface to the VM.
# This interface is transparently bonded to the synthetic interface,
# so NetworkManager should just ignore any SRIOV interfaces.
SUBSYSTEM=="net", DRIVERS=="hv_pci", ACTION=="add|change|move", ENV{NM_UNMANAGED}="1"
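
For anyone checking other nodes, the comparison between the two captures can be done mechanically. The following is a minimal sketch in Python, assuming the `udevadm info` and `nmcli` output has already been saved as text. The function names are hypothetical helpers written for this report, not part of any tool, and the parsing only covers the output shapes shown in this report.

```python
from typing import Dict, Optional


def parse_udev_properties(udevadm_output: str) -> Dict[str, str]:
    """Collect the KEY=VALUE pairs from the 'E:' lines of `udevadm info`."""
    props: Dict[str, str] = {}
    for line in udevadm_output.splitlines():
        line = line.strip()
        if line.startswith("E: ") and "=" in line:
            key, _, value = line[3:].partition("=")
            props[key] = value
    return props


def nmcli_device_state(nmcli_output: str, interface: str) -> Optional[str]:
    """Return the state text that follows '<interface>: ' in `nmcli` output."""
    prefix = interface + ": "
    for line in nmcli_output.splitlines():
        if line.startswith(prefix):
            # Drop the ' to <connection name>' suffix shown for active devices.
            return line[len(prefix):].split(" to ", 1)[0].strip()
    return None


def unmanaged_flag_ignored(udevadm_output: str, nmcli_output: str, interface: str) -> bool:
    """True when udev set NM_UNMANAGED=1 but nmcli does not report 'unmanaged'."""
    props = parse_udev_properties(udevadm_output)
    state = nmcli_device_state(nmcli_output, interface)
    return props.get("NM_UNMANAGED") == "1" and state is not None and state != "unmanaged"


# Shortened excerpts of the two captures in this report.
UDEV_410 = "E: INTERFACE=enP64657s1\nE: NM_UNMANAGED=1\nE: SUBSYSTEM=net"
NMCLI_410 = "enP64657s1: connecting (getting IP configuration) to Wired Connection"
UDEV_413 = "E: INTERFACE=enP15620s1\nE: NM_UNMANAGED=1\nE: SUBSYSTEM=net"
NMCLI_413 = "enP15620s1: unmanaged"

if __name__ == "__main__":
    print("4.10 affected:", unmanaged_flag_ignored(UDEV_410, NMCLI_410, "enP64657s1"))
    print("4.13 affected:", unmanaged_flag_ignored(UDEV_413, NMCLI_413, "enP15620s1"))
```

The check deliberately requires both conditions, so a node where the udev property is missing altogether is not reported as this bug.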
Additional info:
The udev rule is setting NM_UNMANAGED=1 as intended. NetworkManager isn't honoring that flag. Logs show NM repeatedly trying to DHCP this interface:

sh-4.4# journalctl -b NM_DEVICE=enP64657s1
...
Sep 12 22:28:49 aro-adenton-6l5bx-worker-eastus1-cnnv5 NetworkManager[1440]: <warn> [1694557729.8525] dhcp4 (enP64657s1): request timed out
Sep 12 22:28:49 aro-adenton-6l5bx-worker-eastus1-cnnv5 NetworkManager[1440]: <info> [1694557729.8526] dhcp4 (enP64657s1): state changed unknown -> timeout
Sep 12 22:28:49 aro-adenton-6l5bx-worker-eastus1-cnnv5 NetworkManager[1440]: <info> [1694557729.8527] device (enP64657s1): state change: ip-config -> failed (reason 'ip-config-unavai>
Sep 12 22:28:49 aro-adenton-6l5bx-worker-eastus1-cnnv5 NetworkManager[1440]: <warn> [1694557729.8542] device (enP64657s1): Activation: failed for connection 'Wired Connection'
Sep 12 22:28:49 aro-adenton-6l5bx-worker-eastus1-cnnv5 NetworkManager[1440]: <info> [1694557729.8545] device (enP64657s1): state change: failed -> disconnected (reason 'none', sys-if>
Sep 12 22:28:49 aro-adenton-6l5bx-worker-eastus1-cnnv5 NetworkManager[1440]: <info> [1694557729.8827] dhcp4 (enP64657s1): canceled DHCP transaction
Sep 12 22:28:49 aro-adenton-6l5bx-worker-eastus1-cnnv5 NetworkManager[1440]: <info> [1694557729.8828] dhcp4 (enP64657s1): state changed timeout -> done
Sep 12 22:33:49 aro-adenton-6l5bx-worker-eastus1-cnnv5 NetworkManager[1440]: <info> [1694558029.8528] device (enP64657s1): Activation: starting connection 'Wired Connection' (1667573>
Sep 12 22:33:49 aro-adenton-6l5bx-worker-eastus1-cnnv5 NetworkManager[1440]: <info> [1694558029.8530] device (enP64657s1): state change: disconnected -> prepare (reason 'none', sys-i>
Sep 12 22:33:49 aro-adenton-6l5bx-worker-eastus1-cnnv5 NetworkManager[1440]: <info> [1694558029.8534] device (enP64657s1): state change: prepare -> config (reason 'none', sys-iface-s>
Sep 12 22:33:49 aro-adenton-6l5bx-worker-eastus1-cnnv5 NetworkManager[1440]: <info> [1694558029.8542] device (enP64657s1): state change: config -> ip-config (reason 'none', sys-iface>
Sep 12 22:33:49 aro-adenton-6l5bx-worker-eastus1-cnnv5 NetworkManager[1440]: <info> [1694558029.8547] dhcp4 (enP64657s1): activation: beginning transaction (timeout in 90 seconds)
Sep 12 22:33:49 aro-adenton-6l5bx-worker-eastus1-cnnv5 NetworkManager[1440]: <warn> [1694558029.8611] device (enP64657s1): linklocal6: DAD failed for an EUI-64 address
Sep 12 22:35:19 aro-adenton-6l5bx-worker-eastus1-cnnv5 NetworkManager[1440]: <warn> [1694558119.8726] dhcp4 (enP64657s1): request timed out
Sep 12 22:35:19 aro-adenton-6l5bx-worker-eastus1-cnnv5 NetworkManager[1440]: <info> [1694558119.8727] dhcp4 (enP64657s1): state changed unknown -> timeout
Sep 12 22:35:19 aro-adenton-6l5bx-worker-eastus1-cnnv5 NetworkManager[1440]: <info> [1694558119.8728] device (enP64657s1): state change: ip-config -> failed (reason 'ip-config-unavai>
Sep 12 22:35:19 aro-adenton-6l5bx-worker-eastus1-cnnv5 NetworkManager[1440]: <warn> [1694558119.8744] device (enP64657s1): Activation: failed for connection 'Wired Connection'
Sep 12 22:35:19 aro-adenton-6l5bx-worker-eastus1-cnnv5 NetworkManager[1440]: <info> [1694558119.8751] device (enP64657s1): state change: failed -> disconnected (reason 'none', sys-if>
Sep 12 22:35:19 aro-adenton-6l5bx-worker-eastus1-cnnv5 NetworkManager[1440]: <info> [1694558119.9046] dhcp4 (enP64657s1): canceled DHCP transaction
Sep 12 22:35:19 aro-adenton-6l5bx-worker-eastus1-cnnv5 NetworkManager[1440]: <info> [1694558119.9046] dhcp4 (enP64657s1): state changed timeout -> done
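
The retry cadence can be read off the bracketed epoch timestamps in the journal. Below is a small Python sketch, assuming journal lines are available as plain strings. The `event_times` helper is a hypothetical name, and the sample lines are four entries from the excerpt in this report with the date and hostname prefix dropped for brevity.

```python
import re
from typing import Iterable, List

# NetworkManager prints an epoch timestamp in square brackets, e.g. [1694557729.8525].
TIMESTAMP = re.compile(r"\[(\d+\.\d+)\]")


def event_times(log_lines: Iterable[str], marker: str) -> List[float]:
    """Return the epoch timestamps of all lines that contain `marker`."""
    times: List[float] = []
    for line in log_lines:
        if marker in line:
            match = TIMESTAMP.search(line)
            if match:
                times.append(float(match.group(1)))
    return times


SAMPLE = [
    "NetworkManager[1440]: <warn> [1694557729.8525] dhcp4 (enP64657s1): request timed out",
    "NetworkManager[1440]: <info> [1694558029.8528] device (enP64657s1): Activation: starting connection 'Wired Connection' (1667573>",
    "NetworkManager[1440]: <info> [1694558029.8547] dhcp4 (enP64657s1): activation: beginning transaction (timeout in 90 seconds)",
    "NetworkManager[1440]: <warn> [1694558119.8726] dhcp4 (enP64657s1): request timed out",
]

timeouts = event_times(SAMPLE, "request timed out")
activations = event_times(SAMPLE, "Activation: starting connection")
dhcp_starts = event_times(SAMPLE, "beginning transaction")

# Gap between one DHCP timeout and the next activation attempt.
retry_wait = activations[0] - timeouts[0]
# Time a single DHCP transaction ran before it timed out.
dhcp_wait = timeouts[1] - dhcp_starts[0]

if __name__ == "__main__":
    print(f"retry wait: {retry_wait:.1f}s, DHCP wait: {dhcp_wait:.1f}s")
```

On this excerpt, each DHCP transaction runs for about 90 seconds (matching the "timeout in 90 seconds" message) and the next activation starts about 300 seconds after the previous timeout, so the VF interface cycles through a failed activation roughly every six and a half minutes.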