Bug
Resolution: Done-Errata
Critical
None
4.13.z, 4.14.z, 4.15.z, 4.16.0
`nmstatectl persist-nic-names` is a command that creates `.link` files under `/etc/systemd/network/` that pin each NIC's name to its MAC address. For example, it can create the following file:
[Match]
MACAddress=00:a0:de:63:7a:e6

[Link]
Name=dmz0
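A minimal Python sketch of how a file in this format can be rendered from a MAC address and an interface name. The helper `render_link_file` is hypothetical: it only reproduces the MAC-only format shown above and is not nmstate's implementation.

```python
def render_link_file(mac: str, name: str) -> str:
    """Render a systemd .link file that pins interface `name` to `mac`.

    Hypothetical helper that mirrors the MAC-only example above;
    nmstate's own code may differ.
    """
    return (
        "[Match]\n"
        f"MACAddress={mac}\n"
        "\n"
        "[Link]\n"
        f"Name={name}\n"
    )


if __name__ == "__main__":
    # Reproduces the example file for dmz0.
    print(render_link_file("00:a0:de:63:7a:e6", "dmz0"), end="")
```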
However, it was discovered on Azure Red Hat OpenShift (ARO) with Accelerated Networking, where multiple NICs share the same MAC address [4], that this breaks: a single `.link` file matches several interfaces and tries to give all of them one name, which cannot work because an interface name must be unique.
Example of such a system (interface list, followed by the udev properties of each NIC):
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 00:0d:3a:d4:1b:5e brd ff:ff:ff:ff:ff:ff
3: enP41651s1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master eth0 state UP mode DEFAULT group default qlen 1000
    link/ether 00:0d:3a:d4:1b:5e brd ff:ff:ff:ff:ff:ff
    altname enP41651p0s2
E: ID_NET_DRIVER=hv_netvsc
E: ID_NET_LINK_FILE=/usr/lib/systemd/network/99-default.link
E: ID_NET_NAME=eth0

E: ID_NET_DRIVER=mlx5_core
E: ID_NET_LINK_FILE=/usr/lib/systemd/network/99-default.link
E: ID_NET_NAME=enP41651s1
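Such a collision can be spotted by grouping interfaces by MAC address. A minimal Python sketch, assuming the interface data has already been collected (here it is hard-coded from the example above); the `Nic` record and the `find_shared_macs` helper are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Nic:
    name: str  # kernel interface name
    mac: str   # link/ether address


def find_shared_macs(nics: list[Nic]) -> dict[str, list[str]]:
    """Return {mac: [interface names]} for every MAC held by more than one NIC."""
    by_mac: dict[str, list[str]] = defaultdict(list)
    for nic in nics:
        by_mac[nic.mac.lower()].append(nic.name)
    return {mac: names for mac, names in by_mac.items() if len(names) > 1}


# The two interfaces from the example above.
EXAMPLE = [
    Nic("eth0", "00:0d:3a:d4:1b:5e"),
    Nic("enP41651s1", "00:0d:3a:d4:1b:5e"),
]

if __name__ == "__main__":
    print(find_shared_macs(EXAMPLE))
```

Any MAC reported by this check is one for which a MAC-only `.link` file would match more than one interface.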
If a `.link` file matching this MAC address is created, the system becomes unhealthy (details in Slack threads [1][2]): one interface is renamed correctly, but the other stays in the `ID_RENAMING=1` state forever. The journal of an affected system shows messages like these:
Feb 22 11:26:55 localhost kernel: mlx5_core a2b3:00:02.0 enP41651s1: renamed from eth1
Feb 22 11:27:00 weliang-aro22-9zhch-master-0 systemd-udevd[907]: eth0: Failed to rename network interface 3 from 'enP41651s1' to 'eth0': File exists
Feb 22 11:28:04 weliang-aro22-9zhch-master-0 bash[1556]: OVS SDN mode - br-ex not found, using device enP41651s1
Feb 22 11:28:06 weliang-aro22-9zhch-master-0 kubenswrapper[1671]: E0222 11:28:06.670156 1671 dns.go:300] "Could not parse resolv conf file." err="Encountered error while parsing resolv conf file. Error: nameserver list is empty "
The issue results from a combination of two facts: the `hv_netvsc` driver on its own prevents RHEL 9 from using consistent network device naming [3], and one NIC uses `hv_netvsc` while the other uses `mlx5_core` (whose interfaces follow the `enP*` naming convention).
In OCP, the issue was introduced by https://github.com/openshift/machine-config-operator/pull/4020 and affects upgrades from 4.12 to 4.13 (that is, from RHEL 8 to RHEL 9, where the NIC naming scheme changed).
A workaround to unblock an affected cluster is to manually remove the `.link` file from `/etc/systemd/network/`.
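Before removing anything, it can help to identify which `.link` file matches the shared MAC. The sketch below is a read-only helper under stated assumptions: it handles only the plain `key=value` layout shown earlier (no drop-ins or quoting), treats `MACAddress=` as a whitespace-separated list as systemd does, and deletes nothing. The function names are hypothetical.

```python
from pathlib import Path


def match_macs(text: str) -> set[str]:
    """Return the MAC addresses listed on MACAddress= lines of the [Match] section."""
    macs: set[str] = set()
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue  # blank line or comment
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        key, sep, value = line.partition("=")
        if section == "Match" and sep and key.strip() == "MACAddress":
            # MACAddress= accepts a whitespace-separated list of addresses.
            macs.update(v.lower() for v in value.split())
    return macs


def link_files_for_mac(mac: str, directory: str = "/etc/systemd/network") -> list[Path]:
    """List .link files in `directory` whose [Match] section names `mac`.

    Only reports candidates; removal is left to the administrator.
    """
    return [
        path
        for path in sorted(Path(directory).glob("*.link"))
        if mac.lower() in match_macs(path.read_text())
    ]
```

On an affected node, `link_files_for_mac("00:0d:3a:d4:1b:5e")` would list the candidate file(s) for the MAC from the example above.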
After discussion, the agreed long-term solution is to have the `.link` file match on both the MAC address and the driver, so that only one NIC can match a given `.link` file, e.g.:
[Match]
MACAddress=00:a0:de:63:7a:e6
Driver=hv_netvsc
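systemd applies a `.link` file to an interface only when all conditions in its `[Match]` section are satisfied, which is why adding `Driver=` separates the two NICs. The sketch below models that rule for `MACAddress=` and `Driver=` only (no globs, lists, or negation), using the interfaces from the example above; `Nic` and `link_matches` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Nic:
    name: str
    mac: str
    driver: str  # value of the ID_NET_DRIVER udev property


def link_matches(match: dict[str, str], nic: Nic) -> bool:
    """True if every modelled [Match] key present in `match` is satisfied by `nic`.

    Simplified: only MACAddress= (case-insensitive) and Driver= (exact) are
    checked; any other key is ignored.
    """
    if "MACAddress" in match and match["MACAddress"].lower() != nic.mac.lower():
        return False
    if "Driver" in match and match["Driver"] != nic.driver:
        return False
    return True


EXAMPLE = [
    Nic("eth0", "00:0d:3a:d4:1b:5e", "hv_netvsc"),
    Nic("enP41651s1", "00:0d:3a:d4:1b:5e", "mlx5_core"),
]

mac_only = {"MACAddress": "00:0d:3a:d4:1b:5e"}
mac_and_driver = {"MACAddress": "00:0d:3a:d4:1b:5e", "Driver": "hv_netvsc"}

if __name__ == "__main__":
    # MAC-only match: both NICs match, so both would get the same name.
    print([n.name for n in EXAMPLE if link_matches(mac_only, n)])
    # MAC + Driver match: only the hv_netvsc NIC matches.
    print([n.name for n in EXAMPLE if link_matches(mac_and_driver, n)])
```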
Implementing this requires changes in nispor (which collects network interface data from the kernel) and in nmstate (which generates the `.link` file).
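To illustrate the data involved, the sketch below reads an interface's MAC address and driver from sysfs and renders a `.link` file that matches on both. It assumes the standard Linux sysfs layout (`/sys/class/net/<iface>/address` and the `device/driver` symlink) and is not how nispor or nmstate are implemented; the function names are hypothetical. Interfaces without a backing device (such as `lo`) have no `device/driver` link, so the sketch returns `None` for them.

```python
import os
from pathlib import Path
from typing import Optional

SYSFS_NET = "/sys/class/net"


def nic_mac(iface: str, sysfs: str = SYSFS_NET) -> str:
    """Read the interface's MAC address from <sysfs>/<iface>/address."""
    return Path(sysfs, iface, "address").read_text().strip()


def nic_driver(iface: str, sysfs: str = SYSFS_NET) -> Optional[str]:
    """Return the driver bound to `iface`, or None if it has no backing device."""
    try:
        # <sysfs>/<iface>/device/driver is a symlink whose last component is
        # the driver name, e.g. .../drivers/hv_netvsc.
        target = os.readlink(Path(sysfs, iface, "device", "driver"))
    except OSError:
        return None  # no device/driver link, e.g. virtual interfaces
    return os.path.basename(target)


def render_link_file(mac: str, driver: str, name: str) -> str:
    """Render a .link file matching on both MAC address and driver."""
    return (
        "[Match]\n"
        f"MACAddress={mac}\n"
        f"Driver={driver}\n"
        "\n"
        "[Link]\n"
        f"Name={name}\n"
    )


if __name__ == "__main__":
    iface = "eth0"  # hypothetical interface to persist
    driver = nic_driver(iface)
    if driver is not None:
        print(render_link_file(nic_mac(iface), driver, iface), end="")
```

On a node like the one above, this would print a file matching `eth0` by both MAC and `hv_netvsc`, which `enP41651s1` (driven by `mlx5_core`) no longer matches.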
[1] https://redhat-internal.slack.com/archives/CCV9YF9PD/p1708601734603019?thread_ts=1707489506.104789&cid=CCV9YF9PD
[2] https://redhat-internal.slack.com/archives/C04MH2B47HB/p1708608946319009
[3] https://access.redhat.com/solutions/3204751
[4] https://learn.microsoft.com/en-us/azure/virtual-network/accelerated-networking-how-it-works#bonding
- clones: OCPBUGS-30256 nmstatectl "persist-nic-names" does not save driver info (Closed)
- is blocked by: OPNET-479 Impact: OCPBUGS-30256: nmstatectl "persist-nic-names" does not save driver info (Closed)
- is cloned by: OCPBUGS-31752 nmstatectl "persist-nic-names" does not save driver info (Closed)
- links to: RHSA-2024:1770 OpenShift Container Platform 4.15.z security update