Bug
Resolution: Done
Major
4.19.0
Quality / Stability / Reliability
VRF-Lite requires the cluster admin to attach an interface to a CUDN VRF. The most straightforward way to do so is with an NNCP like
apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: udn-test-vlan
spec:
  desiredState:
    interfaces:
    - name: enp3s0
      state: up
      controller: udn-test
where "udn-test" is an existing VRF.
However, the NNCP applies successfully only if the VRF already exists; if it doesn't, the NNCP fails and is not retried. This raises concerns about whether the approach is suitable for scenarios where convergence is expected, such as reboots, scaling up nodes, or restoring configuration backups.
Looking in a bit more detail, when a CUDN and the corresponding VRF are created, the VRF becomes managed by NM:
[connection]
id=udn-test
uuid=8adba5af-0294-4f2e-8683-241214d49d6b
type=vrf
autoconnect=false
interface-name=udn-test
timestamp=1745516597

[vrf]
table=1008

[ipv4]
method=disabled

[ipv6]
addr-gen-mode=default
method=ignore

[proxy]

[.nmmeta]
nm-generated=true
volatile=true
external=true
Then when the NNCP above is applied, the existing NM configuration for the interface gets mutated to set it as that VRF port:
[jcaamano@sdn-08 vfr-lite]$ ssh core@192.168.111.24 sudo cat /etc/NetworkManager/system-connections/enp3s0.nmconnection
[connection]
id=enp3s0
uuid=90d0354f-94c0-4189-9ef7-f932b4dbaf2e
type=ethernet
controller=udn-test
interface-name=enp3s0
port-type=vrf
timestamp=1745516266

[ethernet]

[ipv4]
dhcp-client-id=mac
dhcp-timeout=2147483647
method=auto

[ipv6]
addr-gen-mode=eui64
address1=fe80::2a3:1cff:fe61:7d60/64
dhcp-duid=ll
dhcp-iaid=mac
dhcp-timeout=2147483647
method=auto
ra-timeout=2147483647

[proxy]
All is fine up until this point.
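As a rough illustration of what "set as that VRF port" means at the profile level, the sketch below reads a keyfile and reports the controller when port-type is vrf. The helper name vrf_port_of and the trimmed SAMPLE text are hypothetical and only mirror the keys shown in the dump above; this is not part of knmstate or ovnk.

```python
import configparser


def vrf_port_of(keyfile_text):
    """Return the controller name if the profile is a VRF port, else None.

    Hypothetical helper: it only inspects the [connection] section of an
    NM keyfile, using the controller / port-type keys seen in the dump.
    """
    # NM keyfiles use "=" as the only delimiter; disable "%" interpolation.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.read_string(keyfile_text)
    if not parser.has_section("connection"):
        return None
    conn = parser["connection"]
    if conn.get("port-type") != "vrf":
        return None
    return conn.get("controller")


# Trimmed copy of the enp3s0 profile, for illustration only.
SAMPLE = """\
[connection]
id=enp3s0
type=ethernet
controller=udn-test
interface-name=enp3s0
port-type=vrf

[ethernet]

[ipv4]
method=auto
"""

print(vrf_port_of(SAMPLE))
```

Note that this only says what the profile asks for; it says nothing about whether the device is actually attached to the VRF at that moment.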
Now as the node reboots, this happens:
- Neither the udn-test VRF nor its profile exists.
- The enp3s0 profile remains as is and still configures enp3s0 as a port of the udn-test VRF; however, enp3s0 is not actually attached to the VRF, because the VRF doesn't exist yet.
- Eventually ovnk runs, creates the udn-test VRF, and enp3s0 is attached to it.
- There is no apparent transition in the NNCP state.
So even though we expected potential problems on reboot, this actually works fine.
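One way to confirm the actual attachment after a reboot, as opposed to what the profile requests, is to look at the kernel's view of the link. The sketch below assumes the JSON produced by `ip -j link show`, where an enslaved device carries a "master" field; the helper name controller_of and the sample JSON are illustrative and were not captured from the node in this report.

```python
import json


def controller_of(ip_json_text, ifname):
    """Return the controller ("master") of ifname, or None if unattached.

    Assumes ip_json_text is the output of `ip -j link show`.
    """
    for link in json.loads(ip_json_text):
        if link.get("ifname") == ifname:
            # The "master" key is absent when the device has no controller.
            return link.get("master")
    return None


# Hand-written sample in the shape of `ip -j link show`, for illustration.
ATTACHED = '[{"ifindex": 2, "ifname": "enp3s0", "master": "udn-test"}]'
DETACHED = '[{"ifindex": 2, "ifname": "enp3s0"}]'

print(controller_of(ATTACHED, "enp3s0"))
print(controller_of(DETACHED, "enp3s0"))
```

Before ovnk recreates the VRF, the second case is the one to expect; afterwards, the first.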
However, we can expect problems on node scale-up (and in other similar scenarios), since there is a chance that the NNCP is applied before ovnk has had the chance to create the VRF. In that case the NNCP will fail, remain in a failed state, and never apply the NM configuration changes needed to set the interface as a port of the VRF on that node.
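To make the ordering problem concrete, here is a minimal sketch of the kind of retry that would let the two events converge: check for the VRF, apply only once it exists, and back off in between. It is not how knmstate behaves today; apply_when_vrf_exists, vrf_exists and apply_policy are hypothetical names, and the two callables stand in for whatever check and apply mechanism is actually used.

```python
import time


def apply_when_vrf_exists(vrf_exists, apply_policy, attempts=5, delay=1.0,
                          sleep=time.sleep):
    """Apply the policy once the VRF exists, retrying with exponential backoff.

    vrf_exists and apply_policy are caller-supplied callables (assumptions).
    Returns True if the policy was applied, False if attempts ran out.
    """
    for attempt in range(attempts):
        if vrf_exists():
            apply_policy()
            return True
        if attempt < attempts - 1:
            sleep(delay)   # wait before checking again
            delay *= 2     # double the wait each time
    return False


# Usage with stand-ins: the VRF "appears" on the third check.
checks = iter([False, False, True])
applied = []
ok = apply_when_vrf_exists(lambda: next(checks),
                           lambda: applied.append("udn-test-vlan"),
                           sleep=lambda seconds: None)
print(ok, applied)
```

Without some form of retry like this, a single early attempt decides the outcome for that node.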
Other alternatives are:
- Create the VRF from the NNCP as well. This requires changes in ovn-k to either use predictable table ids for the VRFs or to fully give up ownership of the VRF and expect something else to create it.
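To illustrate what "predictable table ids" could look like, the sketch below derives a routing table id from the network name alone, so that an NNCP author could compute the same value ovn-k would. This is purely an assumption for discussion: the function name, TABLE_BASE, TABLE_RANGE and the use of CRC32 are all hypothetical and do not reflect how ovn-k assigns table ids today.

```python
import zlib

# Hypothetical reserved range for CUDN VRF routing tables.
TABLE_BASE = 10_000
TABLE_RANGE = 50_000


def predictable_table_id(network_name):
    """Map a network name to a stable table id inside the reserved range.

    CRC32 is used only because it is stable across runs and hosts; two
    names can still collide, which a real scheme would need to handle.
    """
    checksum = zlib.crc32(network_name.encode("utf-8"))
    return TABLE_BASE + checksum % TABLE_RANGE


print(predictable_table_id("udn-test"))
```

The trade-off is that any hash-based mapping needs a collision policy, whereas handing VRF ownership to the NNCP avoids the mapping altogether.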
We need to understand:
- If asking knmstate to retry the NNCP is the most reasonable way forward
- If we should otherwise opt to make the changes in ovnk to give up ownership of the VRF in specific scenarios
- If there are configuration alternatives with knmstate that can work better for us
- If there is something else in knmstate that makes this work better for us than we actually expect (example: maybe knmstate waits for the node to be ready before applying configuration changes, and thus if ovnk creates VRFs as part of initial sync then we have a happens-before relationship between the two events).
relates to
- OCPBUGS-55353 kubernetes-nmstate does not retry to apply configuration policies (New)
- RHEL-89799 Configuring an unmanaged controller on an interface fails (Planning)
- RHEL-89914 Devices not becoming managed sometimes (Release Pending)