Type: Bug
Resolution: Done
Priority: Major
Fix Version/s: None
Affects Version/s: 4.19.0
Component/s: Networking / ovn-kubernetes
Labels:
- SDN:OVNK:BGP

Activity Type:
Quality / Stability / Reliability
Blocked:
False
Blocked Reason:

None
Story Points:
None
Severity:
None
Regression:
None
Epic Link:
CORENET-5654

Target Backport Versions:
None
Target Version:
None
Release Blocker:
Rejected
Sprint:
None

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Release Note Status:
None
Release Note Type:
None
Release Note Text:
None

Escape Reason:
None
Escape Impact:
None
Corrective Measures:
None
SDLC stage when should've been found:
None

VRF-Lite requires the cluster admin to attach an interface to a CUDN VRF. The most straightforward way to do so is with an NNCP such as:

apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: udn-test-vlan
spec:
  desiredState:
    interfaces:
    - name: enp3s0
      state: up 
      controller: udn-test

where "udn-test" is an existing VRF.

However, the NNCP applies successfully only if the VRF already exists; if it doesn't, the NNCP fails and is not retried. This raises concerns about whether the approach is suitable for scenarios where convergence is expected, such as reboots, scaling up nodes, or restoring configuration backups.

Looking in a bit more detail: when a CUDN and its corresponding VRF are created, the VRF becomes managed by NM:

[connection]
id=udn-test
uuid=8adba5af-0294-4f2e-8683-241214d49d6b
type=vrf
autoconnect=false
interface-name=udn-test
timestamp=1745516597

[vrf]
table=1008

[ipv4]
method=disabled

[ipv6]
addr-gen-mode=default
method=ignore

[proxy]

[.nmmeta]
nm-generated=true
volatile=true
external=true

Then, when the NNCP above is applied, the existing NM configuration for the interface is modified to make it a port of that VRF:

[jcaamano@sdn-08 vfr-lite]$ ssh core@192.168.111.24 sudo cat /etc/NetworkManager/system-connections/enp3s0.nmconnection
[connection]
id=enp3s0
uuid=90d0354f-94c0-4189-9ef7-f932b4dbaf2e
type=ethernet
controller=udn-test
interface-name=enp3s0
port-type=vrf
timestamp=1745516266

[ethernet]

[ipv4]
dhcp-client-id=mac
dhcp-timeout=2147483647
method=auto

[ipv6]
addr-gen-mode=eui64
address1=fe80::2a3:1cff:fe61:7d60/64
dhcp-duid=ll
dhcp-iaid=mac
dhcp-timeout=2147483647
method=auto
ra-timeout=2147483647

[proxy]

All is fine up until this point.

Now, as the node reboots, this happens:

- Neither the udn-test VRF nor its profile exists.
- The enp3s0 profile remains as is, configuring the interface as a port of the udn-test VRF. However, enp3s0 is not actually attached to the VRF, since the VRF doesn't exist yet.
- Eventually ovnk runs and creates the udn-test VRF, and enp3s0 is attached to it.
- There is no apparent transition in the NNCP state.

So even though we expected potential problems on reboot, this actually works fine.

However, we can expect problems on node scale-up (and in similar scenarios), since the NNCP may be applied before ovnk has had a chance to create the VRF. In that case the NNCP will fail, remain in a failed state, and never apply the NM configuration changes needed to make the interface a port of the VRF on that node.

Another alternative is to create the VRF from the NNCP as well. This requires changes in ovn-k, either to use predictable table IDs for the VRFs or to fully give up ownership of the VRF and expect something else to create it.
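As a rough sketch of that alternative, an NNCP could declare the VRF itself together with its port, using nmstate's VRF interface type. The policy name udn-test-vrf is hypothetical, and the route table ID 1008 is simply the value observed in the NM profile above; it is currently assigned by ovn-k, so this would only be viable once ovn-k uses predictable table IDs or gives up ownership of the VRF.

```yaml
# Sketch only: assumes ovn-k no longer creates the udn-test VRF itself,
# or that it deterministically uses the same route table ID.
apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: udn-test-vrf        # hypothetical policy name
spec:
  desiredState:
    interfaces:
    - name: udn-test
      type: vrf
      state: up
      vrf:
        port:
        - enp3s0
        route-table-id: 1008   # assumption: value taken from the profile shown above
```

With the VRF and its port declared in the same policy, the NNCP would no longer depend on the VRF existing beforehand.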

We need to understand:

- Whether asking knmstate to retry the NNCP is the most reasonable way forward.
- Whether we should instead change ovnk to give up ownership of the VRF in specific scenarios.
- Whether there are configuration alternatives with knmstate that would work better for us.
- Whether something else in knmstate makes this work better than we expect. For example, knmstate might wait for the node to be ready before applying configuration changes; if ovnk creates VRFs as part of its initial sync, that would give us a happens-before relationship between the two events.

relates to:
- RHEL-89799 Configuring an unmanaged controller on an interface fails (Planning)
- OCPBUGS-55353 kubernetes-nmstate does not retry to apply configuration policies (ON_QA)
- RHEL-89914 Devices not becoming managed sometimes (Closed)

Assignee: Jaime Caamaño Ruiz
Reporter: Jaime Caamaño Ruiz
Need Info From: None
Contributors: None
QA Contact: Ying Wang
Doc Contact: None
Votes: 0
Watchers: 5
Created: 2025/04/24 11:12 AM
Updated: 2025/07/13 1:33 PM
Resolved: 2025/05/08 11:01 AM
