RHEL-50556

The nmstate device-reapply process with localnet OVN interfaces detaches physical interfaces from the bond

    • nmstate-2.2.35-1.el9
    • No
    • Moderate
    • ZStream
    • 1
    • rhel-sst-network-management
    • ssg_networking
    • 3
    • Customer/Partner: EMIRATES NBD BANK
      Jira ID: RHEL-50556
      Case ID: 03877931
      Status details: The nmstate device-reapply process in an OCP cluster detaches physical interfaces from their network bond (bond1) during configuration updates. The bug occurs under specific network setups involving RHCOS nodes on RHEL 9.2. The current workaround, which allows extra patch ports in the bridge configuration, has stabilized the issue temporarily. Next steps include further analysis of the provided sosreports and must-gather to understand the root cause and develop a permanent fix.

      [2024-08-13] Patches are available upstream, and the support team is currently checking the fix in the customer environment.
    • False
    • None
    • None
    • RHEL-9.5 Doc week/Last fixes
    • Approved Blocker
    • Given a system administrator has configured an OCP cluster with a network bond (bond1) consisting of two physical interfaces, and bond1 is included in the configuration of an OVS bridge,

      When they update the network configuration with nmstate device-reapply,

      Then the physical interfaces in bond1 remain attached to the bond.

      Definition of Done:

      • The implementation meets the acceptance criteria
      • Integration tests are written and pass
      • The code is part of a downstream build attached to an erratum.
    • Pass
    • None
    • None

      What were you trying to do that didn't work?

      An OCP 4.15.2 cluster with RHCOS nodes based on RHEL 9.2 uses the following example configuration from the nmstate NNCPs:

        # Bond policy (desiredState key restored; it is missing in the excerpt)
        desiredState:
          interfaces:
          - ipv4:
              dhcp: false
              enabled: false
            link-aggregation:
              mode: active-backup
              port:
              - <physical-interface1>
              - <physical-interface2>
            name: bond1
            state: up
            type: bond
       
        desiredState:
          interfaces:
          - bridge:
              options:
                stp: true
              port:
              - name: bond1
            name: <bridge-name>
            state: up
            type: ovs-bridge
          ovn:
            bridge-mappings:
            - bridge: <bridge-name>
              localnet: nad-xxx1
              state: present

      During the device-reapply process, the physical interfaces configured in bond1 are detached from the bond and start flapping:

      Jul 15 12:04:07 xxx NetworkManager[3182]: <info>  [1721045047.5923] audit: op="device-reapply" interface="bond1" ifindex=2155 args="connection.master" pid=3245289 uid=0 result="fail" reason="Can't reapply changes to 'connection.master' setting"
      Jul 15 12:04:07 xxx NetworkManager[3182]: <info>  [1721045047.5927] device (bond1): state change: activated -> deactivating (reason 'new-activation', sys-iface-state:'managed')
      Jul 15 12:04:07 xxx NetworkManager[3182]: <info>  [1721045047.5929] device (bond1): detaching ovs interface bond1
      Jul 15 12:04:07 xxx kernel: bond1: (slave ens3xxx): Releasing backup interface
      Jul 15 12:04:07 xxx kernel: bond1: (slave ens3xxx): the permanent HWaddr of slave - xxxxxxx - is still in use by bond - set the HWaddr of slave to a different address to avoid conflicts
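
When scanning a large journal for this pattern, a small helper can pick out the failed device-reapply audit entries. The following is a minimal sketch, not part of nmstate or NetworkManager; the function names `parse_audit` and `is_failed_reapply` are hypothetical, and the sample line is taken from the excerpt above.

```python
import re

# Sample audit line taken from the log excerpt in this report.
LINE = (
    'Jul 15 12:04:07 xxx NetworkManager[3182]: <info>  [1721045047.5923] '
    'audit: op="device-reapply" interface="bond1" ifindex=2155 '
    'args="connection.master" pid=3245289 uid=0 result="fail" '
    'reason="Can\'t reapply changes to \'connection.master\' setting"'
)

# Matches key="quoted value" or key=bare_value pairs.
PAIR = re.compile(r'(\w+)=(?:"([^"]*)"|(\S+))')


def parse_audit(line):
    """Return the audit fields as a dict, or None for non-audit lines."""
    _, sep, tail = line.partition(" audit: ")
    if not sep:
        return None
    # Each match is (key, quoted, bare); exactly one of the two values is set.
    return {key: (bare or quoted) for key, quoted, bare in PAIR.findall(tail)}


def is_failed_reapply(line):
    """True if the line records a device-reapply operation that failed."""
    fields = parse_audit(line)
    return (
        fields is not None
        and fields.get("op") == "device-reapply"
        and fields.get("result") == "fail"
    )


if __name__ == "__main__":
    fields = parse_audit(LINE)
    print(fields["interface"], fields["args"], fields["result"])
    print(is_failed_reapply(LINE))
```

Feeding each journal line through `is_failed_reapply` would list the reapply attempts that NetworkManager rejected before it fell back to a full reactivation of bond1.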

      At the end of the process, the NNCE status changes to failed to configure, with the following message:

      "error reconciling NodeNetworkConfigurationPolicy on node xxx
            at desired state apply: \"\",\n failed to execute nmstatectl set --no-commit
            --timeout 480: 'exit status 1' '' '[2024-07-15T12:06:12Z INFO  nmstatectl] Nmstate
            version: 2.2.29\nUsing 'set' is deprecated, use 'apply' instead.\n[2024-07-15T12:06:12Z
      
           ignoring\n[2024-07-15T12:06:30Z INFO  nmstate::query_apply::net_state] Rollbacked
            to checkpoint /org/freedesktop/NetworkManager/Checkpoint/7\nNmstateError: VerificationError:
            Verification failure: <bridge-name>.interface.bridge.port desire '[{\"name\":\"bond1\"}]',
            current '[{\"name\":\"bond1\"},{\"name\":\"patch-nad.xxx2_ovn_localnet_port-to-br-int\"},{\"name\":\"patch-nad.xxx4_ovn_localnet_port-to-br-int\"}]'\n'" 

      Setting 'allow-extra-patch-ports: true' in the bridge section as a workaround seems to resolve the issue.
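
The verification failure and the effect of the workaround can be illustrated with a small model of the port comparison. This is a simplified sketch and not nmstate's actual implementation: the function `verify_bridge_ports` is hypothetical, and recognising patch ports by the `patch-` name prefix is an assumption made only for this example. The port names come from the error message above.

```python
# Assumption for this sketch only: patch ports are recognised by name prefix.
PATCH_PREFIX = "patch-"


def verify_bridge_ports(desired_bridge, current_ports):
    """Compare desired and current bridge port names.

    Without allow-extra-patch-ports, every current port must be desired.
    With it, extra patch ports in the current state are ignored.
    """
    desired = {port["name"] for port in desired_bridge.get("port", [])}
    current = {port["name"] for port in current_ports}
    if desired_bridge.get("allow-extra-patch-ports", False):
        current = {
            name
            for name in current
            if name in desired or not name.startswith(PATCH_PREFIX)
        }
    return desired == current


# Current bridge ports as reported in the VerificationError.
current = [
    {"name": "bond1"},
    {"name": "patch-nad.xxx2_ovn_localnet_port-to-br-int"},
    {"name": "patch-nad.xxx4_ovn_localnet_port-to-br-int"},
]

# Desired bridge section as in the original policy, and with the workaround.
strict = {"port": [{"name": "bond1"}]}
relaxed = {"allow-extra-patch-ports": True, "port": [{"name": "bond1"}]}

if __name__ == "__main__":
    print(verify_bridge_ports(strict, current))   # False: patch ports are unexpected
    print(verify_bridge_ports(relaxed, current))  # True: patch ports are ignored
```

In this model the strict comparison fails because OVN added localnet patch ports that the policy never listed, which matches the reported error and the rollback to the checkpoint.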

      Please provide the package NVR for which bug is seen:

      How reproducible: Not easily. I tried to reproduce it in a personal cluster, without success.

      Steps to reproduce

      1.  
      2.  
      3.  

      Expected results

      Actual results

              fge@redhat.com Gris Ge
              rhn-support-bgomes Bruno Gomes
              Network Management Team
              Mingyu Shi
              Votes: 0
              Watchers: 11