RHEL-50543

[GS bonding]mode=balance-xor,balance-slb=1,xmit_hash_policy=vlan+srcmac bond dropped gratuitous arp


    • Yes
    • Moderate
    • rhel-net-mgmt
    • ssg_networking
    • 0
    • False
    • False
    • None
    • None

      Given a system administrator is using NetworkManager with a bond interface configured in balance-xor mode with balance-slb=1 and xmit_hash_policy=vlan+srcmac on RHEL 10.0 or RHEL-9.5,

      When a gratuitous ARP is received on the SLB bond interface,

      Then the gratuitous ARP must not be dropped and the MAC learning table must be updated immediately, as described in the Open vSwitch SLB bonding rules, so that the bond behaves consistently with the documented SLB bonding behavior and does not incur a 60-second delay.

      Definition of Done:

      • The implementation meets the acceptance criteria
      • Integration tests are written and pass
      • The fix is part of a downstream build attached to an errata
      • The fix is backported into RHEL-9.5 
    • None
    • None
    • x86_64
    • None

      What were you trying to do that didn't work?

      Gold Sachs requested that we implement the source-load-balancing (SLB) bonding mode.
      Here is the related doc: https://docs.google.com/document/d/1EeU3vXlA6ICgU6BHXKlBcgrRiDJpYVuXiRShPHf7s-k/edit
      We implemented this with a NetworkManager (NM) + nftables (NFT) solution,
      and I created some test cases for it.
      On RHEL 10.0, one case fails that passed on RHEL 9.4.

      The failure indicates that our bonding does not follow the third rule described in https://docs.openvswitch.org/en/latest/topics/bonding/#slb-bonding :

      ```
      Suppose that a MAC+VLAN moves to an SLB bond from another port (e.g. when a VM is migrated from this hypervisor to a different one). Without additional special handling, Open vSwitch will not notice until the MAC learning entry expires, up to 60 seconds later as a consequence of rule #2.

      Open vSwitch avoids a 60-second delay by listening for gratuitous ARPs, which VMs commonly emit upon migration. As an exception to rule #2, a gratuitous ARP received on an SLB bond is not dropped and updates the MAC learning table in the usual way. (If a move does not trigger a gratuitous ARP, or if the gratuitous ARP is lost in the network, then a 60-second delay still occurs.)
      ```
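
To make the term concrete, here is a minimal sketch of what a gratuitous ARP looks like on the wire and one common way to recognize it (sender IP equal to target IP). It is an illustration only, not the check used by NetworkManager, nftables, or Open vSwitch. The helper names `build_gratuitous_arp` and `is_gratuitous_arp` are hypothetical, the MAC and IP values are documentation examples, and the frame is assumed to be untagged (an 802.1Q tag would shift the offsets by 4 bytes).

```python
import struct

BROADCAST = b"\xff" * 6
ETHERTYPE_ARP = 0x0806


def build_gratuitous_arp(mac: bytes, ip: bytes) -> bytes:
    """Build an untagged Ethernet frame carrying a gratuitous ARP request."""
    # Ethernet header: destination (broadcast), source, EtherType.
    eth = BROADCAST + mac + struct.pack("!H", ETHERTYPE_ARP)
    # ARP header: htype=1 (Ethernet), ptype=0x0800 (IPv4), hlen=6, plen=4, op=1.
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    # Sender MAC/IP, zeroed target MAC, target IP equal to the sender IP.
    arp += mac + ip + b"\x00" * 6 + ip
    return eth + arp


def is_gratuitous_arp(frame: bytes) -> bool:
    """Return True if an untagged frame is an ARP with sender IP == target IP."""
    if len(frame) < 42:  # 14-byte Ethernet header + 28-byte ARP payload
        return False
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != ETHERTYPE_ARP:
        return False
    sender_ip = frame[28:32]
    target_ip = frame[38:42]
    return sender_ip == target_ip


if __name__ == "__main__":
    # Example values only (documentation address range).
    frame = build_gratuitous_arp(bytes.fromhex("525400123456"), bytes([192, 0, 2, 10]))
    print(len(frame), is_gratuitous_arp(frame))
```

An ordinary ARP request for a different address fails the check because its target IP differs from the sender IP.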

      Simply put, the bond drops gratuitous ARPs when it should not.
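
The expected behavior can be sketched as a toy MAC learning model. This is a simplified illustration, not the NM+NFT implementation: it assumes rule #2 means "drop a frame received on the SLB bond if its source MAC+VLAN was learned on another port", it ignores entry expiry (the 60-second timeout), and the class name `SlbLearningModel`, its `receive` method, and the port names are hypothetical.

```python
from typing import Dict, Tuple

Key = Tuple[str, int]  # (source MAC, VLAN ID)


class SlbLearningModel:
    """Toy model of SLB rule #2 and the gratuitous ARP exception (rule #3)."""

    def __init__(self) -> None:
        self.table: Dict[Key, str] = {}  # (MAC, VLAN) -> port it was learned on

    def receive(self, port: str, mac: str, vlan: int,
                on_slb_bond: bool, is_garp: bool) -> bool:
        """Return True if the frame is accepted, False if it is dropped."""
        key = (mac, vlan)
        learned = self.table.get(key)
        moved = learned is not None and learned != port
        # Assumed rule #2: drop frames on the bond whose source was learned elsewhere,
        # unless the frame is a gratuitous ARP (the rule #3 exception).
        if on_slb_bond and moved and not is_garp:
            return False
        # Normal learning, also taken for a gratuitous ARP on the bond.
        self.table[key] = port
        return True


if __name__ == "__main__":
    model = SlbLearningModel()
    mac, vlan = "52:54:00:12:34:56", 10  # example values
    model.receive("vnet0", mac, vlan, on_slb_bond=False, is_garp=False)
    print(model.receive("bond0", mac, vlan, on_slb_bond=True, is_garp=False))
    print(model.receive("bond0", mac, vlan, on_slb_bond=True, is_garp=True))
    print(model.table[(mac, vlan)])
```

In this model, the failing case corresponds to the gratuitous ARP branch returning a drop instead of relearning the MAC+VLAN on the bond.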

      Please provide the package NVR for which bug is seen:

      The issue does not occur on RHEL 9.4 with the following software versions:
      [root@dell-per760-01 virtual-networking]# rpm -q NetworkManager
      NetworkManager-1.46.0-8.el9_4.x86_64
      [root@dell-per760-01 virtual-networking]# uname -r
      5.14.0-427.26.1.el9_4.x86_64

      It does occur on RHEL 10.0 with the following software versions:
      [root@netqe-amd-02 virtual-networking]# rpm -q NetworkManager
      NetworkManager-1.48.4-1.el10.1.x86_64
      [root@netqe-amd-02 virtual-networking]# uname -r
      6.10.0-15.el10.x86_64

      How reproducible:

      always

      Steps to reproduce

      1. dnf -y install tcpdump git wget python3-pip
      2. pip install scapy
      3. git clone https://gitlab.com/liali666/virtual-networking.git
      4. cd virtual-networking
      5. TEST_SETUP=nft_nm RESTART_NM=yes ./runtests tests/test-0014-ovs-rule3

      The test topology is defined in: https://gitlab.com/liali666/virtual-networking/-/blob/master/create-virtual-topo-no-leaf-spine-multi-vlan.sh

      Expected results

      Actual results

              rh-ee-sfaye Stanislas Faye
              rhn-support-liali Liang Li
              Network Management Team
              Vladimir Benes