RHEL-50517

[GS bonding] mode=balance-xor,balance-slb=1,xmit_hash_policy=vlan+srcmac bond drops gratuitous ARP


    • Bug
    • Resolution: Duplicate
    • Normal
    • None
    • rhel-9.5
    • NetworkManager
    • Yes
    • Moderate
    • rhel-net-mgmt
    • ssg_networking
    • None
    • False
    • False
    • None
    • None
    • None
    • None
    • x86_64
    • None

      What were you trying to do that didn't work?

      Goldman Sachs requested that we implement the source-load-balancing (SLB) bonding mode.
      Here is the related doc: https://docs.google.com/document/d/1EeU3vXlA6ICgU6BHXKlBcgrRiDJpYVuXiRShPHf7s-k/edit
      We implemented this with an NM+NFT (NetworkManager plus nftables) solution, and I created some test cases for it.
      On rhel-9.5, one case fails that passed on rhel-9.4.

      The failure indicates that our bonding does not follow the third rule described in the Open vSwitch SLB bonding documentation (https://docs.openvswitch.org/en/latest/topics/bonding/#slb-bonding):

      ```
      Suppose that a MAC+VLAN moves to an SLB bond from another port (e.g. when a VM is migrated from this hypervisor to a different one). Without additional special handling, Open vSwitch will not notice until the MAC learning entry expires, up to 60 seconds later as a consequence of rule #2.

      Open vSwitch avoids a 60-second delay by listening for gratuitous ARPs, which VMs commonly emit upon migration. As an exception to rule #2, a gratuitous ARP received on an SLB bond is not dropped and updates the MAC learning table in the usual way. (If a move does not trigger a gratuitous ARP, or if the gratuitous ARP is lost in the network, then a 60-second delay still occurs.)
      ```
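
To make the quoted exception concrete, here is a minimal toy model of the expected behavior. It is a sketch under stated assumptions, not the NM+NFT implementation: it assumes rule #2 means "a frame arriving on the bond is dropped when its source MAC+VLAN is already learned on a different port", it ignores the 60-second entry expiry, and `ToySwitch`, `receive`, and the port names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class ToySwitch:
    """Toy MAC learning table keyed by (source MAC, VLAN). Hypothetical model."""
    bond_port: str = "bond0"
    mac_table: Dict[Tuple[str, int], str] = field(default_factory=dict)

    def receive(self, port: str, src_mac: str, vlan: int,
                gratuitous_arp: bool = False) -> bool:
        """Return True if the frame is accepted, False if it is dropped."""
        key = (src_mac, vlan)
        learned_port = self.mac_table.get(key)
        learned_elsewhere = learned_port is not None and learned_port != self.bond_port
        # Assumed rule #2: drop frames on the bond whose MAC+VLAN lives on another port.
        # Exception from the quoted text: a gratuitous ARP is never dropped by this rule.
        if port == self.bond_port and learned_elsewhere and not gratuitous_arp:
            return False
        # Accepted frames update the learning table in the usual way.
        self.mac_table[key] = port
        return True


sw = ToySwitch()
vm = ("52:54:00:12:34:56", 10)
sw.receive("vnet0", *vm)                        # VM first seen on a local port
before = sw.receive("bond0", *vm)               # after migration, plain frame on the bond
garp = sw.receive("bond0", *vm, gratuitous_arp=True)
after = sw.receive("bond0", *vm)
print(before, garp, after)
```

In this model the plain frame is rejected until the gratuitous ARP moves the entry to the bond, which is the behavior the failing test case expects.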

      Simply put, the bond drops gratuitous ARPs when it should not.
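
For reference, the sketch below shows one common way to recognize a gratuitous ARP in a raw Ethernet frame: a broadcast ARP whose sender and target IPv4 addresses are equal. This definition is an assumption made for illustration; it is not necessarily the exact check used by Open vSwitch, the kernel, or the test suite. The helper names `build_arp` and `is_gratuitous_arp` are hypothetical, and the MAC and IP values are placeholders.

```python
import socket
import struct
from typing import Optional

ETH_P_ARP = 0x0806
ETH_P_8021Q = 0x8100
BROADCAST = b"\xff" * 6


def build_arp(src_mac: bytes, sender_ip: str, target_ip: str,
              op: int = 1, vlan: Optional[int] = None) -> bytes:
    """Build a broadcast Ethernet frame carrying an IPv4-over-Ethernet ARP packet."""
    eth = BROADCAST + src_mac
    if vlan is not None:
        # Single 802.1Q tag: TPID followed by the VLAN ID (priority bits left at 0).
        eth += struct.pack("!HH", ETH_P_8021Q, vlan & 0x0FFF)
    eth += struct.pack("!H", ETH_P_ARP)
    # htype=1 (Ethernet), ptype=0x0800 (IPv4), hlen=6, plen=4, opcode
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, op)
    arp += src_mac + socket.inet_aton(sender_ip)
    arp += b"\x00" * 6 + socket.inet_aton(target_ip)
    return eth + arp


def is_gratuitous_arp(frame: bytes) -> bool:
    """Assumed definition: broadcast ARP with sender IP equal to target IP."""
    offset = 12  # skip destination and source MAC
    if len(frame) < offset + 2:
        return False
    (ethertype,) = struct.unpack_from("!H", frame, offset)
    if ethertype == ETH_P_8021Q:
        offset += 4  # skip the VLAN tag and read the inner EtherType
        if len(frame) < offset + 2:
            return False
        (ethertype,) = struct.unpack_from("!H", frame, offset)
    offset += 2
    if ethertype != ETH_P_ARP or len(frame) < offset + 28:
        return False
    if frame[:6] != BROADCAST:
        return False
    sender_ip = frame[offset + 14:offset + 18]
    target_ip = frame[offset + 24:offset + 28]
    return sender_ip == target_ip


mac = bytes.fromhex("525400123456")
print(is_gratuitous_arp(build_arp(mac, "192.0.2.10", "192.0.2.10", vlan=10)))
print(is_gratuitous_arp(build_arp(mac, "192.0.2.10", "192.0.2.1")))
```

A classifier like this could be run over a tcpdump capture taken on each side of the bond to see where the gratuitous ARP disappears.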

      Please provide the package NVR for which bug is seen:

      This issue does not happen on rhel-9.4 with the following software versions:
      [root@dell-per760-01 virtual-networking]# rpm -q NetworkManager
      NetworkManager-1.46.0-8.el9_4.x86_64
      [root@dell-per760-01 virtual-networking]# uname -r
      5.14.0-427.26.1.el9_4.x86_64

      It does happen on rhel-9.5 with the following software versions:
      [root@netqe-amd-02 virtual-networking]# rpm -q NetworkManager
      NetworkManager-1.48.2-2.el9.x86_64
      [root@netqe-amd-02 virtual-networking]# uname -r
      5.14.0-480.el9.x86_64

      How reproducible:

      always

      Steps to reproduce

      1. dnf -y install tcpdump git wget python3-pip
      2. pip install scapy
      3. git clone https://gitlab.com/liali666/virtual-networking.git
      4. cd virtual-networking
      5. TEST_SETUP=nft_nm RESTART_NM=yes ./runtests tests/test-0014-ovs-rule3

      The test topology is defined here: https://gitlab.com/liali666/virtual-networking/-/blob/master/create-virtual-topo-no-leaf-spine-multi-vlan.sh

      Expected results

      Actual results

              nm-team Network Management Team
              rhn-support-liali Liang Li
              Network Management Team
              Vladimir Benes