  1. RHEL
  2. RHEL-85769

[rhel-9.7] [nmstate] PRP interface is receiving on both ports but does not drop the duplicate packets


    • No
    • Important
    • rhel-net-mgmt
    • ssg_networking
    • 5
    • False
    • False
    • None
    • Red Hat Enterprise Linux
    • None

      Definition of Done:

      Please mark each item below with ( / ) if completed or ( x ) if incomplete:

      ( ) The acceptance criteria defined below are met.

      Given two physical interfaces (port1 and port2) being configured for PRP through nmstate,

      When the system administrator applies a valid configuration ensuring both interfaces share the same MAC address,

      Then PRP should consistently drop duplicate packets and achieve near zero packet loss during failover tests under typical load conditions.

      Definition of Done:

      • The implementation meets the acceptance criteria
      • Integration tests are written and pass 
      • The official Red Hat documentation and the upstream nmstate documentation are updated to clarify that the ports must have the same MAC address. If `supervision-address` is required for fine-tuning, it is made configurable or documented.

      ( ) Code changes are included in a downstream build attached to an errata.


      ( ) All required testing (manual and/or automated) passes successfully.


      ( ) Related documentation updates (if applicable) have been completed.

    • None
    • None
    • None

      What were you trying to do that didn't work?

      On RHEL 9.4, observing and managing HSR/PRP interfaces with nmstate, based on the sample manifest available upstream (https://github.com/nmstate/nmstate/pull/2469#issue-2011996438), does not work as defined by the PRP protocol. Everything works as expected, however, when the HSR interface is managed with ifconfig (https://lwn.net/Articles/826386/), or when port1 and port2 are set to the same MAC address with nmstate before the HSR interface is declared.

      When the interface is configured through nmstate without setting port1 and port2 to the same MAC address, the PRP interface intermittently receives on both ports but fails to drop the duplicate packets [1], so duplicate messages reach the application. A Wireshark/tcpdump inspection showed that the redundant messages, which are supposed to be identical, carried different MAC addresses.
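      The observation above can be illustrated with a deliberately simplified model. The sketch below assumes that duplicate discard is keyed on the pair (source MAC address, sequence number); the class name `DuplicateDiscard` is hypothetical, and a real implementation (such as the kernel `hsr` module) tracks nodes and sequence windows in a more involved way. It only shows why two copies of the same frame with different source MAC addresses would both be delivered.

```python
from typing import Set, Tuple


class DuplicateDiscard:
    """Toy duplicate filter keyed on (source MAC, sequence number).

    Assumption for illustration only: this is not the kernel's algorithm,
    and it keeps every key forever instead of using a bounded window.
    """

    def __init__(self) -> None:
        self._seen: Set[Tuple[str, int]] = set()

    def accept(self, src_mac: str, seq: int) -> bool:
        """Return True for the first copy of a frame, False for a duplicate."""
        key = (src_mac.lower(), seq)
        if key in self._seen:
            return False
        self._seen.add(key)
        return True


# Both ports transmit with the same source MAC: the second copy is dropped.
same = DuplicateDiscard()
same_mac_results = [
    same.accept("52:54:00:18:6d:48", 1),  # copy received via the first LAN
    same.accept("52:54:00:18:6d:48", 1),  # copy received via the second LAN
]

# Ports transmit with different source MACs: both copies are delivered.
diff = DuplicateDiscard()
diff_mac_results = [
    diff.accept("52:54:00:18:6d:48", 1),
    diff.accept("52:54:00:1e:ac:12", 1),
]

print(same_mac_results)  # [True, False]
print(diff_mac_results)  # [True, True]
```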

      One of the steps when setting up PRP with standard commands is to set the MAC address of both interfaces to the same value. nmstate does not apply this configuration to HSR interfaces by default, and defining the MAC address of the ports appears to be a required step.
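      A quick way to confirm that both ports share a MAC address could look like the sketch below. It assumes input shaped like the JSON printed by `ip -j link show` (a list of objects with `ifname` and `address` keys); the embedded sample is trimmed by hand from the addresses in this report, and the helper names are hypothetical.

```python
import json
from typing import Dict, List

# Hand-trimmed sample in the shape of `ip -j link show` output.
SAMPLE = """
[
  {"ifname": "enp7s0", "address": "52:54:00:18:6d:48"},
  {"ifname": "enp8s0", "address": "52:54:00:18:6d:48",
   "permaddr": "52:54:00:1e:ac:12"},
  {"ifname": "hsr0", "address": "52:54:00:18:6d:48"}
]
"""


def mac_by_ifname(links: List[Dict[str, str]]) -> Dict[str, str]:
    """Map each interface name to its current (not permanent) MAC address."""
    return {link["ifname"]: link["address"].lower() for link in links}


def ports_share_mac(links: List[Dict[str, str]], port1: str, port2: str) -> bool:
    """Return True when both ports currently use the same MAC address."""
    macs = mac_by_ifname(links)
    return macs[port1] == macs[port2]


links = json.loads(SAMPLE)
print(ports_share_mac(links, "enp7s0", "enp8s0"))  # True
```

      On a live system the sample string could be replaced by the captured output of `ip -j link show`, for example through `subprocess.run`.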

      The following options were then evaluated:

       

      [root@rhel94-local-prp1 ~]# head hsr0.yaml
      ---
      interfaces:
        - name: hsr0
          type: hsr
          state: up
          hsr:
            port1: enp7s0
            port2: enp8s0
            supervision-address: 52:54:00:73:72:76
            multicast-spec: 40
      [root@rhel94-local-prp1 ~]# nmstatectl apply hsr0.yaml
      (..)
      [2025-01-17T15:19:43Z WARN  nmstate::ifaces::hsr] The supervision-address is read-only, ignoring it on desired state.
       
      
      • Setting the port1 and port2 MAC addresses with nmstate seems to be a valid solution, but it is not explicitly documented for HSR/PRP in nmstate, either upstream or downstream. The remaining problem is that ~0.0909091% packet loss is still observed during failover when the nodes are under a high network bandwidth workload. We are not sure whether this is still a problem, given the "zero packet loss" statements about the HSR/PRP protocols.
        [root@rhel94-local-prp1 ~]# cat hsr0.yaml
        ---
        interfaces:
          - name: enp7s0
            type: ethernet
            state: up
            mac-address: 52:54:00:18:6d:48
          - name: enp8s0
            type: ethernet
            state: up
            mac-address: 52:54:00:73:72:76 
          - name: hsr0
            type: hsr
            state: up
            hsr:
              port1: enp7s0
              port2: enp8s0
              multicast-spec: 40
              protocol: prp
            ipv4:
              enabled: true
              dhcp: false
              address:
              - ip: 192.168.200.10
                prefix-length: 24
              auto-dns: false
              auto-gateway: false
              auto-routes: false
        [root@rhel94-local-prp1 ~]# ip a l
        (..) 
        3: enp7s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
            link/ether 52:54:00:18:6d:48 brd ff:ff:ff:ff:ff:ff
        4: enp8s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
            link/ether 52:54:00:18:6d:48 brd ff:ff:ff:ff:ff:ff permaddr 52:54:00:1e:ac:12
        5: hsr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1494 qdisc noqueue state UP group default qlen 1000
            link/ether 52:54:00:18:6d:48 brd ff:ff:ff:ff:ff:ff
            inet 192.168.200.20/24 brd 192.168.200.255 scope global noprefixroute hsr0
               valid_lft forever preferred_lft forever
      • ICMP and iperf statistics from the failover tests (VMs with 2 vCPUs and 4 GB):
        
        --- 192.168.200.10 ping statistics ---
        6600 packets transmitted, 6594 received, 0.0909091% packet loss, time 6757422ms
        rtt min/avg/max/mdev = 0.064/0.458/4.251/0.134 ms
        
        Accepted connection from 192.168.200.20, port 47564
        [  5] local 192.168.200.10 port 5201 connected to 192.168.200.20 port 47570
        [ ID] Interval           Transfer     Bitrate
        [  5]   0.00-5.00   sec   180 MBytes   302 Mbits/sec                  
        [  5]   5.00-10.00  sec   195 MBytes   327 Mbits/sec                  
        [  5]  10.00-15.00  sec   188 MBytes   315 Mbits/sec                  
        [  5]  15.00-20.00  sec   183 MBytes   307 Mbits/sec                  
        [  5]  20.00-25.00  sec   195 MBytes   328 Mbits/sec                  
        [  5]  25.00-30.00  sec   194 MBytes   325 Mbits/sec                  
        [  5]  30.00-35.00  sec   177 MBytes   298 Mbits/sec                  
        [  5]  35.00-40.00  sec   486 MBytes   815 Mbits/sec                  
        [  5]  40.00-45.00  sec   691 MBytes  1.16 Gbits/sec                  
        [  5]  45.00-50.00  sec   700 MBytes  1.17 Gbits/sec                  
        [  5]  50.00-55.01  sec   655 MBytes  1.10 Gbits/sec                  
        [  5]  55.01-60.00  sec   636 MBytes  1.07 Gbits/sec                  
        [  5]  60.00-60.04  sec  4.00 MBytes   789 Mbits/sec                  
        - - - - - - - - - - - - - - - - - - - - - - - - -
        [ ID] Interval           Transfer     Bitrate
        [  5]   0.00-60.04  sec  4.38 GBytes   626 Mbits/sec                  receiver
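The loss percentage quoted in this report follows directly from the ping summary line. The sketch below recomputes it from the transmitted and received counters; the function name `loss_from_summary` is hypothetical and the regular expression only targets the summary format shown above.

```python
import re
from typing import Tuple

# Summary line copied from the ping output in this report.
PING_SUMMARY = (
    "6600 packets transmitted, 6594 received, "
    "0.0909091% packet loss, time 6757422ms"
)


def loss_from_summary(line: str) -> Tuple[int, float]:
    """Return (lost packets, loss percentage) from a ping summary line."""
    match = re.search(r"(\d+) packets transmitted, (\d+) received", line)
    if match is None:
        raise ValueError("not a ping summary line")
    sent, received = int(match.group(1)), int(match.group(2))
    lost = sent - received
    return lost, 100.0 * lost / sent


lost, loss_pct = loss_from_summary(PING_SUMMARY)
print(lost, f"{loss_pct:.7f}%")  # 6 0.0909091%
```

In absolute terms, the reported percentage corresponds to 6 lost packets out of 6600.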
         

      [1] https://en.wikipedia.org/wiki/Parallel_Redundancy_Protocol & https://wiki.wireshark.org/PRP 

      What is the impact of this issue to you?

      PRP is designed to provide zero-time recovery and allows redundancy to be checked continuously in order to detect lurking failures.
      At this moment HSR/PRP is still a Technology Preview (TP) in RHEL 9.4, and we are looking for it to become GA for production-grade deployments.

      Setting the port1 and port2 MAC addresses through nmstate seems to be a valid solution, but it is not documented within nmstate, either upstream or downstream. A supportability review is therefore needed to guide us with best practices.

      Finally, we would like to understand why "supervision-address" is currently a read-only field and whether this affects the way PRP works.

      Please provide the package NVR for which the bug is seen:

      [root@rhel94-local-prp1 ~]# cat /etc/redhat-release 
      Red Hat Enterprise Linux release 9.4 (Plow)
      [root@rhel94-local-prp1 ~]# uname -a
      Linux rhel94-local-prp1 5.14.0-427.42.1.el9_4.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Oct 18 14:35:40 EDT 2024 x86_64 x86_64 x86_64 GNU/Linux
      [root@rhel94-local-prp1 ~]# lsmod |grep hsr
      hsr                    57344  0
      [root@rhel94-local-prp1 ~]# dnf info nmstate
      Updating Subscription Management repositories.
      Last metadata expiration check: 1:18:33 ago on Wed 22 Jan 2025 12:08:00 PM WET.
      Installed Packages
      Name         : nmstate
      Version      : 2.2.39
      Release      : 1.el9_5
      Architecture : x86_64
      Size         : 10 M
      Source       : nmstate-2.2.39-1.el9_5.src.rpm
      Repository   : @System
      From repo    : rhel-9-for-x86_64-appstream-rpms
      Summary      : Declarative network manager API
      URL          : https://github.com/nmstate/nmstate
       

      How reproducible is this bug?:

      Always

      Steps to reproduce

      1. Apply the sample manifest available upstream: https://github.com/nmstate/nmstate/pull/2469#issue-2011996438
      2. Compare with the working manifest from the KB article: https://access.redhat.com/solutions/7103424

      Expected results

      PRP provides zero-time recovery and allows redundancy to be checked continuously in order to detect lurking failures.

      Actual results

      • With the current upstream sample manifests, nmstate does not seem able to deliver the level of availability defined by the PRP protocol.
      • As suggested by https://access.redhat.com/solutions/7103424, we make sure that port1 and port2 are configured with the same MAC address before declaring the hsr interface. This address is typically inherited from port1; see https://lwn.net/Articles/826386/ for more. Even with this second configuration, ~0.0909091% packet loss is still observed during failover when the nodes are under a high network bandwidth workload. We are not sure whether this is still a problem, given the "zero packet loss" statements about the HSR/PRP protocols.
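      To avoid the two ports drifting apart, the desired state could be generated rather than written by hand. The sketch below builds a state with the same keys as the manifest in this report and applies one shared MAC address to both ports; the function `prp_state` is hypothetical, and it is an assumption here that `nmstatectl apply` accepts the resulting JSON in the same way as the YAML form.

```python
import json
from typing import Any, Dict


def prp_state(port1: str, port2: str, shared_mac: str,
              hsr_name: str = "hsr0") -> Dict[str, Any]:
    """Build a desired state where both PRP ports use one MAC address."""
    ports = [
        {"name": port, "type": "ethernet", "state": "up",
         "mac-address": shared_mac}
        for port in (port1, port2)
    ]
    hsr = {
        "name": hsr_name,
        "type": "hsr",
        "state": "up",
        "hsr": {
            "port1": port1,
            "port2": port2,
            "multicast-spec": 40,
            "protocol": "prp",
        },
    }
    # Ports come first so their MAC addresses are declared with the HSR interface.
    return {"interfaces": ports + [hsr]}


state = prp_state("enp7s0", "enp8s0", "52:54:00:18:6d:48")
print(json.dumps(state, indent=2))
```

      IP settings are left out on purpose; they would be added to the `hsr0` entry as in the manifest shown earlier in this report.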

       

              rh-ee-sfaye Stanislas Faye
              rhn-support-arolivei Arthur Oliveira
              Network Management Team Network Management Team
              Vladimir Benes Vladimir Benes