Bug
Resolution: Can't Do
Major
rhel-9.4, rhel-9.5
Important
rhel-net-mgmt
ssg_networking
Red Hat Enterprise Linux
What were you trying to do that didn't work?
On RHEL 9.4, observing and managing HSR/PRP interfaces with nmstate, based on the sample manifest available upstream ( https://github.com/nmstate/nmstate/pull/2469#issue-2011996438 ), does not work as expected/defined by the PRP protocol. Everything works as expected, however, when the HSR interface is managed with ifconfig ( https://lwn.net/Articles/826386/ ) or when port 1 and port 2 are given the same MAC address with nmstate before the HSR interface is declared.
When the interface is configured through nmstate without setting port 1 and port 2 to the same MAC address, we intermittently observe that the PRP interface receives on both ports but sometimes does not drop the duplicate packets [1], so duplicate messages reach the application. A wireshark/tcpdump inspection showed that the redundant messages (which are supposed to be identical) were using different MAC addresses.
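The observation above can be illustrated with a toy model. This is a minimal sketch, assuming the receiver recognises a duplicate by the pair (source MAC, sequence number); the class name `DuplicateDiscard` and the sequence number used are hypothetical, and the real kernel `hsr` module tracks per-node sequence state with aging that is not modelled here.

```python
class DuplicateDiscard:
    """Toy model of duplicate discard keyed on (source MAC, sequence number)."""

    def __init__(self):
        # Frames already delivered to the application.
        self.seen = set()

    def accept(self, src_mac, seq):
        """Return True if the frame is delivered, False if dropped as a duplicate."""
        key = (src_mac.lower(), seq)
        if key in self.seen:
            return False
        self.seen.add(key)
        return True


# Case 1: both ports send with the same MAC, so the second copy is dropped.
same = DuplicateDiscard()
same_mac_result = [
    same.accept("52:54:00:18:6d:48", 1),  # copy received via LAN A
    same.accept("52:54:00:18:6d:48", 1),  # copy received via LAN B
]

# Case 2: the ports send with different MACs, so both copies are delivered.
diff = DuplicateDiscard()
diff_mac_result = [
    diff.accept("52:54:00:18:6d:48", 1),  # copy received via LAN A
    diff.accept("52:54:00:73:72:76", 1),  # copy received via LAN B
]

print("same MAC:     ", same_mac_result)   # [True, False]
print("different MAC:", diff_mac_result)   # [True, True]
```

Under this assumption, two copies of one frame that carry different source MAC addresses look like two unrelated frames, which would be consistent with the duplicates seen by the application.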
One of the steps when setting up PRP with standard commands is to set the MAC address of both interfaces to the same value. nmstate does not apply this configuration to HSR interfaces by default, so defining the MAC address of the ports appears to be a required step.
We then evaluated the following options:
- "supervision-address" field, but at this time it is still read-only: https://github.com/nmstate/nmstate/commit/b23da648e49593c9919e11dcd9a65d3d423fe868#diff-d74df62b06b50e06e830190f130b2cd29f8336dae26d668ffd54edff8aaff512R57
[root@rhel94-local-prp1 ~]# head hsr0.yaml
---
interfaces:
- name: hsr0
  type: hsr
  state: up
  hsr:
    port1: enp7s0
    port2: enp8s0
    supervision-address: 52:54:00:73:72:76
    multicast-spec: 40
[root@rhel94-local-prp1 ~]# nmstatectl apply hsr0.yaml
(..)
[2025-01-17T15:19:43Z WARN nmstate::ifaces::hsr] The supervision-address is read-only, ignoring it on desired state.
- Setting the MAC address of port 1 and port 2 with nmstate. This seems to be a valid solution, but it is not explicitly documented upstream or downstream within nmstate for HSR/PRP. The remaining problem is that we still observe ~0.0909091% packet loss during failover when nodes are under high network bandwidth workload. We are not sure whether this is still a problem, given the "zero packet loss" statements about the HSR/PRP protocols.
[root@rhel94-local-prp1 ~]# cat hsr0.yaml
---
interfaces:
- name: enp7s0
  type: ethernet
  state: up
  mac-address: 52:54:00:18:6d:48
- name: enp8s0
  type: ethernet
  state: up
  mac-address: 52:54:00:73:72:76
- name: hsr0
  type: hsr
  state: up
  hsr:
    port1: enp7s0
    port2: enp8s0
    multicast-spec: 40
    protocol: prp
  ipv4:
    enabled: true
    dhcp: false
    address:
    - ip: 192.168.200.10
      prefix-length: 24
    auto-dns: false
    auto-gateway: false
    auto-routes: false
[root@rhel94-local-prp1 ~]# ip a l
(..)
3: enp7s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:18:6d:48 brd ff:ff:ff:ff:ff:ff
4: enp8s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:18:6d:48 brd ff:ff:ff:ff:ff:ff permaddr 52:54:00:1e:ac:12
5: hsr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1494 qdisc noqueue state UP group default qlen 1000
    link/ether 52:54:00:18:6d:48 brd ff:ff:ff:ff:ff:ff
    inet 192.168.200.20/24 brd 192.168.200.255 scope global noprefixroute hsr0
       valid_lft forever preferred_lft forever
ICMP stats and iperf (VMs with 2 vCPUs and 4GB) stats running failover tests:
--- 192.168.200.10 ping statistics ---
6600 packets transmitted, 6594 received, 0.0909091% packet loss, time 6757422ms
rtt min/avg/max/mdev = 0.064/0.458/4.251/0.134 ms

Accepted connection from 192.168.200.20, port 47564
[  5] local 192.168.200.10 port 5201 connected to 192.168.200.20 port 47570
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-5.00   sec   180 MBytes   302 Mbits/sec
[  5]   5.00-10.00  sec   195 MBytes   327 Mbits/sec
[  5]  10.00-15.00  sec   188 MBytes   315 Mbits/sec
[  5]  15.00-20.00  sec   183 MBytes   307 Mbits/sec
[  5]  20.00-25.00  sec   195 MBytes   328 Mbits/sec
[  5]  25.00-30.00  sec   194 MBytes   325 Mbits/sec
[  5]  30.00-35.00  sec   177 MBytes   298 Mbits/sec
[  5]  35.00-40.00  sec   486 MBytes   815 Mbits/sec
[  5]  40.00-45.00  sec   691 MBytes  1.16 Gbits/sec
[  5]  45.00-50.00  sec   700 MBytes  1.17 Gbits/sec
[  5]  50.00-55.01  sec   655 MBytes  1.10 Gbits/sec
[  5]  55.01-60.00  sec   636 MBytes  1.07 Gbits/sec
[  5]  60.00-60.04  sec  4.00 MBytes   789 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate
[  5]   0.00-60.04  sec  4.38 GBytes   626 Mbits/sec                  receiver
[1] https://en.wikipedia.org/wiki/Parallel_Redundancy_Protocol & https://wiki.wireshark.org/PRP
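For reference, the loss percentage in the ping summary above corresponds to 6 lost echo replies out of 6600. A minimal sketch of that arithmetic follows; the helper name `loss_percent` is hypothetical, and the counts are taken directly from the ping statistics in this report.

```python
def loss_percent(transmitted, received):
    """Packet loss as a percentage of transmitted packets."""
    if transmitted <= 0:
        raise ValueError("transmitted must be positive")
    return (transmitted - received) / transmitted * 100.0


transmitted = 6600  # from the ping summary in this report
received = 6594     # from the ping summary in this report

lost = transmitted - received
pct = loss_percent(transmitted, received)
print(f"{lost} lost of {transmitted} ({pct:.7f}%)")  # 6 lost of 6600 (0.0909091%)
```

The figure covers the whole test run, so it does not by itself show how many packets were lost per individual failover event.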
What is the impact of this issue to you?
PRP is designed to provide zero-time recovery and allows the redundancy to be checked continuously to detect lurking failures.
At this moment HSR/PRP is still Technology Preview (TP) in RHEL 9.4, and we are looking for it to become GA for production-grade deployments.
Setting the MAC address of port 1 and port 2 through nmstate seems to be a valid solution, but it is not documented upstream or downstream within nmstate. A supportability review is therefore needed to guide us on best practices.
Finally, we would like to understand why "supervision-address" is currently a read-only field and whether this affects the way PRP works.
Please provide the package NVR for which the bug is seen:
[root@rhel94-local-prp1 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux release 9.4 (Plow)
[root@rhel94-local-prp1 ~]# uname -a
Linux rhel94-local-prp1 5.14.0-427.42.1.el9_4.x86_64 #1 SMP PREEMPT_DYNAMIC Fri Oct 18 14:35:40 EDT 2024 x86_64 x86_64 x86_64 GNU/Linux
[root@rhel94-local-prp1 ~]# lsmod |grep hsr
hsr                    57344  0
[root@rhel94-local-prp1 ~]# dnf info nmstate
Updating Subscription Management repositories.
Last metadata expiration check: 1:18:33 ago on Wed 22 Jan 2025 12:08:00 PM WET.
Installed Packages
Name         : nmstate
Version      : 2.2.39
Release      : 1.el9_5
Architecture : x86_64
Size         : 10 M
Source       : nmstate-2.2.39-1.el9_5.src.rpm
Repository   : @System
From repo    : rhel-9-for-x86_64-appstream-rpms
Summary      : Declarative network manager API
URL          : https://github.com/nmstate/nmstate
How reproducible is this bug?:
Always
Steps to reproduce
- sample manifest available upstream https://github.com/nmstate/nmstate/pull/2469#issue-2011996438
- Working manifest @ KB https://access.redhat.com/solutions/7103424
Expected results
PRP provides zero-time recovery and allows the redundancy to be checked continuously to detect lurking failures.
Actual results
- With the current upstream sample manifests, nmstate does not seem able to deliver the level of availability expected/defined by the PRP protocol.
- As suggested by https://access.redhat.com/solutions/7103424 , we make sure that port1 and port2 are configured with the same MAC address before declaring the hsr interface. This address is typically inherited from port1. See more at https://lwn.net/Articles/826386/ . Even with this second configuration, we still observe ~0.0909091% packet loss during failover when nodes are under high network bandwidth workload. We are not sure whether this is still a problem, given the "zero packet loss" statements about the HSR/PRP protocols.
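Whether both ports ended up with the same MAC address can be checked from `ip a l` style output. The following is a minimal sketch, assuming the multi-line `ip` output format shown in this report; the function names `parse_link_macs` and `ports_share_mac` are hypothetical helpers, and the sample text is copied from the output above.

```python
import re

# "3: enp7s0: <...>" header line of an interface block.
IFACE_RE = re.compile(r"^\d+:\s+([^:@\s]+)[@:]")
# Indented "link/ether <mac>" line that follows the header.
MAC_RE = re.compile(r"^\s+link/ether\s+([0-9a-f:]{17})", re.IGNORECASE)


def parse_link_macs(ip_output):
    """Map interface name -> current MAC address from `ip a l` style text."""
    macs = {}
    current = None
    for line in ip_output.splitlines():
        header = IFACE_RE.match(line)
        if header:
            current = header.group(1)
            continue
        mac = MAC_RE.match(line)
        if mac and current is not None:
            macs[current] = mac.group(1).lower()
    return macs


def ports_share_mac(ip_output, port1, port2):
    """True only if both ports are present and report the same MAC address."""
    macs = parse_link_macs(ip_output)
    return port1 in macs and port2 in macs and macs[port1] == macs[port2]


SAMPLE = """\
3: enp7s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:18:6d:48 brd ff:ff:ff:ff:ff:ff
4: enp8s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 52:54:00:18:6d:48 brd ff:ff:ff:ff:ff:ff permaddr 52:54:00:1e:ac:12
"""

print(parse_link_macs(SAMPLE))
print(ports_share_mac(SAMPLE, "enp7s0", "enp8s0"))  # True
```

The check compares only the active `link/ether` address, so the `permaddr` value reported for enp8s0 is ignored.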
is related to: RFE-4762 Support for Parallel Redundancy Protocol (PRP) and High-availability Seamless Redundancy (HSR) (Closed)