Bug
Resolution: Done-Errata
Major
This ticket is tracking the QE verification effort for the solution to the problem described below.
Problem Description: Clearly explain the issue.
Since https://github.com/ovn-org/ovn/commit/325c7b2, ovn-controller splits the openflows generated for multicast groups (IP multicast, but also MC_FLOOD, MC_UNKNOWN, etc.) into chains of rules, essentially interleaving a controller() action between the other actions whenever the total length of the rule's action list would otherwise exceed MC_OFPACTS_MAX_MSG_SIZE.
This has the unwanted side effect of flooding the controller with these "controller recirculated" packets, causing more harm than if the packets were simply dropped because the datapath flow action list was too large.
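As a rough illustration, the splitting shows up as controller() actions embedded in the openflows that ovn-controller installs on the integration bridge. A minimal check, assuming the default br-int bridge name (the grep also matches unrelated packet-in flows, so treat the output as an indication only):

  # Look for controller() actions in the installed openflows; on an affected
  # node the multicast group flows are chained through such actions.
  ovs-ofctl dump-flows br-int | grep 'controller(' | head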
In the meantime it has been determined that the underlying problem that caused ovs-vswitchd-generated datapath flows to have overly large action lists was actually a kernel bug (RHEL-83440).
Now that the original problem has been fixed in the kernel, we should probably revert the OVN commit to avoid the controller DoSing itself.
This issue has been reported in a couple of cases already:
https://mail.openvswitch.org/pipermail/ovs-discuss/2025-February/053455.html
https://issues.redhat.com/browse/OCPBUGS-61000
Because we still have (at least) layered products using OVN on RHEL 9.2, we need the OVN revert to happen after the kernel fix is ported to RHEL 9.2 (tracked in RHEL-87209).
Impact Assessment: Describe the severity and impact (e.g., network down, availability of a workaround, etc.).
Network impact: high CPU usage for ovs-vswitchd and ovn-controller.
Software Versions: Specify the exact versions in use (e.g., openvswitch3.1-3.1.0-147.el8fdp).
ovn24.03-24.03.6-26.el9fdp (actually any supported OVN stream)
Issue Type: Indicate whether this is a new issue or a regression (if a regression, state the last known working version).
Regression introduced by https://github.com/ovn-org/ovn/commit/325c7b2
Reproducibility: Confirm if the issue can be reproduced consistently. If not, describe how often it occurs.
Constant
Reproduction Steps: Provide detailed steps or scripts to replicate the issue.
Set up an OVN topology with a reasonably large number of logical switch ports on a given logical switch, such that the resulting openflows for the corresponding switch multicast groups are split into chains (see the sketch after these steps).
Send multicast traffic.
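A minimal reproduction sketch along those lines, assuming a single-node setup with the default br-int integration bridge; the port count (400 here) and the names ls-mcast, lsp-N and vm1 are illustrative assumptions, the exact number of ports needed to exceed MC_OFPACTS_MAX_MSG_SIZE may differ, and depending on the environment the ports may need to be bound (locally or on other chassis) for their outputs to appear in the multicast group flows:

  # Create one logical switch with enough ports that the MC_FLOOD group's
  # action list would exceed MC_OFPACTS_MAX_MSG_SIZE (400 is an assumption).
  ovn-nbctl ls-add ls-mcast
  for i in $(seq 1 400); do
      ovn-nbctl lsp-add ls-mcast lsp-$i
      ovn-nbctl lsp-set-addresses lsp-$i \
          "00:00:00:00:$(printf '%02x:%02x' $((i / 256)) $((i % 256)))"
  done

  # Bind one port locally so multicast traffic can be injected.
  ovs-vsctl add-port br-int vm1 -- set Interface vm1 type=internal \
      external_ids:iface-id=lsp-1
  ip addr add 10.0.0.1/24 dev vm1
  ip link set vm1 up

  # Send multicast traffic from the bound port.
  ping -I vm1 -c 10 239.0.0.1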
Expected Behavior: Describe what should happen under normal circumstances.
Traffic should be forwarded to the destinations (at least as long as the number of destinations does not cause the maximum OVS resubmit limit to be hit).
Observed Behavior: Explain what actually happens.
All multicast packets are forwarded through a chain of controller() actions, resulting in high CPU usage for both ovs-vswitchd and ovn-controller, and in network impact.
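One rough way to confirm the CPU side of the symptom while the multicast traffic from the reproduction steps is flowing (a sketch; the process names are those of the standard OVS/OVN daemons):

  # Check CPU usage of the two affected daemons under multicast load.
  top -b -n 1 | grep -E 'ovs-vswitchd|ovn-controller'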
Troubleshooting Actions: Outline the steps taken to diagnose or resolve the issue so far.
Logs: If you collected logs, please provide them (e.g. sos report, /var/log/openvswitch/*, testpmd console).
links to: RHBA-2025:155230 (ovn24.03 bug fix and enhancement update)