- Bug
- Resolution: Unresolved
- Normal
- 4.16.z
- Quality / Stability / Reliability
- False
- Important
Description of problem:
While troubleshooting a customer case involving a network packet storm after an OpenShiftSDN to OVN-Kubernetes migration, we found that when an application hosted on any node of the cluster sends multicast traffic to a multicast Microsoft LB, it causes an excessive flood of packets in the network, which in turn causes many issues on all hosts in the same network.
The issue seems to be mitigated after setting the below parameters on the br-ex OVS bridge:
sudo ovs-vsctl set Bridge br-ex mcast_snooping_enable=true
sudo ovs-vsctl set Bridge br-ex other_config:mcast-snooping-disable-flood-unregistered=true
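To confirm on a node that both settings are in place, a small read-only check along these lines could be used. This is a minimal sketch, not something taken from the case: the helper name `mcast_snooping_state` and the `OVS_VSCTL` override variable are hypothetical, while `ovs-vsctl get` and its `--if-exists` option are standard OVS CLI features.

```shell
#!/bin/sh
# Sketch: report the multicast snooping state of an OVS bridge.
# OVS_VSCTL is a hypothetical override so the CLI can be swapped out
# (for example with "sudo ovs-vsctl" wrapped in a script, or a stub).
OVS_VSCTL="${OVS_VSCTL:-ovs-vsctl}"

# Hypothetical helper: prints one summary line for the given bridge
# (defaults to br-ex).
mcast_snooping_state() {
    mss_bridge="${1:-br-ex}"
    # Column on the Bridge table; prints "true" or "false".
    mss_snoop="$("$OVS_VSCTL" get Bridge "$mss_bridge" mcast_snooping_enable)" || return 1
    # --if-exists yields empty output when the other_config key was never
    # set; the value is printed with double quotes, which tr strips.
    mss_flood="$("$OVS_VSCTL" --if-exists get Bridge "$mss_bridge" \
        other_config:mcast-snooping-disable-flood-unregistered | tr -d '"')"
    echo "$mss_bridge snooping=$mss_snoop disable-flood-unregistered=${mss_flood:-unset}"
}
```

Running `mcast_snooping_state br-ex` on a node prints a single line with both values. Once snooping is enabled, `ovs-appctl mdb/show br-ex` can additionally be used to list the multicast groups the bridge has learned.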
Multicast traffic seems to be looping between the switch (where the nodes are attached) and the br-ex interfaces of the nodes, causing the packet storm.
From our analysis so far, this seems to happen because the br-ex OVS bridge does not acknowledge the multicast packets and floods them out of all ports.
The switch then does the same, because the destination MAC is a multicast one. This repeats in a loop, which produces the packet storm.
The questions here are the below:
- Is this a bug in our product, or are these OVS settings configured this way for a reason?
- Is it safe to apply this workaround by manually enabling these settings on the default br-ex interface?
- What could be the next actions to mitigate this in the future?
Version-Release number of selected component (if applicable):
This is seen in OCP v4.16, but I'm sure it affects other versions.
How reproducible:
The issue is reproduced whenever a packet whose destination MAC address is a multicast one and whose destination IP address is a unicast one (the IP of the Microsoft NLB) is transmitted from the application pod out to the network.
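The trigger condition above (multicast destination MAC, unicast destination IP) can be checked mechanically. The sketch below is illustrative only: `is_multicast_mac` and `CAPTURE_FILTER` are hypothetical names, and the sample MAC addresses in the usage note are examples, not values from the case. A MAC address is a group (multicast) address when the least significant bit of its first octet is set; the filter string uses standard pcap-filter primitives.

```shell
#!/bin/sh
# Sketch: decide whether a colon-separated MAC address is a multicast
# (group) address. Returns 0 if multicast, 1 if unicast, 2 if malformed.
is_multicast_mac() {
    printf '%s\n' "$1" | grep -Eq '^[0-9A-Fa-f]{2}(:[0-9A-Fa-f]{2}){5}$' || return 2
    imm_octet="${1%%:*}"
    # The I/G bit is the least significant bit of the first octet.
    [ "$(( 0x$imm_octet & 1 ))" -eq 1 ]
}

# pcap filter matching frames with a multicast (non-broadcast) destination
# MAC that carry IPv4 traffic to a non-multicast destination IP.
CAPTURE_FILTER='ether multicast and not ether broadcast and ip and not ip multicast'

# Example capture on a node (interface name assumed to be br-ex):
#   tcpdump -nn -e -i br-ex "$CAPTURE_FILTER"
```

For example, `is_multicast_mac 03:bf:0a:00:00:01` succeeds, while `is_multicast_mac 00:50:56:aa:bb:cc` fails because the first octet has its low bit clear. The `-e` flag in the commented `tcpdump` line prints the link-level header so the destination MAC is visible next to the destination IP.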
Actual results:
Multicast traffic is neither acknowledged nor rejected by the br-ex OVS bridge. Instead, it gets retransmitted out of all ports.
Expected results:
Multicast traffic should be either acknowledged or rejected by the br-ex OVS bridge, not retransmitted out of all ports.
Additional info:
We have tcpdump captures of the issue attached to the case, and I have also created graphs where the issue is visible.
For now, the customer has applied a MachineConfig to set the below OVS configuration persistently on all nodes.
mcast_snooping_enable=true
mcast-snooping-disable-flood-unregistered=true
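The customer's actual MachineConfig is not reproduced in this report. As a rough sketch of what a script delivered that way (for example, run by a systemd unit at boot) might do, the following waits for the bridge and then applies the two settings. The function name `apply_mcast_snooping`, the retry count, and the `OVS_VSCTL` override are assumptions made for illustration; `ovs-vsctl br-exists` and `ovs-vsctl set` are standard OVS commands.

```shell
#!/bin/sh
# Sketch: apply the multicast snooping workaround once the bridge exists.
# OVS_VSCTL is a hypothetical override for the CLI path (or a test stub).
OVS_VSCTL="${OVS_VSCTL:-ovs-vsctl}"

# Hypothetical helper.
#   $1: bridge name (default br-ex)
#   $2: number of one-second retries while waiting for the bridge (default 30)
apply_mcast_snooping() {
    ams_bridge="${1:-br-ex}"
    ams_retries="${2:-30}"
    # br-exists exits 0 when the bridge is present and non-zero otherwise.
    until "$OVS_VSCTL" br-exists "$ams_bridge"; do
        if [ "$ams_retries" -le 0 ]; then
            echo "bridge $ams_bridge not found" >&2
            return 1
        fi
        ams_retries=$((ams_retries - 1))
        sleep 1
    done
    # Same two settings as the manual workaround; setting them again is harmless.
    "$OVS_VSCTL" set Bridge "$ams_bridge" mcast_snooping_enable=true &&
    "$OVS_VSCTL" set Bridge "$ams_bridge" \
        other_config:mcast-snooping-disable-flood-unregistered=true
}
```

Calling `apply_mcast_snooping br-ex` as root applies both settings. Whether the values survive a reboot without being re-applied depends on how br-ex is created on the node, which is why a boot-time re-apply is assumed here.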
The data is in the case and can be accessed through supportShell, but let me know if you need anything else.