-
Bug
-
Resolution: Done
-
Blocker
-
None
-
rhos-17.1.3
The OVN cluster falls apart frequently and then shuffles the routers around. Every time that happens, the customers experience downtime.
The customer has faced issues multiple times this week and currently has a complete loss of connectivity for tenants.
At first, we thought that the problem was caused by the rapid growth of the
MAC_Binding table, and we suggested the following command:
~~~
# ovn-nbctl set logical_router <LR> options:mac_binding_age_threshold=300
~~~
But today the customer had another occurrence of the issue. MAC aging seems to
work, but MAC_Binding growth does not seem to be the root cause:
"Monitoring says we were at 137 MAC_Bindings and ~50k Logical Flows at that point in time."
The OVN logs show many recalculations, many poll operations that took a long
time, and high CPU usage.
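
To quantify the long poll operations, a small helper like the one below could count them per log file. This is only a sketch: it assumes the long polls appear as the standard OVS/OVN "Unreasonably long <N>ms poll interval" warnings, the function name `count_long_polls` is made up for illustration, and the log path in the usage comment is hypothetical and must be replaced with the real location on the affected node.

```shell
#!/bin/sh
# Count "Unreasonably long <N>ms poll interval" warnings whose duration is
# at or above a threshold (in milliseconds) in an OVS/OVN daemon log.
# Assumption: the daemon logs long polls with this standard warning text.
# Usage: count_long_polls <logfile> [min_ms]   (min_ms defaults to 1000)
count_long_polls() {
    logfile=$1
    min_ms=${2:-1000}
    # Keep only the matching fragment so the duration is always field 3.
    grep -oE 'Unreasonably long [0-9]+ms poll interval' "$logfile" |
        awk -v min="$min_ms" '
            { sub(/ms$/, "", $3); if ($3 + 0 >= min) n++ }
            END { print n + 0 }'
}

# Example call (hypothetical path, adjust to the deployment):
# count_long_polls /path/to/ovn-northd.log 2000
```

Comparing the counts before and after an incident, and between ovn-northd and ovn-controller logs, may help narrow down which daemon is stalling.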