Task
Resolution: Unresolved
Normal
rhel-9
rhel-net-ovn
ssg_networking
This task tracks the test-case writing needed to cover the bug described below.
There are multiple issues in service monitor handling, mostly related to properly waking up the pinctrl thread or the ovn-controller main thread:
pinctrl wakes up too often: when handling service monitors, pinctrl sometimes ends up looping/polling (multiple "wakeup due to 0-ms timeout at controller/pinctrl.c:8191" log messages).
Even when n_failure_count is set to 1, a TCP service monitor is not always reported offline immediately after the expected wait_time, but only the next time pinctrl is woken up for some other reason, which adds random delays to reporting the offline status. The same applies to a UDP service monitor going online.
Sometimes the notification from the pinctrl thread to the main ovn-controller thread is "lost" and ovn-controller is not properly woken up. Another event is then needed to wake up ovn-controller, potentially delaying the service monitor status by up to 30 seconds. This is due to how seq_read/seq_wait is used.
Sometimes health check packets are lost because they are sent before some flows (e.g. load balancer related ones) are installed, which delays the service being reported online until the next packets are sent.
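For the wakeup-timing issues listed above (looping on 0-ms timeouts, and status changes that are only reported on an unrelated wakeup), a test case may find it useful to reason about how a polling thread picks its sleep time. The following is a minimal sketch of that idea only, not the pinctrl code: poll_timeout_ms, handle_expired and the millisecond deadline lists are hypothetical names introduced for illustration.

```python
def poll_timeout_ms(now_ms, deadlines_ms):
    """Return how long the thread may sleep, or None to sleep until woken.

    With no deadline armed, a status change is only noticed when some
    unrelated event wakes the thread (the "random delay" symptom).
    """
    if not deadlines_ms:
        return None
    # A deadline already in the past yields 0: wake up immediately.
    return max(0, min(deadlines_ms) - now_ms)


def handle_expired(now_ms, deadlines_ms):
    """Split deadlines into (expired, still_pending).

    If expired deadlines were left in the list instead of being handled,
    poll_timeout_ms would keep returning 0 and the thread would spin.
    """
    expired = [d for d in deadlines_ms if d <= now_ms]
    pending = [d for d in deadlines_ms if d > now_ms]
    return expired, pending
```

In this model, an expired deadline that is never consumed corresponds to the repeated 0-ms wakeups, and an empty deadline list corresponds to a monitor whose offline or online transition waits for an unrelated wakeup.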
Most of these issues affect service monitors only.
However, ovn-controller not waking up might also affect other pinctrl-related services.
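As background for the "lost" notification, one generic way a sequence-counter wakeup can be missed is sketched below. This is a toy single-threaded model with hypothetical names (Seq, would_wake, run_iteration), not the OVS seq API and not necessarily the exact pattern in ovn-controller. It assumes a waiter wakes immediately when the counter differs from the value it last read, and otherwise sleeps until the next change.

```python
class Seq:
    """Toy sequence counter (hypothetical stand-in, not the OVS API)."""

    def __init__(self):
        self.value = 1

    def change(self):
        # Producer side: signal that something happened.
        self.value += 1

    def read(self):
        return self.value

    def would_wake(self, seen):
        # A waiter that last saw `seen` wakes at once if the counter moved.
        return self.value != seen


def run_iteration(seq, queue, read_first):
    """Run one consumer loop iteration; return True if it would wake again."""
    if read_first:
        seen = seq.read()        # snapshot before processing
    queue.clear()                # consumer handles whatever is pending
    queue.append("status")       # producer posts new work meanwhile...
    seq.change()                 # ...and notifies the consumer
    if not read_first:
        seen = seq.read()        # snapshot too late: includes the change
    return seq.would_wake(seen)


def demo():
    late = run_iteration(Seq(), [], read_first=False)
    early = run_iteration(Seq(), [], read_first=True)
    return late, early
```

With the late read, the consumer goes to sleep although an item is still queued, so only an unrelated event wakes it. With the early read, the counter no longer matches the snapshot and the consumer wakes immediately.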