  1. Fast Datapath Product
  2. FDP-2116

Multiple Service monitor issues.

    • Epic
    • Resolution: Unresolved
    • Normal
    • None
    • None
    • OVN
    • None
    • 8
    • False
    • True

      Please mark each item below with ( / ) if completed or ( x ) if incomplete:

      ( ) The acceptance criteria defined below are met.

      Given an OVN deployment with service monitors configured for load balancer backends (TCP and UDP),

      When backend servers change state,

      Then service monitor status is reported accurately within the configured wait_time without excessive pinctrl wake-ups, lost notifications, or premature health check packets.


      ( ) The epic's work is available in a downstream build (nightly/Async or other)


      ( ) All cards under the epic have been moved to Done

    • To Do
    • rhel-9
    • rhel-net-ovn
    • 100% To Do, 0% In Progress, 0% Done
    • ssg_networking

      There are multiple issues in service monitor handling, mostly related to properly waking up the pinctrl thread or the ovn-controller main thread.

      • pinctrl wakes up too often: while handling service monitors, pinctrl sometimes ends up looping/polling (multiple "wakeup due to 0-ms timeout at controller/pinctrl.c:8191" log messages).
      • Even when n_failure_count is set to 1, a TCP service monitor is not always reported offline as soon as the expected wait_time elapses. It is reported only the next time pinctrl wakes up for some unrelated reason, which adds random delays to the offline status. The same applies to a UDP service monitor going online.
      • Sometimes the notification from the pinctrl thread to the main ovn-controller thread is "lost" and ovn-controller is not woken up. Another event is then needed to wake ovn-controller, which can delay the service monitor status by up to 30 seconds. This is due to how seq_read/seq_wait is used.
      • Sometimes health check packets are lost because they are sent before some flows (e.g. load balancer related ones) are installed, so the service is not reported online until the next packets are sent.
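
      The first two items both concern when the pinctrl thread decides to wake up. The ticket does not describe the root cause in the code, so the following is only a minimal Python model of the general pattern, not OVN code: sleep until the earliest pending deadline, and re-arm a deadline once it has been handled so that the computed sleep time does not stay at 0 ms. The names Monitor, next_wakeup_delay_ms and expire_due are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class Monitor:
    """Hypothetical stand-in for one service monitor entry (not an OVN type)."""
    name: str
    next_deadline_ms: int  # absolute time at which the pending check times out


def next_wakeup_delay_ms(monitors: Iterable[Monitor], now_ms: int) -> Optional[int]:
    """Return how long the thread may sleep, or None if nothing is pending.

    Sleeping exactly until the earliest deadline avoids depending on an
    unrelated event to notice that a wait_time has elapsed.
    """
    deadlines = [m.next_deadline_ms for m in monitors]
    if not deadlines:
        return None
    return max(0, min(deadlines) - now_ms)


def expire_due(monitors: List[Monitor], now_ms: int, wait_time_ms: int) -> List[str]:
    """Handle every monitor whose deadline has passed and re-arm it.

    Re-arming moves the deadline into the future. If an expired deadline were
    left in place, next_wakeup_delay_ms() would keep returning 0 and the
    thread would spin on 0-ms timeouts.
    """
    expired = []
    for m in monitors:
        if m.next_deadline_ms <= now_ms:
            expired.append(m.name)
            m.next_deadline_ms = now_ms + wait_time_ms
    return expired
```

      With two monitors due at 3000 ms and 5000 ms, the model sleeps 2000 ms at time 1000, handles the first monitor at time 3000, and then sleeps until the second deadline instead of looping.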

      Most of these issues affect service monitors only.

      However, ovn-controller not being woken up might also affect other pinctrl-related services.
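
      The ticket attributes the lost notification to how seq_read/seq_wait is used, without giving the exact code path. As an illustration only, the toy Python model below shows one common way a wake-up can be lost with a sequence counter: if the waiter samples the counter after doing its work, a change made during that work is already included in the sampled value, and waiting on that value will not wake. The Seq class and both helper functions are hypothetical and only loosely modeled on the OVS seq API; they are not the real implementation.

```python
from typing import Callable


class Seq:
    """Toy sequence counter (assumption: simplified model, not the OVS code)."""

    def __init__(self) -> None:
        self._value = 1

    def read(self) -> int:
        """Return the current sequence number."""
        return self._value

    def change(self) -> None:
        """Bump the sequence number, as a notifier thread would."""
        self._value += 1

    def wait_would_wake(self, seen: int) -> bool:
        """True if a waiter that last saw `seen` would be woken immediately."""
        return self._value != seen


def racy_iteration(seq: Seq, work: Callable[[], None]) -> bool:
    """Do the work first, sample the counter afterwards.

    A notification arriving during `work` is folded into the sampled value,
    so the waiter goes to sleep without noticing it.
    """
    work()
    seen = seq.read()
    return seq.wait_would_wake(seen)


def safe_iteration(seq: Seq, work: Callable[[], None]) -> bool:
    """Sample the counter first, then do the work, then wait on the sample.

    Any notification arriving during `work` makes the counter differ from
    the sampled value, so the waiter wakes immediately.
    """
    seen = seq.read()
    work()
    return seq.wait_would_wake(seen)
```

      Passing seq.change as the work callback simulates a notification that arrives while the main thread is busy: the racy ordering reports that it would not wake, while the safe ordering does.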

              xsimonar@redhat.com Xavier Simonart
              xsimonar@redhat.com Xavier Simonart
              OVN QE