FDP-2657 (Fast Datapath Product)

100% CPU in northd and ovn-ic when using Service Monitoring

    • Epic
    • Resolution: Unresolved
    • Major
    • OVN
    • 8

      Please mark each item below with ( / ) if completed or ( x ) if incomplete:

      ( ) The acceptance criteria defined below are met.

      Given a multi-AZ OVN deployment with ovn-ic enabled and service monitoring configured as in test 'Service Monitor synchronization' in ovn-ic.at,

      When the test runs to completion and executes 'grep -c Service_Monitor $az/ovn-sb/ovn-sb.db',

      Then, the count returns approximately 20 records instead of ~2000, and the count remains stable over a 30-second observation period with no continuous growth.


      Given a running OVN interconnect deployment with service monitoring enabled across 2 availability zones, each with at least one load balancer with health checks,

      When the system runs for 60 seconds after initial convergence (all Service_Monitor records created),

      Then, no high CPU usage is seen for ovn-northd and ovn-ic.

      ( ) The epic's work is available in a downstream build (nightly/Async or other)


      ( ) All cards under the epic have been moved to Done

    • rhel-9
    • rhel-net-ovn
    • 100% To Do, 0% In Progress, 0% Done
    • ssg_networking

      This epic tracks all the effort needed to deliver the solution related to the bug described below.

       Problem Description: Clearly explain the issue.

      When using Service Monitoring and ovn-ic, both ovn-northd and ovn-ic start using 100% CPU. ovsdb-server (SB) also starts using high CPU.

       Impact Assessment: Describe the severity and impact (e.g., network down, availability of a workaround, etc.).

      Unnecessary CPU usage when ovn-ic and Service Monitoring are used together.

       Software Versions: Specify the exact versions in use (e.g., openvswitch3.1-3.1.0-147.el8fdp).

      Upstream main

        Issue Type: Indicate whether this is a new issue or a regression (if a regression, state the last known working version).

      Introduced by "ic: Implement cross-AZ service monitor synchronization."

       Reproducibility: Confirm if the issue can be reproduced consistently. If not, describe how often it occurs.

      This can be reproduced within test "Service Monitor synchronization" in ovn-ic.at. 

       Reproduction Steps: Provide detailed steps or scripts to replicate the issue.

      Within test "Service Monitor synchronization" in ovn-ic.at, add the following at the end of the test:

      grep -c Service_Monitor $az/ovn-sb/ovn-sb.db

      Without the issue, the count should be low (e.g. 20). With the issue, it has been seen at around 2000.

      The SB database shows Service_Monitor being updated over and over by ovn-ic and ovn-northd.
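
      The grep check above, together with the 30-second stability criterion, can be sketched as a small helper. This is only an illustration: the function names, the sampling interval, and the "all samples equal" definition of stable are assumptions, not part of the test suite.

```python
import time
from pathlib import Path


def count_matching_lines(db_path, needle="Service_Monitor"):
    """Count lines containing `needle`, like `grep -c needle db_path`."""
    data = Path(db_path).read_bytes()
    return sum(1 for line in data.split(b"\n") if needle.encode() in line)


def is_count_stable(db_path, duration_s=30.0, interval_s=5.0, sleep=time.sleep):
    """Sample the count for `duration_s` seconds.

    Returns (stable, samples); `stable` is True only if every sample
    is identical, i.e. the count did not grow during the window.
    """
    samples = [count_matching_lines(db_path)]
    elapsed = 0.0
    while elapsed < duration_s:
        sleep(interval_s)
        elapsed += interval_s
        samples.append(count_matching_lines(db_path))
    return len(set(samples)) == 1, samples
```

      For example, `is_count_stable("az1/ovn-sb/ovn-sb.db")` would sample the file seven times over 30 seconds; the path here is a placeholder for `$az/ovn-sb/ovn-sb.db`.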

       

       Expected Behavior: Describe what should happen under normal circumstances.

      ovn-northd and ovn-ic should not fight to write in SB.

       Observed Behavior: Explain what actually happens.

      ovn-northd and ovn-ic fight to write in SB.

       Troubleshooting Actions: Outline the steps taken to diagnose or resolve the issue so far.

      Run top while the test is running, and observe high CPU usage by ovn-northd and ovn-ic.
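
      As a scriptable alternative to watching top, per-process CPU usage can be derived from two readings of /proc/<pid>/stat on Linux. The sketch below is an assumption about how one might automate the check; the helper names are hypothetical and the PIDs must be supplied by the caller.

```python
import os
import time


def cpu_ticks_from_stat(stat_line):
    """Return utime + stime (in clock ticks) from a /proc/<pid>/stat line."""
    # The command name (field 2) is parenthesised and may contain spaces,
    # so only split the part that follows the last ')'.
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state); utime and stime are fields 14 and 15.
    return int(rest[11]) + int(rest[12])


def cpu_percent(ticks_before, ticks_after, interval_s, ticks_per_s):
    """CPU usage in percent of one core over the sampling interval."""
    return 100.0 * (ticks_after - ticks_before) / (ticks_per_s * interval_s)


def sample_cpu_percent(pid, interval_s=1.0):
    """Measure CPU usage of `pid` over `interval_s` seconds (Linux only)."""
    path = f"/proc/{pid}/stat"
    with open(path) as f:
        before = cpu_ticks_from_stat(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        after = cpu_ticks_from_stat(f.read())
    return cpu_percent(before, after, interval_s, os.sysconf("SC_CLK_TCK"))
```

      A value close to 100 for ovn-northd or ovn-ic after convergence would correspond to the behaviour described in this report.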

      Both ovn-ic and ovn-northd are trying to update Service_Monitor options az-name, but their updates differ by the _comment.
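
      The write contention described above can be illustrated with a toy model. This is not OVN code: the two "reconcilers" and the row contents are hypothetical stand-ins that only show why two writers with slightly different desired states never converge.

```python
def run_reconcilers(desired_a, desired_b, rounds=5):
    """Let two writers take turns reconciling one shared row.

    Each writer rewrites the row whenever it differs from its own
    desired state. Returns the total number of writes performed.
    """
    row = {}
    writes = 0
    for _ in range(rounds):
        for desired in (desired_a, desired_b):
            if row != desired:
                row = dict(desired)  # overwrite with this writer's view
                writes += 1
    return writes


# Hypothetical values, chosen only to mimic "same az-name, different _comment".
writer_one = {"az-name": "az1", "_comment": "written by one"}
writer_two = {"az-name": "az1", "_comment": "written by two"}

# Differing desired states: every turn causes a write.
fighting = run_reconcilers(writer_one, writer_two)

# Identical desired states: a single write, then the row is stable.
converged = run_reconcilers(writer_one, dict(writer_one))
```

      In this model the write count grows linearly with the number of rounds when the desired states differ, and stays at one when they agree.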

       Logs: If you collected logs please provide them (e.g. sos report, /var/log/openvswitch/* , testpmd console)

              ovnteam@redhat.com OVN Team
              xsimonar@redhat.com Xavier Simonart
              OVN QE OVN QE