OpenShift Bugs / OCPBUGS-17773

[OVN-IC] high cpu usage (600%) by OVNK during resource deletion at scale


    Description

      Description of problem:

      We see very high CPU usage (600%) by the ovnkube-node pod during resource deletion while running the cluster-density-v2 scale test on a 120-node OVN-IC environment. The ovnkube-controller and ovnkube-node containers use about 500% and 80% CPU respectively.
      
      With churn enabled, the cluster-density-v2 test executes the following steps in sequence:
      1. Creates 800 namespaces (each namespace with 10 pods, 5 services, 2 routes and 3 network policies).
      2. After creating all these resources, deletes 10% of the namespaces (i.e. 80 namespaces) and waits until all of them are deleted.
      3. Recreates them (i.e. 80 namespaces with their resources).
      4. Idles for 10 minutes, i.e. no resource creation or deletion (CPU usage drops back to normal).
      5. Repeats steps 2 to 4 (deletion, creation and idle) for a duration of 1 hour. In total, there are 4 deletion, creation and idle events in the entire test run.
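
To give a sense of how many objects each churn cycle removes, the steps above can be turned into a small back-of-the-envelope calculation. This is only a sketch based on the numbers in this description; the variable and function names are illustrative and are not taken from the test tooling.

```python
# Per-namespace object counts, as stated in the test description.
PER_NAMESPACE = {"pods": 10, "services": 5, "routes": 2, "network_policies": 3}

NAMESPACES = 800     # iterations=800
CHURN_PERCENT = 10   # share of namespaces deleted in each churn cycle
CHURN_CYCLES = 4     # deletion/creation/idle events in the one-hour run


def object_counts(namespaces: int) -> dict[str, int]:
    """Return the number of objects of each kind across `namespaces` namespaces."""
    return {kind: count * namespaces for kind, count in PER_NAMESPACE.items()}


churned_namespaces = NAMESPACES * CHURN_PERCENT // 100
steady_state = object_counts(NAMESPACES)
per_cycle = object_counts(churned_namespaces)
total_deleted = {kind: count * CHURN_CYCLES for kind, count in per_cycle.items()}

print("namespaces deleted per cycle:", churned_namespaces)
print("objects at steady state:", steady_state)
print("objects deleted per cycle:", per_cycle)
print("objects deleted over the whole run:", total_deleted)
```

Under these assumptions, each deletion phase removes 80 namespaces along with their pods, services, routes and network policies in one burst, which is the window where the CPU spike is observed.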
      
      During the resource deletion phase we see very high CPU usage by the OVN-Kubernetes components:
      1. The ovnkube-controller container uses 500% CPU during deletion, but only 20% during creation.
      2. The ovnkube-node container uses 80% CPU during deletion vs. 6% during creation.
      
      As a result, the ovnkube-node pod in OVN-IC uses about 600% CPU during the deletion phase.
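
The figures above can be compared directly. The following is a rough sketch that uses only the percentages reported in this issue. It assumes the pod total is approximately the sum of these two containers; any other containers in the pod are not accounted for, so this is an approximation.

```python
# CPU usage in percent of one core, as reported in this issue.
CPU = {
    "ovnkube-controller": {"deletion": 500, "creation": 20},
    "ovnkube-node": {"deletion": 80, "creation": 6},
}


def deletion_to_creation_ratio(container: str) -> float:
    """Return how many times more CPU a container uses during deletion than creation."""
    usage = CPU[container]
    return usage["deletion"] / usage["creation"]


# Sum of the two reported containers during deletion (other containers ignored).
pod_total_deletion = sum(usage["deletion"] for usage in CPU.values())

for name in CPU:
    print(f"{name}: {deletion_to_creation_ratio(name):.1f}x more CPU during deletion")
print(f"two-container total during deletion: {pod_total_deletion}%")
```

On these numbers, ovnkube-controller accounts for most of the pod's CPU during deletion, so it is likely the first place to profile.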
      
      As this issue is always reproducible, I can share the environment whenever an engineer wants to debug it.
      
      Grafana snapshots -
      https://grafana.rdu2.scalelab.redhat.com:3000/dashboard/snapshot/1UCvwcGt7gnJMr1NDLBOQCz3Fzdr1UqM
      https://grafana.rdu2.scalelab.redhat.com:3000/dashboard/snapshot/RCfK5Aec6lPul6oqMdmOredrbOWWEEFN
      https://grafana.rdu2.scalelab.redhat.com:3000/dashboard/snapshot/2Gi7ohTuW064xM0mRVy88uekbnN1N4Or

       

      Version-Release number of selected component (if applicable):

      4.14.0-0.nightly-2023-07-30-234232

      How reproducible:

      Run cluster-density-v2 with churn=true and iterations=800 on a 120-node OVN-IC environment.

            People

              trozet@redhat.com Tim Rozet
              vkommadi@redhat.com VENKATA ANIL kumar KOMMADDI
              VENKATA ANIL kumar KOMMADDI
              Votes: 0
              Watchers: 11
