  1. OpenShift Bugs
  2. OCPBUGS-17773

[OVN-IC] high cpu usage (600%) by OVNK during resource deletion at scale

      Description of problem:

      We see very high CPU usage (600%) by the ovnkube-node pod during resource deletion while running the cluster-density-v2 scale test on a 120-node OVN-IC environment. The ovnkube-controller and ovnkube-node containers use 500% and 80% CPU respectively.
      
      With churn enabled, the cluster-density-v2 test executes the following steps in sequence:
      1. Creates 800 namespaces (each with 10 pods, 5 services, 2 routes and 3 network policies).
      2. After creating all these resources, deletes 10% of the namespaces (i.e. 80 namespaces) and waits until they are all deleted.
      3. Recreates them (i.e. 80 namespaces with their resources).
      4. Idles for 10 minutes, i.e. no resource creation or deletion (CPU usage drops back to normal).
      5. Repeats steps 2 to 4 (deletion, creation and idle) for 1 hour. In total there are 4 deletion, creation and idle cycles in the entire test run.
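
      To give a sense of how many objects each deletion phase removes, the steps above can be worked through numerically. This is only a sketch built from the counts stated in this report; the names (PER_NAMESPACE, churn_footprint) are illustrative and not part of kube-burner or OVN-Kubernetes.

```python
# Per-namespace resource counts, taken from step 1 of the description.
PER_NAMESPACE = {"pods": 10, "services": 5, "routes": 2, "network_policies": 3}

TOTAL_NAMESPACES = 800   # iterations=800
CHURN_FRACTION = 0.10    # 10% of the namespaces are deleted per cycle
CHURN_CYCLES = 4         # deletion/creation/idle cycles reported for the 1-hour run


def churn_footprint(total_namespaces=TOTAL_NAMESPACES,
                    churn_fraction=CHURN_FRACTION,
                    cycles=CHURN_CYCLES):
    """Return (namespaces deleted per cycle, objects deleted per cycle,
    objects deleted over the whole run)."""
    churned = round(total_namespaces * churn_fraction)
    per_cycle = {kind: count * churned for kind, count in PER_NAMESPACE.items()}
    per_run = {kind: count * cycles for kind, count in per_cycle.items()}
    return churned, per_cycle, per_run


if __name__ == "__main__":
    churned, per_cycle, per_run = churn_footprint()
    print(f"namespaces deleted per cycle: {churned}")
    for kind in PER_NAMESPACE:
        print(f"{kind}: {per_cycle[kind]} per cycle, {per_run[kind]} over the run")
```

      Under these assumptions each deletion phase removes 80 namespaces containing 800 pods, 400 services, 160 routes and 240 network policies.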
      
      During the resource deletion phase we see very high CPU usage by the OVN-Kubernetes components:
      1. The ovnkube-controller container uses 500% CPU during deletion, but only 20% during creation.
      2. The ovnkube-node container uses 80% CPU during deletion vs. 6% during creation.
      
      As a result, the ovnkube-node pod in OVN-IC uses 600% CPU during the deletion phase.
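
      The per-container figures can be compared directly. The sketch below uses only the percentages quoted in this report (percent of one core); the helper names are illustrative. The two containers sum to 580%, so the 600% pod figure presumably includes rounding or the pod's other containers, which is an assumption and not something measured here.

```python
# CPU figures (percent of one core) as reported in the description.
CPU = {
    "ovnkube-controller": {"deletion": 500, "creation": 20},
    "ovnkube-node": {"deletion": 80, "creation": 6},
}


def pod_total(phase):
    """Sum the two reported containers for one phase ('deletion' or 'creation')."""
    return sum(container[phase] for container in CPU.values())


def deletion_to_creation_ratio(container):
    """How many times more CPU a container uses during deletion than creation."""
    usage = CPU[container]
    return usage["deletion"] / usage["creation"]


if __name__ == "__main__":
    print(f"deletion total: {pod_total('deletion')}%")
    print(f"creation total: {pod_total('creation')}%")
    for name in CPU:
        print(f"{name}: {deletion_to_creation_ratio(name):.1f}x more CPU during deletion")
```

      By these numbers ovnkube-controller uses 25 times more CPU during deletion than during creation, and the ovnkube-node container roughly 13 times more.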
      
      As this issue is always reproducible, I can share the environment whenever an engineer wants to debug it.
      
      Grafana snapshots:
      https://grafana.rdu2.scalelab.redhat.com:3000/dashboard/snapshot/1UCvwcGt7gnJMr1NDLBOQCz3Fzdr1UqM
      https://grafana.rdu2.scalelab.redhat.com:3000/dashboard/snapshot/RCfK5Aec6lPul6oqMdmOredrbOWWEEFN
      https://grafana.rdu2.scalelab.redhat.com:3000/dashboard/snapshot/2Gi7ohTuW064xM0mRVy88uekbnN1N4Or

       

      Version-Release number of selected component (if applicable):

      4.14.0-0.nightly-2023-07-30-234232

      How reproducible:

      Run cluster-density-v2 with churn=true and iterations=800 on a 120-node OVN-IC environment.

            trozet@redhat.com Tim Rozet
            vkommadi@redhat.com VENKATA ANIL kumar KOMMADDI
            VENKATA ANIL kumar KOMMADDI
            Votes: 0
            Watchers: 11
