OpenShift Bugs / OCPBUGS-3119

network-diagnostics is using high amounts of CPU at regular intervals


    • Moderate
    • None
    • SDN Sprint 228, SDN Sprint 229, SDN Sprint 230
    • 3
    • Rejected
    • False

      Description of problem:

      network-check-source, the part of openshift-network-diagnostics that checks connectivity in the cluster, consumes high amounts of CPU (~1-1.5 cores) at regular intervals (~8 minutes); a screenshot of the Prometheus metrics and the logs are attached. This reduces the resources available, which matters especially on a HyperShift management cluster: its worker nodes run the hosted cluster control plane pods, so reducing resource consumption is important to save on costs.
      
      A potential solution is to run the connectivity checks less frequently (that is, lengthen the interval between them) or to reduce their CPU overhead, if possible.

      Version-Release number of selected component (if applicable):

      4.11 and 4.12

      How reproducible:

      Always

      Steps to Reproduce:

      1. Install a HyperShift management cluster managing 3 hosted clusters with 7 nodes each.
      2. Observe the CPU usage of the network-check-source pod in the openshift-network-diagnostics namespace.
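
      For step 2, one way to observe the CPU usage is a range query against the cluster's Prometheus. The sketch below only builds the PromQL expression and the request URL; it does not contact any server. It assumes the standard cAdvisor metric container_cpu_usage_seconds_total with namespace, pod and container labels, and the Prometheus HTTP API endpoint /api/v1/query_range. The base URL, the pod name pattern, the 2m rate window and the 30s step are illustrative assumptions, not values taken from this report.

```python
from urllib.parse import urlencode

NAMESPACE = "openshift-network-diagnostics"
# Assumption: the pod name starts with "network-check-source".
POD_REGEX = "network-check-source.*"


def cpu_query(namespace: str = NAMESPACE, pod_regex: str = POD_REGEX, window: str = "2m") -> str:
    """Build PromQL for per-pod CPU usage in cores, averaged over `window`."""
    # container!="" skips the pod-level aggregate series so usage is not double-counted.
    selector = f'namespace="{namespace}",pod=~"{pod_regex}",container!=""'
    return f"sum by (pod) (rate(container_cpu_usage_seconds_total{{{selector}}}[{window}]))"


def query_range_url(base_url: str, start: int, end: int, step: str = "30s") -> str:
    """Build a URL for the Prometheus /api/v1/query_range endpoint."""
    params = {"query": cpu_query(), "start": start, "end": end, "step": step}
    return f"{base_url.rstrip('/')}/api/v1/query_range?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical Prometheus address; substitute the cluster's own route.
    print(cpu_query())
    print(query_range_url("https://prometheus.example.com", 1700000000, 1700003600))
```

      The same PromQL expression can also be pasted into the metrics view of the web console. A rate window shorter than the burst duration keeps the spikes visible; a long window would average them away.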

      Actual results:

      The network-check-source pod consumes high amounts of CPU at regular intervals.
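
      To check the "regular intervals" observation against exported metric samples, the spike spacing can be computed directly. The following is a minimal sketch, not part of the original report: it treats any sample at or above a threshold as part of a spike and returns the time between consecutive spike starts. The 1.0-core threshold is an assumption based on the ~1-1.5 cores reported here, and the sample data is synthetic, generated only to exercise the function.

```python
from typing import List, Tuple

# (unix timestamp in seconds, CPU usage in cores)
Sample = Tuple[float, float]


def spike_starts(samples: List[Sample], threshold: float = 1.0) -> List[float]:
    """Return timestamps where usage first reaches `threshold` after being below it.

    A series that already starts at or above the threshold counts as a spike start.
    """
    starts = []
    above = False
    for ts, cores in samples:
        if cores >= threshold and not above:
            starts.append(ts)
        above = cores >= threshold
    return starts


def spike_gaps(samples: List[Sample], threshold: float = 1.0) -> List[float]:
    """Return the seconds between consecutive spike starts."""
    starts = spike_starts(samples, threshold)
    return [later - earlier for earlier, later in zip(starts, starts[1:])]


if __name__ == "__main__":
    # Synthetic data: one sample per minute, idle at 0.05 cores,
    # with a burst to 1.2 cores every 480 s (8 minutes).
    synthetic = [(t, 1.2 if t % 480 == 0 else 0.05) for t in range(0, 1500, 60)]
    print(spike_gaps(synthetic))  # [480, 480, 480]
```

      Gaps that cluster around 480 seconds on real samples would match the ~8 minute period described above.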

      Expected results:

      The network-check-source pod consumes less CPU, or the checks run less frequently.

      Additional info:

      Logs: http://dell-r510-01.perf.lab.eng.rdu2.redhat.com/chaos/hypershift/network-diagnostics/

       

       

       

              pdiak@redhat.com Patryk Diak
              nelluri Naga Ravi Chaitanya Elluri
              Anurag Saxena Anurag Saxena