OpenShift Bugs / OCPBUGS-15544

[Reliability] regression: continuous memory increase in ovnkube-node pods

    • Priority: Critical
    • Sprint: SDN Sprint 238, SDN Sprint 239
      Description of problem:

      In the Reliability (loaded long-run) test, the memory usage of the ovnkube-node-xxx pods on all 6 nodes keeps increasing; within 24 hours it grew by about 1.6 GiB. I did not see this issue in previous releases.

      Version-Release number of selected component (if applicable):

      4.14.0-0.nightly-2023-06-27-000502

      How reproducible:

      This is the first time I have hit this issue.

      Steps to Reproduce:

      1. Install an AWS OVN cluster with 3 masters and 3 workers; the vm_type of all nodes is m5.xlarge.
      2. Run the reliability-v2 test https://github.com/openshift/svt/tree/master/reliability-v2 with the config: 1 admin, 15 dev-test, 1 dev-prod. The test long-runs the configured tasks.
      3. Monitor the test failures in Slack and on the performance dashboard.
      
      Test failures slack notification: https://redhat-internal.slack.com/archives/C0266JJ4XM5/p1687944463913769
      
      Performance dashboard: http://dittybopper-dittybopper.apps.qili-414-haproxy.qe-lrc.devcluster.openshift.com/d/IgK5MW94z/openshift-performance?orgId=1&from=1687944452000&to=now&refresh=1h

      Actual results:

      The memory usage of the ovnkube-node-xxx pods on all 6 nodes keeps increasing;
      within 24 hours it grew by about 1.6 GiB.
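
      For scale, a quick sketch of what the observed growth implies per hour (the 1.6 GiB / 24 h figures are from the observation above; the steady-rate assumption is mine):

```python
# Rough growth-rate arithmetic for the observed leak, assuming the
# ~1.6 GiB increase accrued roughly linearly over the 24-hour window.
increase_mib = 1.6 * 1024  # ~1.6 GiB expressed in MiB
hours = 24
rate_mib_per_hour = increase_mib / hours
print(f"~{rate_mib_per_hour:.0f} MiB/hour per ovnkube-node pod")  # ~68 MiB/hour
```

      At roughly 68 MiB/hour per pod, the pods would keep growing past any reasonable limit over a multi-day long-run, which matches the dashboard trend.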

      Expected results:

      The memory usage of the ovnkube-node-xxx pods should stay stable instead of continuously increasing.

      Additional info:

      % oc adm top pod -n openshift-ovn-kubernetes | grep node
      ovnkube-node-4t282     146m         1862Mi          
      ovnkube-node-9p462     41m          1847Mi          
      ovnkube-node-b6rqj     46m          2032Mi          
      ovnkube-node-fp2gn     72m          2107Mi          
      ovnkube-node-hxf95     11m          2359Mi          
      ovnkube-node-ql9fx     38m          2089Mi          
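
      When watching for this kind of growth over time, snapshots like the one above can be parsed programmatically. A minimal sketch, assuming the standard `oc adm top pod` column layout (NAME, CPU, MEMORY) and MiB-suffixed memory values; `parse_top_pods` is a hypothetical helper name:

```python
def parse_top_pods(text):
    """Parse `oc adm top pod` output lines into {pod_name: memory_mib}."""
    usage = {}
    for line in text.strip().splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        name, _cpu, mem = parts[0], parts[1], parts[2]
        if mem.endswith("Mi"):
            usage[name] = int(mem[:-2])  # strip the "Mi" suffix
    return usage

# Two rows taken from the snapshot above.
sample = """\
ovnkube-node-4t282     146m         1862Mi
ovnkube-node-hxf95     11m          2359Mi
"""
print(parse_top_pods(sample))
```

      Diffing two such snapshots taken some hours apart gives the per-pod growth directly.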

      I took a pprof heap profile on one of the pods and uploaded it as heap-ovnkube-node-4t282.out.
      A must-gather is uploaded as must-gather.local.1315176578017655774.tar.gz.
      A performance dashboard screenshot is attached as ovnkube-node-memory.png.

            Assignee: Nadia Pinaeva (npinaeva@redhat.com)
            Reporter: Qiujie Li (rhn-support-qili)
