Fast Datapath Product / FDP-1078

[OCP 4.12] ovn-controller memory usage is extremely high and OOM killer occurs

    • Bug
    • Resolution: Duplicate
    • ovn22.12
    • rhel-8
    • rhel-net-ovn
    • ssg_networking

       Problem Description: Clearly explain the issue.

      Memory usage of ovn-controller on some worker nodes is extremely high, which repeatedly triggers the OOM killer.

       

      # oc adm top pods --containers -n openshift-ovn-kubernetes
      POD                    NAME                          CPU(cores)   MEMORY(bytes)   
        :
      ovnkube-node-aaaaa     kube-rbac-proxy               0m           48Mi            
      ovnkube-node-aaaaa     kube-rbac-proxy-ovn-metrics   0m           50Mi            
      ovnkube-node-aaaaa     ovn-acl-logging               0m           2Mi             
      ovnkube-node-aaaaa     ovn-controller                992m         113667Mi        
      ovnkube-node-aaaaa     ovnkube-node                  122m         102Mi           
      ovnkube-node-bbbbb     kube-rbac-proxy               0m           47Mi            
      ovnkube-node-bbbbb     kube-rbac-proxy-ovn-metrics   0m           48Mi            
      ovnkube-node-bbbbb     ovn-acl-logging               0m           2Mi             
      ovnkube-node-bbbbb     ovn-controller                993m         114988Mi        
      ovnkube-node-bbbbb     ovnkube-node                  120m         100Mi           
        :
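
      For reference, ovn-controller also reports its own internal memory accounting over its unixctl socket, which can show where the memory is going (e.g. logical flow cache vs. OpenFlow state). A sketch of how this could be collected, assuming the standard OVN-Kubernetes pod layout (the pod name is a placeholder):

      # oc exec -n openshift-ovn-kubernetes ovnkube-node-aaaaa -c ovn-controller -- ovn-appctl -t ovn-controller memory/show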

       

      # oc adm node-logs -l kubernetes.io/hostname=<node_name> --path='journal' | grep "Out of memory"
      
      kernel: Out of memory: Killed process 801747 (td-agent) total-vm:623012kB, anon-rss:55908kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:508kB oom_score_adj:998
      kernel: Out of memory: Killed process 801659 (fluentd-entrypo) total-vm:12068kB, anon-rss:360kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:68kB oom_score_adj:998
      kernel: Out of memory: Killed process 829234 (docker-entrypoi) total-vm:12212kB, anon-rss:572kB, file-rss:4kB, shmem-rss:0kB, UID:1000 pgtables:68kB oom_score_adj:995
      kernel: Out of memory: Killed process 835139 (sh) total-vm:12080kB, anon-rss:476kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:72kB oom_score_adj:995
      kernel: Out of memory: Killed process 800413 (ovs-vswitchd) total-vm:4414620kB, anon-rss:278016kB, file-rss:41924kB, shmem-rss:0kB, UID:800 pgtables:1008kB oom_score_adj:0
      kernel: Out of memory: Killed process 800497 (NetworkManager) total-vm:393344kB, anon-rss:6944kB, file-rss:2356kB, shmem-rss:0kB, UID:0 pgtables:376kB oom_score_adj:0
      kernel: Out of memory: Killed process 852934 (systemd-udevd) total-vm:104272kB, anon-rss:4016kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:188kB oom_score_adj:0
      kernel: Out of memory: Killed process 852926 (systemd-udevd) total-vm:104272kB, anon-rss:4016kB, file-rss:0kB, shmem-rss:0kB, UID:0 pgtables:188kB oom_score_adj:0
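
      For cross-checking, the same kernel OOM events can also be read directly on an affected node via a debug pod (node name is a placeholder):

      # oc debug node/<node_name> -- chroot /host journalctl -k | grep -i "out of memory"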

       

       

       Impact Assessment: Describe the severity and impact (e.g., network down, availability of a workaround, etc.).

      The high memory usage of ovn-controller triggers the OOM killer, which kills many other processes, and the affected worker nodes flap between Ready and NotReady as a result.

       Software Versions: Specify the exact versions in use (e.g., openvswitch3.1-3.1.0-147.el8fdp).

      ovn22.12-22.12.0-18.el8fdp.x86_64
      ovn22.12-central-22.12.0-18.el8fdp.x86_64
      ovn22.12-vtep-22.12.0-18.el8fdp.x86_64
      ovn22.12-host-22.12.0-18.el8fdp.x86_64
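
      (For reference, the installed OVN packages can be confirmed from inside the ovn-controller container with something like the following; the pod name is a placeholder.)

      # oc exec -n openshift-ovn-kubernetes ovnkube-node-aaaaa -c ovn-controller -- rpm -qa | grep '^ovn'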

        Issue Type: Indicate whether this is a new issue or a regression (if a regression, state the last known working version).

      Not sure.

       Reproducibility: Confirm if the issue can be reproduced consistently. If not, describe how often it occurs.

      Not sure.
      This high memory usage issue started happening suddenly one day on some worker nodes.
      We tried restarting ovn-controller and the worker nodes, but memory usage climbs back up quickly and triggers the OOM killer again.

       Reproduction Steps: Provide detailed steps or scripts to replicate the issue.

      Not sure

       Expected Behavior: Describe what should happen under normal circumstances.

      Memory usage of ovn-controller is low.

       Observed Behavior: Explain what actually happens.

      Memory usage of ovn-controller is extremely high.

       Troubleshooting Actions: Outline the steps taken to diagnose or resolve the issue so far.

      We tried restarting ovn-controller and the worker nodes, but memory usage climbs back up quickly and triggers the OOM killer again.
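
      To measure how quickly the memory grows back after a restart, the usage could be sampled periodically, for example (interval chosen arbitrarily):

      # while true; do date; oc adm top pods --containers -n openshift-ovn-kubernetes | grep ovn-controller; sleep 60; done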

       Logs: If you collected logs please provide them (e.g. sos report, /var/log/openvswitch/*, testpmd console)
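
      The ovn-controller logs and a cluster dump could be gathered with something like the following (pod name is a placeholder; --previous is useful if the container has been restarted):

      # oc logs -n openshift-ovn-kubernetes ovnkube-node-aaaaa -c ovn-controller --previous
      # oc adm must-gather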

              ovnteam@redhat.com OVN Team
              rhn-support-yatanaka Yamato Tanaka
              Jianlin Shi
              Votes: 0
              Watchers: 4
