OCPBUGS-12714

Prometheus, promtail, node exporter consuming all CPU on a system


    • Type: Bug
    • Resolution: Done-Errata
    • Priority: Major
    • 4.14.0
    • 4.13.0
    • Component: Monitoring
    • Severity: Important
    • Sprint: MON Sprint 238
    • Release Note Text:
      * Before this update, large amounts of CPU resources might be consumed during metrics scraping as a result of the way node-exporter collected network interface information. This release fixes this issue by improving the performance of node-exporter when collecting network interface information, thereby resolving the issue with excessive CPU usage during metrics scraping. link:https://issues.redhat.com/browse/OCPBUGS-12714[OCPBUGS-12714]
    • Release Note Type: Bug Fix
    • Done

      Description of problem:

      Under heavy control plane load (bringing up ~200 pods), prometheus/promtail spikes to over 100% CPU and node_exporter climbs to ~200% CPU and stays there for 5-10 minutes. This was tested on a GCP cluster-bot cluster with 2-physical-core (4 vCPU) workers. The spike starves essential platform components such as OVS of CPU and causes the data plane to go down.
      
      Running perf against node_exporter shows that it spends the majority of its CPU time listing the new interfaces being added in sysfs. This appears to be a consequence of netlink collection being disabled via:
      
      https://issues.redhat.com/browse/OCPBUGS-8282
      
      This sysfs operation grabs the rtnl lock, which can put node_exporter in contention with other components on the host that are trying to configure networking.
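      For reference, upstream node_exporter has a --collector.netclass.netlink flag that switches the netclass collector from per-attribute sysfs reads to a netlink dump. The fragment below is only a hypothetical sketch of toggling that flag on a node-exporter DaemonSet; it is not the actual cluster-monitoring-operator manifest, and the names, image, and args are assumptions.
      
      ---
      # Hypothetical sketch (not the real openshift-monitoring manifest): shows
      # where the netclass collector mode is chosen. Without
      # --collector.netclass.netlink, node_exporter reads every attribute file
      # under /sys/class/net/<iface>/ on each scrape, which is the code path
      # visible in the perf output below.
      apiVersion: apps/v1
      kind: DaemonSet
      metadata:
        name: node-exporter-example
        namespace: openshift-monitoring
      spec:
        selector:
          matchLabels:
            app: node-exporter-example
        template:
          metadata:
            labels:
              app: node-exporter-example
          spec:
            containers:
              - name: node-exporter
                image: quay.io/prometheus/node-exporter:v1.5.0
                args:
                  # netlink mode avoids the per-attribute sysfs reads and the
                  # rtnl contention described above
                  - --collector.netclass.netlink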

      Version-Release number of selected component (if applicable):

      Tested on 4.13 and 4.14 with GCP.

      How reproducible:

      3/4 times

      Steps to Reproduce:

      1. Launch a GCP cluster with cluster-bot.
      2. Create a deployment of pause containers that will max out the pods on the nodes:
      
      ---
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: webserver-deployment
        namespace: openshift-ovn-kubernetes
        labels:
          pod-name: server
          app: nginx
          role: webserver
      spec:
        replicas: 700
        selector:
          matchLabels:
            app: nginx
        template:
          metadata:
            labels:
              app: nginx
              role: webserver
          spec:
            containers:
              - name: webserver1
                image: k8s.gcr.io/pause:3.1
                ports:
                  - containerPort: 80
                    name: serve-80
                    protocol: TCP 
      3. Watch the top CPU output and wait for node_exporter and prometheus to show very high CPU. If this does not happen, proceed to step 4.
      4. Delete the deployment and then recreate it.
      5. High and persistent CPU usage should now be observed.

      Actual results:

      CPU is pegged on the host for several minutes and the terminal is almost unresponsive. The only way to fix it was to delete the node_exporter and prometheus DS.

      Expected results:

      Prometheus and other metrics-related applications should:
      1. use netlink to avoid grabbing the rtnl lock, and
      2. be CPU limited. Certain required applications in OCP (like the networking data plane) are intentionally left resource-unbounded so that the node's core functions continue to work; metrics collection, however, should be CPU limited to keep the tooling from locking up a node (see the sketch below).
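      As a sketch of point 2, a CPU limit can be expressed with a standard Kubernetes resources stanza. The snippet below is a hypothetical pod with placeholder values, not a recommended or shipped configuration; in OCP the monitoring workloads are managed by the cluster-monitoring-operator, so a real limit would have to be applied through it.
      
      ---
      # Hypothetical example only: placeholder CPU request/limit on a metrics
      # container to keep a runaway collector from starving the rest of the node.
      apiVersion: v1
      kind: Pod
      metadata:
        name: node-exporter-example
        namespace: openshift-monitoring
      spec:
        containers:
          - name: node-exporter
            image: quay.io/prometheus/node-exporter:v1.5.0
            resources:
              requests:
                cpu: 100m
              limits:
                cpu: 250m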

      Additional info:

      Perf summary (will attach full perf output)
          99.94%     0.00%  node_exporter  node_exporter      [.] runtime.goexit.abi0
                  |
                  ---runtime.goexit.abi0
                     |
                      --99.33%--github.com/prometheus/node_exporter/collector.NodeCollector.Collect.func2
                                |
                                 --99.33%--github.com/prometheus/node_exporter/collector.NodeCollector.Collect.func1
                                           |
                                            --99.33%--github.com/prometheus/node_exporter/collector.execute
                                                      |
                                                      |--97.67%--github.com/prometheus/node_exporter/collector.(*netClassCollector).Update
                                                      |          |
                                                      |           --97.67%--github.com/prometheus/node_exporter/collector.(*netClassCollector).netClassSysfsUpdate
                                                      |                     |
                                                      |                      --97.67%--github.com/prometheus/node_exporter/collector.(*netClassCollector).getNetClassInfo
                                                      |                                |
                                                      |                                 --97.64%--github.com/prometheus/procfs/sysfs.FS.NetClassByIface
                                                      |                                           |
                                                      |                                            --97.64%--github.com/prometheus/procfs/sysfs.parseNetClassIface
                                                      |                                                      |
                                                      |                                                       --97.61%--github.com/prometheus/procfs/internal/util.SysReadFile
                                                      |                                                                 |
                                                      |                                                                  --97.45%--syscall.read
                                                      |                                                                            |
                                                      |                                                                             --97.45%--syscall.Syscall
                                                      |                                                                                       |
                                                      |                                                                                        --97.45%--runtime/internal/syscall.Syscall6
                                                      |                                                                                                  |
                                                      |                                                                                                   --70.34%--entry_SYSCALL_64_after_hwframe
                                                      |                                                                                                             do_syscall_64
                                                      |                                                                                                             |
                                                      |                                                                                                             |--39.13%--ksys_read
                                                      |                                                                                                             |          |
                                                      |                                                                                                             |          |--31.97%--vfs_read

              Brian Burt (rhn-support-bburt)
              Tim Rozet (trozet@redhat.com)
              Tai Gao
              Brian Burt
              Brian Burt
              Votes: 0
              Watchers: 11
