OCPBUGS-13321

collect-profiles pods causing regular CPU bursts

Details

    • Moderate
    • No
    • Bulbasaur
    • 1
    • Rejected
    • False

    Description

      This is a clone of issue OCPBUGS-1684. The following is the description of the original issue:

      Description of problem:

      After an upgrade from 4.9 to 4.10, the collect+ process (collect-profiles, as truncated by top) causes CPU bursts of 5-6 seconds every 15 minutes. During each burst, collect+ consumes about 100% CPU.
      
      Top Command Dump Sample:
      top - 07:00:04 up 10:10,  0 users,  load average: 0.20, 0.24, 0.27
      Tasks: 247 total,   1 running, 246 sleeping,   0 stopped,   0 zombie
      %Cpu(s):  6.3 us,  4.5 sy,  0.0 ni, 80.8 id,  7.4 wa,  0.8 hi,  0.3 si,  0.0 st
      MiB Mem :  32151.9 total,  22601.4 free,   2182.1 used,   7368.4 buff/cache
      MiB Swap:      0.0 total,      0.0 free,      0.0 used.  29420.7 avail Mem

          PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
         2009 root      20   0 3741252 172136  71396 S  12.9   0.5  36:42.79 kubelet
         1954 root      20   0 2663680 130928  46156 S   7.9   0.4   6:50.44 crio
         9440 root      20   0 1633728 546036  60836 S   7.9   1.7  21:06.80 fluentd
            1 root      20   0  238416  15412   8968 S   5.9   0.0   1:56.73 systemd
         1353 800       10 -10  796808 165380  40916 S   5.0   0.5   2:32.11 ovs-vsw+
         5454 root      20   0 1729112  73680  37404 S   2.0   0.2   3:52.21 coredns
      1061248 1000360+  20   0 1113524  24304  17776 S   2.0   0.1   0:00.03 collect+
          306 root       0 -20       0      0      0 I   1.0   0.0   0:00.37 kworker+
          957 root      20   0  264076 126280 119596 S   1.0   0.4   0:06.80 systemd+
         1114 dbus      20   0   83188   6224   5140 S   1.0   0.0   0:04.30 dbus-da+
         5710 root      20   0  406004  31384  15068 S   1.0   0.1   0:04.11 tuned
         6198 nobody    20   0 1632272  46588  20516 S   1.0   0.1   0:17.60 network+
      1061291 1000650+  20   0   11896   2748   2496 S   1.0   0.0   0:00.01 bash
      1061355 1000650+  20   0   11896   2868   2616 S   1.0   0.0   0:00.01 bash

      top - 07:00:05 up 10:10,  0 users,  load average: 0.20, 0.24, 0.27
      Tasks: 248 total,   2 running, 245 sleeping,   0 stopped,   1 zombie
      %Cpu(s): 11.4 us,  2.0 sy,  0.0 ni, 81.5 id,  4.2 wa,  0.6 hi,  0.2 si,  0.0 st
      MiB Mem :  32151.9 total,  22601.4 free,   2182.1 used,   7368.4 buff/cache
      MiB Swap:      0.0 total,      0.0 free,      0.0 used.  29420.7 avail Mem

          PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
      1061248 1000360+  20   0 1484936  36464  21300 S  74.3   0.1   0:00.78 collect+
         9440 root      20   0 1633728 545412  60900 S  11.9   1.7  21:06.92 fluentd
         2009 root      20   0 3741252 172396  71396 S   4.0   0.5  36:42.83 kubelet
            1 root      20   0  238416  15412   8968 S   1.0   0.0   1:56.74 systemd
          300 root       0 -20       0      0      0 I   1.0   0.0   0:00.46 kworker+
         1427 root      20   0   19656   2204   2064 S   1.0   0.0   0:01.55 agetty
         2419 root      20   0 1714748  38812  22884 S   1.0   0.1   0:24.42 coredns+
         2528 root      20   0 1634680  36464  20628 S   1.0   0.1   0:22.01 dynkeep+
      1009372 root      20   0       0      0      0 I   1.0   0.0   0:00.42 kworker+
      1053353 root      20   0   50200   4012   3292 R   1.0   0.0   0:01.56 top

      top - 07:00:06 up 10:10,  0 users,  load average: 0.20, 0.24, 0.27
      Tasks: 247 total,   1 running, 246 sleeping,   0 stopped,   0 zombie
      %Cpu(s): 15.3 us,  1.5 sy,  0.0 ni, 82.7 id,  0.1 wa,  0.2 hi,  0.1 si,  0.0 st
      MiB Mem :  32151.9 total,  22595.9 free,   2185.7 used,   7370.2 buff/cache
      MiB Swap:      0.0 total,      0.0 free,      0.0 used.  29416.7 avail Mem

          PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
      1061248 1000360+  20   0 1484936  35740  21428 S  99.0   0.1   0:01.78 collect+
         2009 root      20   0 3741252 172396  71396 S   3.0   0.5  36:42.86 kubelet
         9440 root      20   0 1633728 545076  60900 S   2.0   1.7  21:06.94 fluentd
         1353 800       10 -10  796808 165380  40916 S   1.0   0.5   2:32.12 ovs-vsw+
         1954 root      20   0 2663680 131452  46156 S   1.0   0.4   6:50.45 crio

      top - 07:00:07 up 10:10,  0 users,  load average: 0.20, 0.24, 0.27
      Tasks: 247 total,   1 running, 246 sleeping,   0 stopped,   0 zombie
      %Cpu(s): 14.7 us,  1.1 sy,  0.0 ni, 83.6 id,  0.1 wa,  0.4 hi,  0.1 si,  0.0 st
      MiB Mem :  32151.9 total,  22595.9 free,   2185.7 used,   7370.2 buff/cache
      MiB Swap:      0.0 total,      0.0 free,      0.0 used.  29416.7 avail Mem

          PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
      1061248 1000360+  20   0 1484936  35236  21492 S 102.0   0.1   0:02.80 collect+
         2009 root      20   0 3741252 172660  71396 S   7.0   0.5  36:42.93 kubelet
         3288 nobody    20   0  718964  30648  11680 S   3.0   0.1   3:36.84 node_ex+
            1 root      20   0  238416  15412   8968 S   1.0   0.0   1:56.75 systemd
         1353 800       10 -10  796808 165380  40916 S   1.0   0.5   2:32.13 ovs-vsw+
         1954 root      20   0 2663680 131452  46156 S   1.0   0.4   6:50.46 crio
         5454 root      20   0 1729112  73680  37404 S   1.0   0.2   3:52.22 coredns
         9440 root      20   0 1633728 545080  60900 S   1.0   1.7  21:06.95 fluentd
      1053353 root      20   0   50200   4012   3292 R   1.0   0.0   0:01.57 top

      top - 07:00:08 up 10:10,  0 users,  load average: 0.20, 0.24, 0.27
      Tasks: 247 total,   2 running, 245 sleeping,   0 stopped,   0 zombie
      %Cpu(s): 14.2 us,  0.9 sy,  0.0 ni, 84.5 id,  0.0 wa,  0.2 hi,  0.1 si,  0.0 st
      MiB Mem :  32151.9 total,  22595.9 free,   2185.7 used,   7370.2 buff/cache
      MiB Swap:      0.0 total,      0.0 free,      0.0 used.  29416.7 avail Mem

          PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
      1061248 1000360+  20   0 1484936  35164  21492 S 100.0   0.1   0:03.81 collect+
         2009 root      20   0 3741252 172660  71396 S   3.0   0.5  36:42.96 kubelet
      1061543 1000650+  20   0   34564   9804   5772 R   3.0   0.0   0:00.03 python
         9440 root      20   0 1633728 543952  60900 S   2.0   1.7  21:06.97 fluentd
      1053353 root      20   0   50200   4012   3292 R   2.0   0.0   0:01.59 top
         2330 root      20   0 1654612  61260  34720 S   1.0   0.2   0:55.81 coredns
         8023 root      20   0   12056   3044   2580 S   1.0   0.0   0:24.59 install+

      top - 07:00:09 up 10:10,  0 users,  load average: 0.34, 0.27, 0.28
      Tasks: 235 total,   2 running, 233 sleeping,   0 stopped,   0 zombie
      %Cpu(s):  8.9 us,  3.2 sy,  0.0 ni, 85.6 id,  1.5 wa,  0.5 hi,  0.2 si,  0.0 st
      MiB Mem :  32151.9 total,  22621.0 free,   2160.5 used,   7370.4 buff/cache
      MiB Swap:      0.0 total,      0.0 free,      0.0 used.  29441.9 avail Mem

          PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
         2009 root      20   0 3741252 172660  71396 S   5.0   0.5  36:43.01 kubelet
         9440 root      20   0 1633728 542684  60900 S   4.0   1.6  21:07.01 fluentd
         1353 800       10 -10  796808 165380  40916 S   2.0   0.5   2:32.15 ovs-vsw+
            1 root      20   0  238416  15412   8968 S   1.0   0.0   1:56.76 systemd
         1954 root      20   0 2663680 131452  46156 S   1.0   0.4   6:50.47 crio
         5454 root      20   0 1729112  73680  37404 S   1.0   0.2   3:52.23 coredns
         6198 nobody    20   0 1632272  45936  20516 S   1.0   0.1   0:17.61 network+
         7016 root      20   0   12052   3204   2736 S   1.0   0.0   0:24.19 install+
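
      The dump above can be reduced to a per-second CPU series for one process, which makes the length of each burst easier to read. The following is a minimal sketch, not part of the original report: it assumes the default top batch-mode column layout shown above (PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND), and the names Sample and parse_top_dump are hypothetical helpers introduced only for illustration.

```python
import re
from typing import List, NamedTuple


class Sample(NamedTuple):
    timestamp: str      # wall-clock time from the "top - HH:MM:SS" line
    pid: int
    cpu_percent: float  # value of the %CPU column
    cpu_time: str       # value of the TIME+ column, kept as text


# Start of each top iteration, e.g. "top - 07:00:05 up 10:10, ..."
HEADER_RE = re.compile(r"top - (\d{2}:\d{2}:\d{2})")

# One process row: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
ROW_RE = re.compile(
    r"^\s*(?P<pid>\d+)\s+\S+\s+\S+\s+\S+\s+\S+\s+\S+\s+\S+\s+\S\s+"
    r"(?P<cpu>\d+(?:\.\d+)?)\s+\d+(?:\.\d+)?\s+(?P<time>\S+)\s+(?P<cmd>\S+)"
)


def parse_top_dump(text: str, command_prefix: str) -> List[Sample]:
    """Return one Sample per row whose COMMAND starts with command_prefix."""
    samples: List[Sample] = []
    current_ts = ""
    for line in text.splitlines():
        # Check the row first: if a pasted dump has lost its line breaks,
        # a row may be followed by the next "top - ..." header on one line.
        row = ROW_RE.match(line)
        if row and row.group("cmd").startswith(command_prefix):
            samples.append(
                Sample(
                    timestamp=current_ts,
                    pid=int(row.group("pid")),
                    cpu_percent=float(row.group("cpu")),
                    cpu_time=row.group("time"),
                )
            )
        header = HEADER_RE.search(line)
        if header:
            current_ts = header.group(1)
    return samples


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python parse_top.py top_dump.txt
    with open(sys.argv[1], encoding="utf-8") as handle:
        for sample in parse_top_dump(handle.read(), "collect"):
            print(sample.timestamp, sample.pid, sample.cpu_percent, sample.cpu_time)
```

      Run against the sample above, this would list only the collect+ rows with their timestamps, so the seconds during which the process is near 100% CPU can be counted directly.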
      

      Version-Release number of selected component (if applicable):

       

      How reproducible:

      Not reproduced so far: the lab environment does not show the same behavior.

      Steps to Reproduce:

      1.
      2.
      3.
      

      Actual results:

      Regular high CPU spikes

      Expected results:

      No CPU spikes

      Additional info:

      Provided logs:
      1. top command dump uploaded to SF case 03317387
      2. must-gather uploaded to SF case 03317387

       

            People

              tshort@redhat.com Todd Short
              openshift-crt-jira-prow OpenShift Prow Bot
              bruno andrade bruno andrade