
collect-profiles pods causing regular CPU bursts


    • Moderate
    • None
    • Anarchy 235, Bulbasaur
    • 2
    • Rejected
    • False


      * Before this update, `collect-profiles` pods caused regular spikes of CPU usage due to the way certificates were generated. With this update, certificates are generated daily, the loading of the certificate is optimized, and CPU usage is lower. (OCPBUGS-1684: https://issues.redhat.com/browse/OCPBUGS-1684)
    • Bug Fix
    • Done

      Description of problem:

      After an upgrade from 4.9 to 4.10, the collect+ process (collect-profiles, as truncated by top) causes CPU bursts of 5-6 seconds every 15 minutes. During each burst, collect+ consumes about 100% CPU.
      Top Command Dump Sample:
      top - 07:00:04 up 10:10,  0 users,  load average: 0.20, 0.24, 0.27
      Tasks: 247 total,   1 running, 246 sleeping,   0 stopped,   0 zombie
      %Cpu(s):  6.3 us,  4.5 sy,  0.0 ni, 80.8 id,  7.4 wa,  0.8 hi,  0.3 si,  0.0 st
      MiB Mem :  32151.9 total,  22601.4 free,   2182.1 used,   7368.4 buff/cache
      MiB Swap:      0.0 total,      0.0 free,      0.0 used.  29420.7 avail Mem
          PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
         2009 root      20   0 3741252 172136  71396 S  12.9   0.5  36:42.79 kubelet
         1954 root      20   0 2663680 130928  46156 S   7.9   0.4   6:50.44 crio
         9440 root      20   0 1633728 546036  60836 S   7.9   1.7  21:06.80 fluentd
            1 root      20   0  238416  15412   8968 S   5.9   0.0   1:56.73 systemd
         1353 800       10 -10  796808 165380  40916 S   5.0   0.5   2:32.11 ovs-vsw+
         5454 root      20   0 1729112  73680  37404 S   2.0   0.2   3:52.21 coredns
      1061248 1000360+  20   0 1113524  24304  17776 S   2.0   0.1   0:00.03 collect+
          306 root       0 -20       0      0      0 I   1.0   0.0   0:00.37 kworker+
          957 root      20   0  264076 126280 119596 S   1.0   0.4   0:06.80 systemd+
         1114 dbus      20   0   83188   6224   5140 S   1.0   0.0   0:04.30 dbus-da+
         5710 root      20   0  406004  31384  15068 S   1.0   0.1   0:04.11 tuned
         6198 nobody    20   0 1632272  46588  20516 S   1.0   0.1   0:17.60 network+
      1061291 1000650+  20   0   11896   2748   2496 S   1.0   0.0   0:00.01 bash
      1061355 1000650+  20   0   11896   2868   2616 S   1.0   0.0   0:00.01 bash

      top - 07:00:05 up 10:10,  0 users,  load average: 0.20, 0.24, 0.27
      Tasks: 248 total,   2 running, 245 sleeping,   0 stopped,   1 zombie
      %Cpu(s): 11.4 us,  2.0 sy,  0.0 ni, 81.5 id,  4.2 wa,  0.6 hi,  0.2 si,  0.0 st
      MiB Mem :  32151.9 total,  22601.4 free,   2182.1 used,   7368.4 buff/cache
      MiB Swap:      0.0 total,      0.0 free,      0.0 used.  29420.7 avail Mem
          PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
      1061248 1000360+  20   0 1484936  36464  21300 S  74.3   0.1   0:00.78 collect+
         9440 root      20   0 1633728 545412  60900 S  11.9   1.7  21:06.92 fluentd
         2009 root      20   0 3741252 172396  71396 S   4.0   0.5  36:42.83 kubelet
            1 root      20   0  238416  15412   8968 S   1.0   0.0   1:56.74 systemd
          300 root       0 -20       0      0      0 I   1.0   0.0   0:00.46 kworker+
         1427 root      20   0   19656   2204   2064 S   1.0   0.0   0:01.55 agetty
         2419 root      20   0 1714748  38812  22884 S   1.0   0.1   0:24.42 coredns+
         2528 root      20   0 1634680  36464  20628 S   1.0   0.1   0:22.01 dynkeep+
      1009372 root      20   0       0      0      0 I   1.0   0.0   0:00.42 kworker+
      1053353 root      20   0   50200   4012   3292 R   1.0   0.0   0:01.56 top

      top - 07:00:06 up 10:10,  0 users,  load average: 0.20, 0.24, 0.27
      Tasks: 247 total,   1 running, 246 sleeping,   0 stopped,   0 zombie
      %Cpu(s): 15.3 us,  1.5 sy,  0.0 ni, 82.7 id,  0.1 wa,  0.2 hi,  0.1 si,  0.0 st
      MiB Mem :  32151.9 total,  22595.9 free,   2185.7 used,   7370.2 buff/cache
      MiB Swap:      0.0 total,      0.0 free,      0.0 used.  29416.7 avail Mem
          PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
      1061248 1000360+  20   0 1484936  35740  21428 S  99.0   0.1   0:01.78 collect+
         2009 root      20   0 3741252 172396  71396 S   3.0   0.5  36:42.86 kubelet
         9440 root      20   0 1633728 545076  60900 S   2.0   1.7  21:06.94 fluentd
         1353 800       10 -10  796808 165380  40916 S   1.0   0.5   2:32.12 ovs-vsw+
         1954 root      20   0 2663680 131452  46156 S   1.0   0.4   6:50.45 crio

      top - 07:00:07 up 10:10,  0 users,  load average: 0.20, 0.24, 0.27
      Tasks: 247 total,   1 running, 246 sleeping,   0 stopped,   0 zombie
      %Cpu(s): 14.7 us,  1.1 sy,  0.0 ni, 83.6 id,  0.1 wa,  0.4 hi,  0.1 si,  0.0 st
      MiB Mem :  32151.9 total,  22595.9 free,   2185.7 used,   7370.2 buff/cache
      MiB Swap:      0.0 total,      0.0 free,      0.0 used.  29416.7 avail Mem
          PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
      1061248 1000360+  20   0 1484936  35236  21492 S 102.0   0.1   0:02.80 collect+
         2009 root      20   0 3741252 172660  71396 S   7.0   0.5  36:42.93 kubelet
         3288 nobody    20   0  718964  30648  11680 S   3.0   0.1   3:36.84 node_ex+
            1 root      20   0  238416  15412   8968 S   1.0   0.0   1:56.75 systemd
         1353 800       10 -10  796808 165380  40916 S   1.0   0.5   2:32.13 ovs-vsw+
         1954 root      20   0 2663680 131452  46156 S   1.0   0.4   6:50.46 crio
         5454 root      20   0 1729112  73680  37404 S   1.0   0.2   3:52.22 coredns
         9440 root      20   0 1633728 545080  60900 S   1.0   1.7  21:06.95 fluentd
      1053353 root      20   0   50200   4012   3292 R   1.0   0.0   0:01.57 top

      top - 07:00:08 up 10:10,  0 users,  load average: 0.20, 0.24, 0.27
      Tasks: 247 total,   2 running, 245 sleeping,   0 stopped,   0 zombie
      %Cpu(s): 14.2 us,  0.9 sy,  0.0 ni, 84.5 id,  0.0 wa,  0.2 hi,  0.1 si,  0.0 st
      MiB Mem :  32151.9 total,  22595.9 free,   2185.7 used,   7370.2 buff/cache
      MiB Swap:      0.0 total,      0.0 free,      0.0 used.  29416.7 avail Mem
          PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
      1061248 1000360+  20   0 1484936  35164  21492 S 100.0   0.1   0:03.81 collect+
         2009 root      20   0 3741252 172660  71396 S   3.0   0.5  36:42.96 kubelet
      1061543 1000650+  20   0   34564   9804   5772 R   3.0   0.0   0:00.03 python
         9440 root      20   0 1633728 543952  60900 S   2.0   1.7  21:06.97 fluentd
      1053353 root      20   0   50200   4012   3292 R   2.0   0.0   0:01.59 top
         2330 root      20   0 1654612  61260  34720 S   1.0   0.2   0:55.81 coredns
         8023 root      20   0   12056   3044   2580 S   1.0   0.0   0:24.59 install+

      top - 07:00:09 up 10:10,  0 users,  load average: 0.34, 0.27, 0.28
      Tasks: 235 total,   2 running, 233 sleeping,   0 stopped,   0 zombie
      %Cpu(s):  8.9 us,  3.2 sy,  0.0 ni, 85.6 id,  1.5 wa,  0.5 hi,  0.2 si,  0.0 st
      MiB Mem :  32151.9 total,  22621.0 free,   2160.5 used,   7370.4 buff/cache
      MiB Swap:      0.0 total,      0.0 free,      0.0 used.  29441.9 avail Mem
          PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
         2009 root      20   0 3741252 172660  71396 S   5.0   0.5  36:43.01 kubelet
         9440 root      20   0 1633728 542684  60900 S   4.0   1.6  21:07.01 fluentd
         1353 800       10 -10  796808 165380  40916 S   2.0   0.5   2:32.15 ovs-vsw+
            1 root      20   0  238416  15412   8968 S   1.0   0.0   1:56.76 systemd
         1954 root      20   0 2663680 131452  46156 S   1.0   0.4   6:50.47 crio
         5454 root      20   0 1729112  73680  37404 S   1.0   0.2   3:52.23 coredns
         6198 nobody    20   0 1632272  45936  20516 S   1.0   0.1   0:17.61 network+
         7016 root      20   0   12052   3204   2736 S   1.0   0.0   0:24.19 install+
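
      To see the burst pattern across a longer dump, the per-snapshot CPU of the collect+ process can be extracted with a short script. The following is only a sketch, not part of the original report: the function name `cpu_by_snapshot` and the `sample` string are illustrative, and it assumes the dump has one snapshot header and one process per line with the default 12-column layout shown above.

```python
import re

# Matches the first line of each snapshot, e.g. "top - 07:00:04 up 10:10, ..."
SNAPSHOT_RE = re.compile(r"^top - (\d{2}:\d{2}:\d{2})")


def cpu_by_snapshot(top_output, command_prefix="collect"):
    """Return [(timestamp, total %CPU)] per snapshot for matching commands.

    Hypothetical helper: assumes the default top column order
    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND.
    """
    results = []
    current_time = None
    in_table = False
    total = 0.0
    for raw_line in top_output.splitlines():
        line = raw_line.strip()
        match = SNAPSHOT_RE.match(line)
        if match:
            # A new snapshot starts: store the previous one, if any.
            if current_time is not None:
                results.append((current_time, total))
            current_time = match.group(1)
            total = 0.0
            in_table = False
            continue
        if line.startswith("PID"):
            in_table = True  # process rows follow the column header
            continue
        if not in_table or not line:
            continue
        fields = line.split()
        if len(fields) >= 12 and fields[11].startswith(command_prefix):
            total += float(fields[8])  # %CPU column
    if current_time is not None:
        results.append((current_time, total))
    return results


# Two snapshots abbreviated from the dump above.
sample = """\
top - 07:00:04 up 10:10,  0 users,  load average: 0.20, 0.24, 0.27
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
   2009 root      20   0 3741252 172136  71396 S  12.9   0.5  36:42.79 kubelet
1061248 1000360+  20   0 1113524  24304  17776 S   2.0   0.1   0:00.03 collect+
top - 07:00:06 up 10:10,  0 users,  load average: 0.20, 0.24, 0.27
    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
1061248 1000360+  20   0 1484936  35740  21428 S  99.0   0.1   0:01.78 collect+
   2009 root      20   0 3741252 172396  71396 S   3.0   0.5  36:42.86 kubelet
"""

for timestamp, cpu in cpu_by_snapshot(sample):
    print(timestamp, cpu)
```

      Snapshots in which the process does not appear are reported with 0.0, so the 07:00:09 snapshot above would mark the end of the burst.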

      Version-Release number of selected component (if applicable):


      How reproducible:

      Not reproduced: the lab environment does not show the same behavior.

      Steps to Reproduce:


      Actual results:

      Regular high CPU spikes

      Expected results:

      No CPU spikes

      Additional info:

      Provided logs:
      1. top command dump uploaded to SF case 03317387
      2. must-gather uploaded to SF case 03317387


            tshort@redhat.com Todd Short
            rh-ee-kyildiri Kursad Yildirim
            bruno andrade bruno andrade
            Michael Peter Michael Peter
