  1. OpenShift Bugs
  2. OCPBUGS-14958

[release-4.12] Stalld continually restarting


    • Type: Bug
    • Resolution: Done
    • Priority: Undefined
    • None
    • 4.12.z
    • Node Tuning Operator
    • None
    • No
    • False
    • Customer Escalated

      Description of problem:

      Since 4.12.20, we see that tuned reapplies all of its profiles every few minutes. This in turn causes stalld to restart constantly:
      
      ~~~
      Jun 13 17:08:44 node-name systemd[1]: stalld.service: Succeeded.
      Jun 13 17:08:44 node-name systemd[1]: stalld.service: Consumed 247ms CPU time
      Jun 13 17:08:44 node-name stalld[1156868]: Disabled RT throttling
      Jun 13 17:08:44 node-name stalld[1156870]: lockdown mode is off
      Jun 13 17:08:44 node-name stalld[1156870]: /sys/kernel/debug/sched/features doesn't exist
      Jun 13 17:08:44 node-name stalld[1156870]: /sys/kernel/debug/sched_features exists
      Jun 13 17:08:44 node-name stalld[1156870]: /sys/kernel/debug/sched/debug doesn't exist
      Jun 13 17:08:44 node-name stalld[1156870]: /proc/sched_debug exists
      Jun 13 17:08:44 node-name stalld[1156870]: boosted pid 0 (undef) using SCHED_DEADLINE
      Jun 13 17:08:44 node-name stalld[1156870]: using SCHED_DEADLINE for boosting
      Jun 13 17:08:44 node-name stalld[1156870]: initial config_buffer_size set to 4464640
      Jun 13 17:08:44 node-name stalld[1156870]: detected new task format
      Jun 13 17:08:44 node-name stalld[1156870]: single threaded mode
      Jun 13 17:08:49 node-name stalld[1156870]: sched_debug is getting larger, increasing the buffer to 8929280
      Jun 13 17:10:19 node-name stalld[1156870]: received signal 15, starting shutdown
      Jun 13 17:10:19 node-name stalld[1171052]: Restored RT throttling
      Jun 13 17:10:19 node-name systemd[1]: stalld.service: Succeeded.
      Jun 13 17:10:19 node-name systemd[1]: stalld.service: Consumed 1.821s CPU time
      Jun 13 17:10:19 node-name stalld[1171058]: Disabled RT throttling
      Jun 13 17:10:19 node-name stalld[1171060]: lockdown mode is off
      Jun 13 17:10:19 node-name stalld[1171060]: /sys/kernel/debug/sched/features doesn't exist
      Jun 13 17:10:19 node-name stalld[1171060]: /sys/kernel/debug/sched_features exists
      Jun 13 17:10:19 node-name stalld[1171060]: /sys/kernel/debug/sched/debug doesn't exist
      Jun 13 17:10:19 node-name stalld[1171060]: /proc/sched_debug exists
      Jun 13 17:10:19 node-name stalld[1171060]: boosted pid 0 (undef) using SCHED_DEADLINE
      Jun 13 17:10:19 node-name stalld[1171060]: using SCHED_DEADLINE for boosting
      Jun 13 17:10:19 node-name stalld[1171060]: initial config_buffer_size set to 4382720
      Jun 13 17:10:19 node-name stalld[1171060]: detected new task format
      Jun 13 17:10:19 node-name stalld[1171060]: single threaded mode
      Jun 13 17:10:24 node-name stalld[1171060]: received signal 15, starting shutdown
      Jun 13 17:10:24 node-name stalld[1172133]: Restored RT throttling
      ~~~
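
To put a number on the restart frequency, the journal lines above can be scanned for the "received signal 15" message. The following is only a minimal sketch, not part of the original report: the function names `stalld_shutdowns` and `seconds_between` are hypothetical, the line format is assumed to match the excerpt above (for example as printed by `journalctl -u stalld`), and the `year` default is an arbitrary placeholder because the short journal timestamp carries no year.

```python
import re
from datetime import datetime

# Matches journal lines of the form shown in the excerpt, e.g.
# "Jun 13 17:10:19 node-name stalld[1156870]: received signal 15, starting shutdown"
LINE_RE = re.compile(
    r"^\s*(?P<ts>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) \S+ "
    r"stalld\[(?P<pid>\d+)\]: (?P<msg>.*)$"
)


def stalld_shutdowns(lines, year=2000):
    """Return (timestamp, pid) for every SIGTERM-triggered stalld shutdown.

    `year` is a placeholder assumption: the log lines do not include one.
    """
    events = []
    for line in lines:
        m = LINE_RE.match(line)
        if m and m.group("msg").startswith("received signal 15"):
            ts = datetime.strptime(f"{year} {m.group('ts')}", "%Y %b %d %H:%M:%S")
            events.append((ts, int(m.group("pid"))))
    return events


def seconds_between(events):
    """Seconds elapsed between consecutive shutdown events."""
    return [
        (later[0] - earlier[0]).total_seconds()
        for earlier, later in zip(events, events[1:])
    ]
```

Applied to the excerpt above, this finds two shutdowns (PIDs 1156870 and 1171060) only 5 seconds apart, at 17:10:19 and 17:10:24.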
      

      Version-Release number of selected component (if applicable):

      4.12.20
      

      How reproducible:

      
      

      Steps to Reproduce:

      1.
      2.
      3.
      

      Actual results:

      
      

      Expected results:

      
      

      Additional info:

      The change was already rolled back in 4.14 and 4.13. The commit that introduced the issue does not seem to be needed and appears to cause more harm than good.
      I created and attached a PR, see below.
      

            jmencak Jiri Mencak
            akaris@redhat.com Andreas Karis
            Liquan Cui Liquan Cui