OCPBUGS-11384

Switching the realTime workload hint from enabled to disabled leaves stalld enabled


    • Moderate
    • No
    • CNF Compute Sprint 235
    • 1
    • False

      This is a clone of issue OCPBUGS-10635. The following is the description of the original issue:

      Description of problem:

      When a performance profile is modified from realTime: true to realTime: false, the change in workloadHints doesn't stop stalld.

      Version-Release number of selected component (if applicable):

      4.13.0-0.nightly-2023-03-14-053612

      How reproducible:

      Every time

      Steps to Reproduce:

      1. Set up a multi-node bare-metal (BM) cluster with 2 workers.
      2. Create a MachineConfigPool (MCP) named worker-cnf.
      3. Create a performance profile as shown below:

       

      spec:
        cpu:
          balanceIsolated: false
          isolated: 2-39,42-79
          reserved: 0-1,40-41
        machineConfigPoolSelector:
          machineconfiguration.openshift.io/role: worker-cnf
        nodeSelector:
          node-role.kubernetes.io/worker-cnf: ""
        numa:
          topologyPolicy: single-numa-node
        realTimeKernel:
          enabled: true
        workloadHints:
          realTime: true   

       

      4. Wait for the nodes to come back.
      5. Check that the stalld process is running.
      6. Modify the profile to set the realTime workload hint to false, as shown below:

      spec:
        cpu:
          balanceIsolated: false
          isolated: 2-39,42-79
          reserved: 0-1,40-41
        machineConfigPoolSelector:
          machineconfiguration.openshift.io/role: worker-cnf
        nodeSelector:
          node-role.kubernetes.io/worker-cnf: ""
        numa:
          topologyPolicy: single-numa-node
        realTimeKernel:
          enabled: false
        workloadHints:
          realTime: false   

      7. Check that the nodes are in the Ready state.

      [root@registry kni]# oc get nodes
      NAME       STATUS   ROLES                  AGE    VERSION
      master-0   Ready    control-plane,master   5d6h   v1.26.2+bc894ae
      master-1   Ready    control-plane,master   5d6h   v1.26.2+bc894ae
      master-2   Ready    control-plane,master   5d6h   v1.26.2+bc894ae
      worker-0   Ready    worker,worker-cnf      5d5h   v1.26.2+bc894ae
      worker-1   Ready    worker                 5d5h   v1.26.2+bc894ae
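
Steps 6 and 7 above could also be scripted rather than done by hand. The following is a minimal sketch under stated assumptions, not the reporter's procedure: the profile name `performance` is hypothetical (the report does not show `metadata.name`), the helper names are made up for illustration, and the timeouts are arbitrary.

```shell
# Sketch only. PROFILE is a hypothetical name; the report omits metadata.name.
PROFILE="${PROFILE:-performance}"
POOL="${POOL:-worker-cnf}"
NODE_LABEL="node-role.kubernetes.io/worker-cnf"

# JSON merge patch mirroring the second spec above:
# RT kernel off and realTime workload hint off.
disable_realtime_patch() {
  printf '%s' '{"spec":{"realTimeKernel":{"enabled":false},"workloadHints":{"realTime":false}}}'
}

# Apply the patch, wait for the pool to start and then finish updating,
# then wait for the pool's nodes to report Ready.
# Waiting for Updating first avoids returning early while the pool still
# shows Updated from the previous rollout.
disable_realtime() {
  oc patch performanceprofile "$PROFILE" --type=merge -p "$(disable_realtime_patch)" &&
  oc wait "mcp/$POOL" --for=condition=Updating --timeout=10m &&
  oc wait "mcp/$POOL" --for=condition=Updated --timeout=30m &&
  oc wait node -l "$NODE_LABEL" --for=condition=Ready --timeout=10m
}
```

Calling `disable_realtime` after sourcing the snippet should leave the cluster in the state that `oc get nodes` shows above, assuming the rollout completes within the timeouts.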

      Actual results:

      [root@registry kni]# oc debug node/worker-0
      Temporary namespace openshift-debug-k9knn is created for debugging node...
      Starting pod/worker-0-debug ...
      To use host binaries, run `chroot /host`
      Pod IP: 10.46.80.2
      If you don't see a command prompt, try pressing enter.
      sh-4.4# chroot /host
      sh-5.1# systemctl status stalld
      ● stalld.service - Stall Monitor
           Loaded: loaded (/usr/lib/systemd/system/stalld.service; enabled; preset: disabled)
           Active: active (running) since Tue 2023-03-21 16:36:13 UTC; 12min ago
         Main PID: 1785 (stalld)
            Tasks: 1 (limit: 3299464)
           Memory: 720.0K
              CPU: 66ms
           CGroup: /system.slice/stalld.service
                   └─1785 /usr/bin/stalld --systemd -p 1000000000 -r 20000 -d 3 -t 20 --foreground --pidfile /run/stalld.pid

      Mar 21 16:36:13 localhost stalld[1785]: lockdown mode is off
      Mar 21 16:36:13 localhost stalld[1785]: /sys/kernel/debug/sched/features exists
      Mar 21 16:36:13 localhost stalld[1785]: dl_runtime is shorter than 1ms, setting HRTICK_DL
      Mar 21 16:36:13 localhost stalld[1785]: /sys/kernel/debug/sched/debug exists
      Mar 21 16:36:13 localhost stalld[1785]: boosted pid 0 (undef) using SCHED_DEADLINE
      Mar 21 16:36:13 localhost systemd[1]: Started Stall Monitor.
      Mar 21 16:36:13 localhost stalld[1785]: using SCHED_DEADLINE for boosting
      Mar 21 16:36:13 localhost stalld[1785]: initial config_buffer_size set to 614400
      Mar 21 16:36:13 localhost stalld[1785]: detected new task format
      Mar 21 16:36:13 localhost stalld[1785]: single threaded mode
      sh-5.1# ps -ef | grep stalld
      root        1785       1  0 16:36 ?        00:00:00 /usr/bin/stalld --systemd -p 1000000000 -r 20000 -d 3 -t 20 --foreground --pidfile /run/stalld.pid
      root       13676   13420  0 16:49 ?        00:00:00 grep stalld
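
The interactive check above can be condensed into a non-interactive one for repeated runs. This is a sketch under assumptions: the function names are hypothetical, `oc` is assumed to be logged in with permission to run `oc debug`, and the debug pod's informational messages are assumed to go to stderr so that only the `systemctl` output is captured.

```shell
# Sketch only: helper names are illustrative, not part of any OpenShift tooling.

# Print "<active-state> <enabled-state>" for stalld on one node.
stalld_state() {
  node="$1"
  active=$(oc debug "node/$node" -- chroot /host systemctl is-active stalld 2>/dev/null)
  enabled=$(oc debug "node/$node" -- chroot /host systemctl is-enabled stalld 2>/dev/null)
  printf '%s %s\n' "${active:-unknown}" "${enabled:-unknown}"
}

# Succeed only when systemd reports stalld as inactive on the node.
# Any other state (active, failed, unknown, ...) is treated as "not stopped".
stalld_is_stopped() {
  case "$(stalld_state "$1")" in
    "inactive "*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `stalld_is_stopped worker-0 || echo "stalld still running on worker-0"` would be expected to print the message on the affected build, since the status output above shows the unit as enabled and active (running).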
      

      Expected results:

      stalld should be stopped and disabled on the node once the realTime workload hint is set to false.

      Additional info:

       

            msivak@redhat.com Martin Sivak
            openshift-crt-jira-prow OpenShift Prow Bot
            Shereen Haj Shereen Haj