OCPBUGS-30064

Fails to apply performanceprofile, node stuck on Ready/NotReady, SchedulingDisabled

    • No
    • CNF Compute Sprint 250, CNF Compute Sprint 251
    • 2
    • False
    • 2024-03-11: The must-gather logs did not show an issue; new logs were requested if possible. A similar issue is linked to this bug. Will follow up with a live deploy session on the environment if no evidence comes up.

      Description of problem:

      Applying a performanceprofile fails: after the complete reboot of the cluster nodes that downgrades the cgroup version from v2 to v1, the first node gets stuck on SchedulingDisabled.
      Two types of failure were observed:
        - first, and most common: the relevant mcp is stuck on paused and the node is stuck on Ready,SchedulingDisabled. This happens on the first reboot, the one that has to downgrade the cgroup version from v2 to v1.
        - second: the node rebooted (the additional reboot to apply changes), the changes were applied, and the node is stuck on NotReady,SchedulingDisabled in the Updating state forever.

      Version-Release number of selected component (if applicable):

          4.15.0 (GA)

      How reproducible:

          always

      Steps to Reproduce:

          1. deploy a disconnected cluster
          2. apply or create a performanceprofile config
          3.1. wait for the relevant mcp node to change state to Ready,SchedulingDisabled
          3.2. wait for the complete cluster reboot
          4. wait for another reboot, only of the relevant mcp, to apply the pp config on the nodes
          

      Actual results:

          first case: the relevant mcp node is stuck on Ready,SchedulingDisabled with the mcp paused; no reboot.
          second case: the node is stuck on NotReady,SchedulingDisabled in the Updating state
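
          The two stuck states above can be picked out of `oc get nodes` output. The following is a minimal sketch, assuming the default column layout (NAME, STATUS, ROLES, AGE, VERSION). The function name, node names, and sample text are hypothetical illustrations, not taken from the must-gather logs of this bug.

```python
from typing import Dict

# Hypothetical sample in the default `oc get nodes` column layout.
# Node names, ages, and versions are made up for illustration only.
SAMPLE = """\
NAME       STATUS                        ROLES                  AGE   VERSION
master-0   Ready                         control-plane,master   3h    v1.28.0
worker-0   Ready,SchedulingDisabled      worker                 3h    v1.28.0
worker-1   NotReady,SchedulingDisabled   worker                 3h    v1.28.0
"""


def stuck_nodes(oc_get_nodes_output: str) -> Dict[str, str]:
    """Map node name to STATUS for every node that is cordoned.

    A node counts as cordoned when its comma-separated STATUS field
    contains the SchedulingDisabled condition.
    """
    result: Dict[str, str] = {}
    lines = oc_get_nodes_output.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 2:
            continue  # ignore blank or malformed rows
        name, status = fields[0], fields[1]
        if "SchedulingDisabled" in status.split(","):
            result[name] = status
    return result


if __name__ == "__main__":
    for name, status in stuck_nodes(SAMPLE).items():
        print(name, status)
```

          In this sketch a node reported as Ready,SchedulingDisabled corresponds to the first case and NotReady,SchedulingDisabled to the second. It only reports that a node is cordoned; it cannot tell how long the node has been in that state.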

      Expected results:

          the relevant mcp nodes reboot, the pp config is applied to the nodes, and cgroup is downgraded to v1
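
          Whether a node actually ended up on cgroup v1 can be checked on the node itself. The following is a minimal sketch, assuming the common heuristic that a cgroup v2 unified hierarchy exposes a `cgroup.controllers` file at the cgroup mount root. The function name `cgroup_version` and the default path argument are choices made for this illustration, and the check has to run on the node (for example from a debug shell), not on the client.

```python
from pathlib import Path


def cgroup_version(root: str = "/sys/fs/cgroup") -> int:
    """Return 2 if `root` looks like a cgroup v2 unified hierarchy, else 1.

    Heuristic (an assumption of this sketch): on cgroup v2 the mount root
    contains a `cgroup.controllers` file; on cgroup v1 it does not.
    """
    return 2 if (Path(root) / "cgroup.controllers").is_file() else 1


if __name__ == "__main__":
    print("cgroup v%d" % cgroup_version())
```

          With the expected result above, this should report v1 on every node of the relevant mcp once the performanceprofile has been applied.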

      Additional info:

          For the first case: manually un-pausing the mcp has no effect; the node stays stuck in the Ready,SchedulingDisabled state and the mcp stays in the Updating state forever. None of the nodes belonging to this mcp changed cgroup to v1.
      
      
      Logs: must-gather for both cases can be found at https://file.emea.redhat.com/~elgerman/OCPBUGS-30064/
      (each log was collected on a clean, new ocp deployment)
      
      

       

              yquinn@redhat.com Yanir Quinn
              elgerman Elena German
              Gowrishankar Rajaiyan Gowrishankar Rajaiyan