OpenShift Bugs / OCPBUGS-29520

With workload partitioning enabled, setting the CPU manager policy to static with reserved CPUs causes the kubelet to fail to restart


    • Critical
    • No
    • CNF Compute Sprint 250, CNF Compute Sprint 251
    • 2
    • Rejected
    • False
    • None
    • 2024-04-10: Verified.
      2024-03-20: Requires attention from maintainers to merge (delayed by ShiftWeek).
      2024-03-11: Currently blocked on CI; the patch is ready.

      Description of problem:

      On a system with workload partitioning enabled, setting the CPU manager policy (cpuManagerPolicy) to static while also configuring reserved CPUs (reservedSystemCPUs) causes the kubelet to fail to restart.

      Version-Release number of selected component (if applicable):

      4.16.0-0.ci-2024-02-13-072746    

      How reproducible:

      Every time

      Steps to Reproduce:

          1. Enable workload partitioning.
          2. Label one of the worker nodes as worker-test.
          3. Create a MachineConfigPool (MCP) for the worker-test node.
          4. Create a KubeletConfig as shown below:
      apiVersion: machineconfiguration.openshift.io/v1
      kind: KubeletConfig
      metadata:
        name: kubelet-test
      spec:
        kubeletConfig:
          cpuManagerPolicy: static
          reservedSystemCPUs: 0,2,12,14
        machineConfigPoolSelector:
          matchLabels:
            machineconfiguration.openshift.io/role: worker-test
      
          5. Observe that the node goes into the NotReady,SchedulingDisabled state.
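      Step 3 does not include the pool definition. The following is a minimal sketch of such a pool, modeled on the custom MachineConfigPool pattern from the OpenShift documentation; it is not the manifest used in the report. It assumes the node was labeled in step 2 with node-role.kubernetes.io/worker-test="" (consistent with the worker,worker-test roles in the oc get nodes output, but not stated in the report). The pool's own machineconfiguration.openshift.io/role: worker-test label is what the KubeletConfig's machineConfigPoolSelector matches on.

      ```yaml
      apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfigPool
      metadata:
        name: worker-test
        labels:
          # Matched by the KubeletConfig's machineConfigPoolSelector
          machineconfiguration.openshift.io/role: worker-test
      spec:
        # Render MachineConfigs carrying either the worker or the worker-test role
        machineConfigSelector:
          matchExpressions:
            - key: machineconfiguration.openshift.io/role
              operator: In
              values: [worker, worker-test]
        # Assumed node label from step 2 (hypothetical; not stated in the report)
        nodeSelector:
          matchLabels:
            node-role.kubernetes.io/worker-test: ""
      ```

      Selecting both worker and worker-test MachineConfigs lets the custom pool keep the standard worker configuration while adding anything targeted at worker-test, such as the MachineConfig generated from the KubeletConfig.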
      
          

      Actual results:

      The node goes into the NotReady,SchedulingDisabled state:
      [root@cnfdr22 tmp]# oc get nodes
      NAME                                             STATUS                        ROLES                  AGE    VERSION
      ocp-ctlplane-0.libvirt.lab.eng.tlv2.redhat.com   Ready                         control-plane,master   116m   v1.29.1+2f773e8
      ocp-ctlplane-1.libvirt.lab.eng.tlv2.redhat.com   Ready                         control-plane,master   116m   v1.29.1+2f773e8
      ocp-ctlplane-2.libvirt.lab.eng.tlv2.redhat.com   Ready                         control-plane,master   116m   v1.29.1+2f773e8
      ocp-worker-0.libvirt.lab.eng.tlv2.redhat.com     Ready                         worker                 100m   v1.29.1+2f773e8
      ocp-worker-1.libvirt.lab.eng.tlv2.redhat.com     Ready                         worker                 100m   v1.29.1+2f773e8
      ocp-worker-2.libvirt.lab.eng.tlv2.redhat.com     NotReady,SchedulingDisabled   worker,worker-test     100m   v1.29.1+2f773e8
      

      Expected results:

          The node should not go into the NotReady,SchedulingDisabled state; the kubelet should restart successfully with the new configuration.

      Additional info:

          

            Martin Sivak (msivak@redhat.com)
            Mallapadi Niranjan (mniranja)
            Sunil Choudhary
            Votes: 0
            Watchers: 12
