OpenShift Bugs / OCPBUGS-31348

With workload partitioning enabled, setting cpu_manager to static and reserving CPUs causes the kubelet to fail to restart


    • Bug
    • Resolution: Done-Errata
    • Critical
    • None
    • 4.16.0
    • Node / Kubelet
    • Critical
    • No
    • CNF Compute Sprint 251, CNF Compute Sprint 252
    • 2
    • Rejected
    • False
    • 2024-04-10: Merged, will be verified once OCPBUGS-28545 merges.
      2024-03-20: Requires attention by maintainers (but ShiftWeek) to merge
      2024-03-11: Currently blocked on CI, the patch is ready

      This is a clone of issue OCPBUGS-29520. The following is the description of the original issue:

      Description of problem:

       On a system with workload partitioning enabled, setting cpu_manager to static and reserving CPUs causes the kubelet to fail to restart.

      Version-Release number of selected component (if applicable):

      4.16.0-0.ci-2024-02-13-072746    

      How reproducible:

      Every time

      Steps to Reproduce:

          1. Enable workload partitioning.
          2. Label one of the worker nodes as worker-test.
          3. Create an MCP (MachineConfigPool) for the worker-test node.
          4. Create a KubeletConfig as shown below:
      apiVersion: machineconfiguration.openshift.io/v1
      kind: KubeletConfig
      metadata:
        name: kubelet-test
      spec:
        kubeletConfig:
          cpuManagerPolicy: static
          reservedSystemCPUs: 0,2,12,14
        machineConfigPoolSelector:
          matchLabels:
            machineconfiguration.openshift.io/role: worker-test
      
          5. The node goes into the NotReady,SchedulingDisabled state.
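
Steps 2 and 3 are not spelled out in the report. The following is a minimal sketch of what they might look like, not the reporter's actual manifests. It assumes the node was labeled with node-role.kubernetes.io/worker-test (consistent with the worker,worker-test role shown in the node listing) and that the pool is named worker-test. The pool's own machineconfiguration.openshift.io/role label is what the KubeletConfig's machineConfigPoolSelector matches against.

```yaml
# Assumed step 2 (node name taken from the node listing in this report):
#   oc label node ocp-worker-2.libvirt.lab.eng.tlv2.redhat.com node-role.kubernetes.io/worker-test=""
#
# Assumed step 3: a custom pool that inherits worker MachineConfigs.
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: worker-test            # assumed pool name
  labels:
    # Matched by spec.machineConfigPoolSelector in the KubeletConfig from step 4
    machineconfiguration.openshift.io/role: worker-test
spec:
  machineConfigSelector:
    matchExpressions:
      - key: machineconfiguration.openshift.io/role
        operator: In
        values: [worker, worker-test]
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/worker-test: ""
```

If the pool's label differs from the one in the KubeletConfig selector, the KubeletConfig would not be rendered for that pool and the issue would not be triggered.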
      
          

      Actual results:

      The node goes into the NotReady,SchedulingDisabled state:
      [root@cnfdr22 tmp]# oc get nodes
      NAME                                             STATUS                        ROLES                  AGE    VERSION
      ocp-ctlplane-0.libvirt.lab.eng.tlv2.redhat.com   Ready                         control-plane,master   116m   v1.29.1+2f773e8
      ocp-ctlplane-1.libvirt.lab.eng.tlv2.redhat.com   Ready                         control-plane,master   116m   v1.29.1+2f773e8
      ocp-ctlplane-2.libvirt.lab.eng.tlv2.redhat.com   Ready                         control-plane,master   116m   v1.29.1+2f773e8
      ocp-worker-0.libvirt.lab.eng.tlv2.redhat.com     Ready                         worker                 100m   v1.29.1+2f773e8
      ocp-worker-1.libvirt.lab.eng.tlv2.redhat.com     Ready                         worker                 100m   v1.29.1+2f773e8
      ocp-worker-2.libvirt.lab.eng.tlv2.redhat.com     NotReady,SchedulingDisabled   worker,worker-test     100m   v1.29.1+2f773e8
      

      Expected results:

          The node should not go into the NotReady,SchedulingDisabled state.

      Additional info:

          

              msivak@redhat.com Martin Sivak
              openshift-crt-jira-prow OpenShift Prow Bot
              Min Li Min Li