OCPBUGS-32473

With workload partitioning enabled, setting cpu_manager to static and having reserved CPUs causes kubelet to fail to restart


    • Critical
    • No
    • CNF Compute Sprint 252, CNF Compute Sprint 253, CNF Compute Sprint 254, CNF Compute Sprint 255
    • 4
    • Rejected
    • False
    • The flow fixed by this bug was never exposed to users. The change of functionality will be covered by OCPBUGS-32472
    • Release Note Not Required
    • In Progress
    • 2024-06-03: PR needs lgtm, tests passed, QE preverified

      This is a clone of issue OCPBUGS-31348. The following is the description of the original issue:

      This is a clone of issue OCPBUGS-29520. The following is the description of the original issue:

      Description of problem:

       On a system with workload partitioning enabled, setting cpu_manager to static and configuring reserved CPUs causes the kubelet to fail to restart.

      Version-Release number of selected component (if applicable):

      4.16.0-0.ci-2024-02-13-072746    

      How reproducible:

      Every time

      Steps to Reproduce:

          1. Enable workload partitioning.
          2. Label one of the worker nodes as worker-test.
          3. Create an MCP (MachineConfigPool) for the worker-test node.
          4. Create a KubeletConfig as shown below:
      apiVersion: machineconfiguration.openshift.io/v1
      kind: KubeletConfig
      metadata:
        name: kubelet-test
      spec:
        kubeletConfig:
          cpuManagerPolicy: static
          reservedSystemCPUs: 0,2,12,14
        machineConfigPoolSelector:
          matchLabels:
            machineconfiguration.openshift.io/role: worker-test
      
          5. The node goes into the NotReady,SchedulingDisabled state.
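
      Steps 2 and 3 are described without commands. The sketch below shows one way they might be carried out, assuming the custom role is named worker-test as in the KubeletConfig above. The helper names label_node and render_mcp are hypothetical and not part of the original report. The MachineConfigPool layout follows the usual custom-pool pattern and may need adjusting for a given cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for steps 2 and 3; the function names are illustrative only.

# Step 2: add the custom role label to a node. An empty label value is enough
# for the role to show up in the ROLES column of `oc get nodes`.
label_node() {
  local node="$1" role="$2"
  oc label node "$node" "node-role.kubernetes.io/${role}="
}

# Step 3: print a MachineConfigPool manifest for the custom role.
# The metadata label is what the KubeletConfig's machineConfigPoolSelector matches.
# The pool inherits the base worker MachineConfigs via the "In" selector.
render_mcp() {
  local role="$1"
  cat <<EOF
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: ${role}
  labels:
    machineconfiguration.openshift.io/role: ${role}
spec:
  machineConfigSelector:
    matchExpressions:
      - key: machineconfiguration.openshift.io/role
        operator: In
        values: [worker, ${role}]
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/${role}: ""
EOF
}
```

      With these helpers sourced, a run could look like `label_node ocp-worker-2.libvirt.lab.eng.tlv2.redhat.com worker-test` followed by `render_mcp worker-test | oc apply -f -`. Printing the manifest instead of applying it directly allows review before it reaches the cluster.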
      
          

      Actual results:

      The node goes into the NotReady,SchedulingDisabled state:
      [root@cnfdr22 tmp]# oc get nodes
      NAME                                             STATUS                        ROLES                  AGE    VERSION
      ocp-ctlplane-0.libvirt.lab.eng.tlv2.redhat.com   Ready                         control-plane,master   116m   v1.29.1+2f773e8
      ocp-ctlplane-1.libvirt.lab.eng.tlv2.redhat.com   Ready                         control-plane,master   116m   v1.29.1+2f773e8
      ocp-ctlplane-2.libvirt.lab.eng.tlv2.redhat.com   Ready                         control-plane,master   116m   v1.29.1+2f773e8
      ocp-worker-0.libvirt.lab.eng.tlv2.redhat.com     Ready                         worker                 100m   v1.29.1+2f773e8
      ocp-worker-1.libvirt.lab.eng.tlv2.redhat.com     Ready                         worker                 100m   v1.29.1+2f773e8
      ocp-worker-2.libvirt.lab.eng.tlv2.redhat.com     NotReady,SchedulingDisabled   worker,worker-test     100m   v1.29.1+2f773e8
      

      Expected results:

          The node should not go into the NotReady,SchedulingDisabled state.

      Additional info:

          

              msivak@redhat.com Martin Sivak
              openshift-crt-jira-prow OpenShift Prow Bot
              Min Li Min Li