  OpenShift Bugs: OCPBUGS-44177

[4.17] Kubelet: Change in the available CPUs accounting

    • Type: Bug
    • Resolution: Won't Do
    • Priority: Major
    • 4.18.0
    • Node Tuning Operator
    • Quality / Stability / Reliability
    • False
    • Important
    • Yes
    • Rejected
    • In Progress
    • Release Note Not Required

      Description of problem:

      NTO CI started failing with:
       • [FAILED] [247.873 seconds]
      [rfe_id:27363][performance] CPU Management Verification of cpu_manager_state file when kubelet is restart [It] [test_id: 73501] defaultCpuset should not change [tier-0]
      /go/src/github.com/openshift/cluster-node-tuning-operator/test/e2e/performanceprofile/functests/1_performance/cpu_management.go:309
        [FAILED] Expected
            <cpuset.CPUSet>: {
                elems: {0: {}, 2: {}},
            }
        to equal
            <cpuset.CPUSet>: {
                elems: {0: {}, 1: {}, 2: {}, 3: {}},
            }
        In [It] at: /go/src/github.com/openshift/cluster-node-tuning-operator/test/e2e/performanceprofile/functests/1_performance/cpu_management.go:332 @ 10/04/24 16:56:51.436 
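      The assertion compares two CPU sets. Under Gomega's "Expected <actual> to equal <expected>" wording, the observed set is {0, 2} and the expected set is {0, 1, 2, 3}. The minimal sketch below uses a hypothetical map-based `cpuSet` type as a stand-in for `cpuset.CPUSet` (it is not the real type) to show which CPUs are missing from the observed set.

```go
package main

import (
	"fmt"
	"sort"
)

// cpuSet is a hypothetical stand-in for cpuset.CPUSet; like the value
// printed in the failure log, it keeps CPU IDs as keys of a map.
type cpuSet map[int]struct{}

// newCPUSet builds a cpuSet from a list of CPU IDs.
func newCPUSet(ids ...int) cpuSet {
	s := cpuSet{}
	for _, id := range ids {
		s[id] = struct{}{}
	}
	return s
}

// difference returns, sorted, the CPU IDs present in a but not in b.
func difference(a, b cpuSet) []int {
	var out []int
	for id := range a {
		if _, ok := b[id]; !ok {
			out = append(out, id)
		}
	}
	sort.Ints(out)
	return out
}

// equal reports whether both sets contain exactly the same CPU IDs.
func equal(a, b cpuSet) bool {
	return len(difference(a, b)) == 0 && len(difference(b, a)) == 0
}

func main() {
	expected := newCPUSet(0, 1, 2, 3) // the "to equal" value in the log
	actual := newCPUSet(0, 2)         // the "Expected" value in the log

	fmt.Println("equal:", equal(expected, actual))          // equal: false
	fmt.Println("missing:", difference(expected, actual))   // missing: [1 3]
}
```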
      
      The failure occurs because the test pod cannot be admitted after the kubelet restarts.
      
      The admission failure is raised at this line:
      https://github.com/openshift/kubernetes/blob/cec2232a4be561df0ba32d98f43556f1cad1db01/pkg/kubelet/cm/cpumanager/policy_static.go#L352 
      
      Something appears to have changed in how the kubelet computes `availablePhysicalCPUs`.
      
      

      Version-Release number of selected component (if applicable):

          4.18 (started happening after OCP was rebased onto Kubernetes 1.31)

      How reproducible:

          Always

      Steps to Reproduce:

          1. Set up a system with 4 CPUs and apply a performance profile with the single-numa-node topology policy
          2. Run pao-functests
          

      Actual results:

          Tests fail with the same `defaultCpuset should not change` (test_id: 73501) assertion error shown in the description above.

      Expected results:

          Tests should pass

      Additional info:

          NOTE: The issue occurs only on systems with a small number of CPUs (4 in our case).

      People:
          Talor Itzhak (titzhak)
          Niranjan Mallapadi Raghavendra Rao