OpenShift Bugs / OCPBUGS-44180

[4.16] E2E: test related to cpumanager state file check during kubelet restart fails

    • Type: Bug
    • Resolution: Done-Errata
    • Priority: Major
    • None
    • 4.18.0
    • Node Tuning Operator
    • None
    • Important
    • Yes
    • 1
    • Rejected
    • False
    • Release Note Not Required
    • In Progress

      This is a clone of issue OCPBUGS-43280. The following is the description of the original issue:

      Description of problem:

      NTO CI started failing with:
       • [FAILED] [247.873 seconds]
      [rfe_id:27363][performance] CPU Management Verification of cpu_manager_state file when kubelet is restart [It] [test_id: 73501] defaultCpuset should not change [tier-0]
      /go/src/github.com/openshift/cluster-node-tuning-operator/test/e2e/performanceprofile/functests/1_performance/cpu_management.go:309
        [FAILED] Expected
            <cpuset.CPUSet>: {
                elems: {0: {}, 2: {}},
            }
        to equal
            <cpuset.CPUSet>: {
                elems: {0: {}, 1: {}, 2: {}, 3: {}},
            }
        In [It] at: /go/src/github.com/openshift/cluster-node-tuning-operator/test/e2e/performanceprofile/functests/1_performance/cpu_management.go:332 @ 10/04/24 16:56:51.436 
      
      The failure occurred because the test pod could not be admitted after the kubelet restart.
      
      The admission failure occurs at this line:
      https://github.com/openshift/kubernetes/blob/cec2232a4be561df0ba32d98f43556f1cad1db01/pkg/kubelet/cm/cpumanager/policy_static.go#L352 
      
      Something has changed in how the kubelet accounts for `availablePhysicalCPUs`.
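
The comparison that fails can be reproduced outside the test suite with a small sketch. It assumes the kubelet checkpoint (by default `/var/lib/kubelet/cpu_manager_state`) is a JSON document with a `defaultCpuSet` field holding a Linux CPU list such as `0-3`. The struct, the helper names, and the two sample payloads are illustrative and not taken from the NTO test code; the payloads only mirror the two sets shown in the failure output.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// cpuManagerState mirrors only the checkpoint fields needed here.
// Assumed layout; any other fields in the file are ignored by json.Unmarshal.
type cpuManagerState struct {
	PolicyName    string `json:"policyName"`
	DefaultCPUSet string `json:"defaultCpuSet"`
}

// parseCPUList expands a CPU list such as "0-3" or "0,2" into sorted CPU IDs.
func parseCPUList(s string) ([]int, error) {
	s = strings.TrimSpace(s)
	if s == "" {
		return nil, nil
	}
	seen := map[int]bool{}
	for _, part := range strings.Split(s, ",") {
		lo, hi, isRange := strings.Cut(part, "-")
		start, err := strconv.Atoi(strings.TrimSpace(lo))
		if err != nil {
			return nil, fmt.Errorf("bad CPU id %q: %w", part, err)
		}
		end := start
		if isRange {
			if end, err = strconv.Atoi(strings.TrimSpace(hi)); err != nil {
				return nil, fmt.Errorf("bad CPU range %q: %w", part, err)
			}
		}
		if end < start {
			return nil, fmt.Errorf("descending CPU range %q", part)
		}
		for c := start; c <= end; c++ {
			seen[c] = true
		}
	}
	ids := make([]int, 0, len(seen))
	for c := range seen {
		ids = append(ids, c)
	}
	sort.Ints(ids)
	return ids, nil
}

// defaultCPUSet extracts the default CPU set from raw checkpoint contents.
func defaultCPUSet(raw []byte) ([]int, error) {
	var st cpuManagerState
	if err := json.Unmarshal(raw, &st); err != nil {
		return nil, err
	}
	return parseCPUList(st.DefaultCPUSet)
}

// sameCPUs reports whether two sorted CPU ID slices are identical.
func sameCPUs(a, b []int) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

func main() {
	// Hypothetical payloads mirroring the sets in the failure output.
	want, _ := defaultCPUSet([]byte(`{"policyName":"static","defaultCpuSet":"0-3"}`))
	got, _ := defaultCPUSet([]byte(`{"policyName":"static","defaultCpuSet":"0,2"}`))
	fmt.Println("want:", want, "got:", got, "equal:", sameCPUs(want, got))
}
```

Reading the real file from the node and feeding its contents to `defaultCPUSet` at two points in time would show whether the default set shrinks in the same way the test reports.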
      
      

      Version-Release number of selected component (if applicable):

          4.18 (started happening after OCP was rebased on top of k8s 1.31)

      How reproducible:

          Always

      Steps to Reproduce:

          1. Set up a system with 4 CPUs and apply a performance profile with single-numa-policy
          2. Run pao-functests
          

      Actual results:

          Tests fail with:
       • [FAILED] [247.873 seconds]
      [rfe_id:27363][performance] CPU Management Verification of cpu_manager_state file when kubelet is restart [It] [test_id: 73501] defaultCpuset should not change [tier-0]
      /go/src/github.com/openshift/cluster-node-tuning-operator/test/e2e/performanceprofile/functests/1_performance/cpu_management.go:309
        [FAILED] Expected
            <cpuset.CPUSet>: {
                elems: {0: {}, 2: {}},
            }
        to equal
            <cpuset.CPUSet>: {
                elems: {0: {}, 1: {}, 2: {}, 3: {}},
            }
        In [It] at: /go/src/github.com/openshift/cluster-node-tuning-operator/test/e2e/performanceprofile/functests/1_performance/cpu_management.go:332 @ 10/04/24 16:56:51.436

      Expected results:

          Tests should pass

      Additional info:

          NOTE: The issue occurs only on systems with a small number of CPUs (4 in our case)

              team-nto Team NTO
              openshift-crt-jira-prow OpenShift Prow Bot
              Mallapadi Niranjan Mallapadi Niranjan