Type: Bug
Resolution: Done-Errata
Priority: Major
Fix Version/s: None
Affects Version/s: 4.18.0
Component/s: Node Tuning Operator
Labels:
None

Severity:
Important
Regression:
Yes
Story Points:
1
Release Blocker:
Rejected
Blocked:
False
Blocked Reason:

None
Release Note Type:
Release Note Not Required
Release Note Status:
In Progress
Target Version:

4.16.z

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

This is a clone of issue OCPBUGS-43280. The following is the description of the original issue:
—
Description of problem:

NTO CI started failing with:
 • [FAILED] [247.873 seconds]
[rfe_id:27363][performance] CPU Management Verification of cpu_manager_state file when kubelet is restart [It] [test_id: 73501] defaultCpuset should not change [tier-0]
/go/src/github.com/openshift/cluster-node-tuning-operator/test/e2e/performanceprofile/functests/1_performance/cpu_management.go:309
  [FAILED] Expected
      <cpuset.CPUSet>: {
          elems: {0: {}, 2: {}},
      }
  to equal
      <cpuset.CPUSet>: {
          elems: {0: {}, 1: {}, 2: {}, 3: {}},
      }
  In [It] at: /go/src/github.com/openshift/cluster-node-tuning-operator/test/e2e/performanceprofile/functests/1_performance/cpu_management.go:332 @ 10/04/24 16:56:51.436 

The test failed because the test pod could not be admitted after the Kubelet restart.

In addition, the admission failure happens at this line:
https://github.com/openshift/kubernetes/blob/cec2232a4be561df0ba32d98f43556f1cad1db01/pkg/kubelet/cm/cpumanager/policy_static.go#L352 

Something has changed in how the Kubelet accounts for `availablePhysicalCPUs`.

Version-Release number of selected component (if applicable):

    4.18 (started happening after OCP was rebased on top of k8s 1.31)

How reproducible:

    Always

Steps to Reproduce:

    1. Set up a system with 4 CPUs and apply a performance profile with the single-numa-node topology policy
    2. Run pao-functests

Actual results:

    Tests failing with:
 • [FAILED] [247.873 seconds]
[rfe_id:27363][performance] CPU Management Verification of cpu_manager_state file when kubelet is restart [It] [test_id: 73501] defaultCpuset should not change [tier-0]
/go/src/github.com/openshift/cluster-node-tuning-operator/test/e2e/performanceprofile/functests/1_performance/cpu_management.go:309
  [FAILED] Expected
      <cpuset.CPUSet>: {
          elems: {0: {}, 2: {}},
      }
  to equal
      <cpuset.CPUSet>: {
          elems: {0: {}, 1: {}, 2: {}, 3: {}},
      }
  In [It] at: /go/src/github.com/openshift/cluster-node-tuning-operator/test/e2e/performanceprofile/functests/1_performance/cpu_management.go:332 @ 10/04/24 16:56:51.436

Expected results:

    Tests should pass

Additional info:

    NOTE: The issue occurs only on systems with a small number of CPUs (4 in our case).

clones

OCPBUGS-43566 [4.17] E2E: test related to cpumanager state file check during kubelet restart fails

Closed

depends on

OCPBUGS-43566 [4.17] E2E: test related to cpumanager state file check during kubelet restart fails

Closed

links to

openshift/cluster-node-tuning-operator#1204: OCPBUGS-44180: Unblock 4.16 CI

RHBA-2024:8986 OpenShift Container Platform 4.16.z bug fix update

Assignee: Team NTO

Reporter: OpenShift Prow Bot

QA Contact: Mallapadi Niranjan

Votes: 0

Watchers: 4

Created: 2024/11/04 11:43 AM

Updated: 2024/11/13 4:22 PM

Resolved: 2024/11/13 4:22 PM
