[OCPBUGS-32473] With workload partitioning enabled, setting cpu_manager to static and having reserved cpu causes kubelet fail to restart - Red Hat Issue Tracker

Type: Bug
Resolution: Done-Errata
Priority: Critical
Fix Version/s: None
Affects Version/s: 4.16.0
Component/s: Node / Kubelet
Labels:
- pre-merge-verify-node
- triaged

Severity: Critical
Regression: No
Sprint: CNF Compute Sprint 252, CNF Compute Sprint 253, CNF Compute Sprint 254, CNF Compute Sprint 255
sprint_count: 4
Release Blocker: Rejected
Blocked: False
Blocked Reason: None
Release Note Text: The flow fixed by this bug was never exposed to users. The change of functionality will be covered by OCPBUGS-32472
Release Note Type: Release Note Not Required
Release Note Status: In Progress
Latest Status Summary: 2024-06-03: PR needs lgtm, tests passed, QE preverified
RH Private Keywords:
Target Version: 4.14.z

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

This is a clone of issue ~~OCPBUGS-31348~~. The following is the description of the original issue:
—
This is a clone of issue ~~OCPBUGS-29520~~. The following is the description of the original issue:
—
Description of problem:

 On a system with workload partitioning enabled, setting cpu_manager to static and configuring reserved CPUs causes the kubelet to fail to restart.

Version-Release number of selected component (if applicable):

4.16.0-0.ci-2024-02-13-072746

How reproducible:

Every time

Steps to Reproduce:

    1. Enable workload partitioning.
    2. Label one of the worker nodes as worker-test.
    3. Create an MCP (MachineConfigPool) for the worker-test node.
    4. Create a KubeletConfig as shown below:
apiVersion: machineconfiguration.openshift.io/v1
kind: KubeletConfig
metadata:
  name: kubelet-test
spec:
  kubeletConfig:
    cpuManagerPolicy: static
    reservedSystemCPUs: 0,2,12,14
  machineConfigPoolSelector:
    matchLabels:
      machineconfiguration.openshift.io/role: worker-test

    5. Observe that the node goes to the NotReady,SchedulingDisabled state.
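
The report does not include the pool definition used in steps 2 and 3. A minimal sketch of what such a MachineConfigPool might look like is given here for illustration only. The pool name `worker-test` and the node label `node-role.kubernetes.io/worker-test` are assumptions, chosen to be consistent with the `worker,worker-test` role shown for the affected node and with the `machineConfigPoolSelector` in the KubeletConfig above; the reporter's actual manifest may differ.

```yaml
# Hypothetical pool for the labeled node; not taken from the original report.
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: worker-test
  labels:
    # The KubeletConfig's machineConfigPoolSelector matches on this pool label.
    machineconfiguration.openshift.io/role: worker-test
spec:
  machineConfigSelector:
    matchExpressions:
      # Inherit the base worker MachineConfigs plus any worker-test ones.
      - key: machineconfiguration.openshift.io/role
        operator: In
        values: [worker, worker-test]
  nodeSelector:
    matchLabels:
      # Assumed node label from step 2; the node must carry it to join the pool.
      node-role.kubernetes.io/worker-test: ""
```

With a pool along these lines, the KubeletConfig is rendered only for nodes in `worker-test`, which matches the observation that a single worker is affected.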

Actual results:

The node goes to the NotReady,SchedulingDisabled state:
[root@cnfdr22 tmp]# oc get nodes
NAME                                             STATUS                        ROLES                  AGE    VERSION
ocp-ctlplane-0.libvirt.lab.eng.tlv2.redhat.com   Ready                         control-plane,master   116m   v1.29.1+2f773e8
ocp-ctlplane-1.libvirt.lab.eng.tlv2.redhat.com   Ready                         control-plane,master   116m   v1.29.1+2f773e8
ocp-ctlplane-2.libvirt.lab.eng.tlv2.redhat.com   Ready                         control-plane,master   116m   v1.29.1+2f773e8
ocp-worker-0.libvirt.lab.eng.tlv2.redhat.com     Ready                         worker                 100m   v1.29.1+2f773e8
ocp-worker-1.libvirt.lab.eng.tlv2.redhat.com     Ready                         worker                 100m   v1.29.1+2f773e8
ocp-worker-2.libvirt.lab.eng.tlv2.redhat.com     NotReady,SchedulingDisabled   worker,worker-test     100m   v1.29.1+2f773e8

Expected results:

    The node should not go into the NotReady,SchedulingDisabled state.

Additional info:

clones

OCPBUGS-31348 With workload partitioning enabled, setting cpu_manager to static and having reserved cpu causes kubelet fail to restart

Closed

is blocked by

OCPBUGS-31348 With workload partitioning enabled, setting cpu_manager to static and having reserved cpu causes kubelet fail to restart

Closed

links to

openshift/kubernetes#1951: [release-4.14] UPSTREAM: <carry>: OCPBUGS-32473: fix cpu manager cpuset check

RHBA-2024:3881 OpenShift Container Platform 4.14.z bug fix update

Assignee: Martin Sivak

Reporter: OpenShift Prow Bot

QA Contact: Min Li

Votes: 0

Watchers: 7

Created: 2024/04/19 8:06 AM

Updated: 2024/06/19 2:37 PM

Resolved: 2024/06/19 2:37 PM
