Type: Bug
Resolution: Unresolved
Target Version: 4.18.z
Impact: Quality / Stability / Reliability
Severity: Important
Description of problem:
During a CGU upgrade of an SNO cluster, the PerformanceProfile was in a Degraded state with the following error:
$ omc get performanceprofile master-profile -o yaml
...
  machineConfigPoolSelector:
    pools.operator.machineconfiguration.openshift.io/master: ""
  nodeSelector:
    node-role.kubernetes.io/master: ""
...
  - lastHeartbeatTime: "2025-10-15T09:38:12Z"
    lastTransitionTime: "2025-10-15T09:38:12Z"
    message: the MachineConfigPool "master" does not have any labels that can be used to bind it together with KubeletConfig
    reason: BadMachineConfigLabels
    status: "True"
    type: Degraded
...

$ omc get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-29ea7f8a941960dfc728072d933ae532   True      False      False      1              1                   1                     0                      5h
worker   rendered-worker-11c0c71fc1056d589ef44207f9515356   True      False      False      0              0                   0                     0                      5h
Based on the code at
https://github.com/openshift/cluster-node-tuning-operator/blob/dbb384039d22b64a080cb114df5cde7be1effb42/pkg/performanceprofile/controller/performanceprofile_controller.go#L581
the error occurs when len(profileMCP.Labels) == 0, i.e. when the MachineConfigPool has no labels in metadata.labels. However, this MCP does have labels, so the configuration should reconcile:
$ omc get mcp --show-labels
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE   LABELS
master   rendered-master-29ea7f8a941960dfc728072d933ae532   True      False      False      1              1                   1                     0                      5h    machineconfiguration.openshift.io/mco-built-in=,operator.machineconfiguration.openshift.io/required-for-upgrade=,pools.operator.machineconfiguration.openshift.io/master=
worker   rendered-worker-11c0c71fc1056d589ef44207f9515356   True      False      False      0              0                   0                     0                      5h    machineconfiguration.openshift.io/mco-built-in=,pools.operator.machineconfiguration.openshift.io/worker=
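For reference, a minimal sketch of that check (function and variable names here are illustrative, not the controller's actual code), showing how an empty metadata.labels map produces the BadMachineConfigLabels condition:

package main

import "fmt"

// validateMCPLabels is an illustrative stand-in for the check around L581 of
// performanceprofile_controller.go: at least one label is needed on the target
// MachineConfigPool to build the KubeletConfig machineConfigPoolSelector.
func validateMCPLabels(mcpName string, labels map[string]string) error {
	if len(labels) == 0 {
		// This is the condition reported as reason=BadMachineConfigLabels.
		return fmt.Errorf("the MachineConfigPool %q does not have any labels that can be used to bind it together with KubeletConfig", mcpName)
	}
	return nil
}

func main() {
	// Labels observed via `omc get mcp --show-labels`; the check passes.
	withLabels := map[string]string{
		"machineconfiguration.openshift.io/mco-built-in":                  "",
		"operator.machineconfiguration.openshift.io/required-for-upgrade": "",
		"pools.operator.machineconfiguration.openshift.io/master":         "",
	}
	fmt.Println(validateMCPLabels("master", withLabels)) // <nil>
	// With no labels, the error seen in the PerformanceProfile status is returned.
	fmt.Println(validateMCPLabels("master", nil))
}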
Operator logs:
$ omc logs cluster-node-tuning-operator-759544dd89-zq975 -n openshift-cluster-node-tuning-operator
...
2025-10-15T09:32:58.647515086Z I1015 09:32:58.647486 1 leaderelection.go:254] attempting to acquire leader lease openshift-cluster-node-tuning-operator/node-tuning-operator-lock...
2025-10-15T09:38:12.288248812Z I1015 09:38:12.288212 1 leaderelection.go:268] successfully acquired lease openshift-cluster-node-tuning-operator/node-tuning-operator-lock
2025-10-15T09:38:12.288429065Z I1015 09:38:12.288380 1 controller.go:1322] starting Tuned controller
2025-10-15T09:38:12.288558723Z {"level":"info","ts":"2025-10-15T09:38:12Z","msg":"Starting EventSource","controller":"performanceprofile","controllerGroup":"performance.openshift.io","controllerKind":"PerformanceProfile","source":"kind source: *v2.PerformanceProfile"}
2025-10-15T09:38:12.288577121Z {"level":"info","ts":"2025-10-15T09:38:12Z","msg":"Starting EventSource","controller":"performanceprofile","controllerGroup":"performance.openshift.io","controllerKind":"PerformanceProfile","source":"kind source: *v1.MachineConfig"}
2025-10-15T09:38:12.288583643Z {"level":"info","ts":"2025-10-15T09:38:12Z","msg":"Starting EventSource","controller":"performanceprofile","controllerGroup":"performance.openshift.io","controllerKind":"PerformanceProfile","source":"kind source: *v1.KubeletConfig"}
2025-10-15T09:38:12.288590245Z {"level":"info","ts":"2025-10-15T09:38:12Z","msg":"Starting EventSource","controller":"performanceprofile","controllerGroup":"performance.openshift.io","controllerKind":"PerformanceProfile","source":"kind source: *v1.Tuned"}
2025-10-15T09:38:12.288590245Z {"level":"info","ts":"2025-10-15T09:38:12Z","msg":"Starting EventSource","controller":"performanceprofile","controllerGroup":"performance.openshift.io","controllerKind":"PerformanceProfile","source":"kind source: *v1.RuntimeClass"}
2025-10-15T09:38:12.288597254Z {"level":"info","ts":"2025-10-15T09:38:12Z","msg":"Starting EventSource","controller":"performanceprofile","controllerGroup":"performance.openshift.io","controllerKind":"PerformanceProfile","source":"kind source: *v1.MachineConfigPool"}
2025-10-15T09:38:12.288603532Z {"level":"info","ts":"2025-10-15T09:38:12Z","msg":"Starting EventSource","controller":"performanceprofile","controllerGroup":"performance.openshift.io","controllerKind":"PerformanceProfile","source":"kind source: *v1.Profile"}
2025-10-15T09:38:12.288603532Z {"level":"info","ts":"2025-10-15T09:38:12Z","msg":"Starting Controller","controller":"performanceprofile","controllerGroup":"performance.openshift.io","controllerKind":"PerformanceProfile"}
2025-10-15T09:38:12.488898347Z I1015 09:38:12.488869 1 controller.go:1443] started events processor/controller
2025-10-15T09:38:12.496100626Z I1015 09:38:12.496038 1 server.go:104] starting metrics server
2025-10-15T09:38:12.613915258Z {"level":"info","ts":"2025-10-15T09:38:12Z","msg":"Starting workers","controller":"performanceprofile","controllerGroup":"performance.openshift.io","controllerKind":"PerformanceProfile","worker count":1}
...
Workaround applied: restart the NTO operator pod:
$ oc delete pod -n openshift-cluster-node-tuning-operator cluster-node-tuning-operator-56678557f-lmvzl
Version-Release number of selected component:
$ omc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE    STATUS
version   4.18.22   True        False         2h50m    Cluster version is 4.18.22

$ omc get nodes
NAME      STATUS   ROLES                         AGE   VERSION
master0   Ready    control-plane,master,worker   5h    v1.31.11
How reproducible:
The partner is trying to reproduce the issue but has not been able to so far, and is checking whether the labels are actually being removed. In my lab I can reproduce the same behavior by deleting the MCP labels: the NTO reports "BadMachineConfigLabels", and after the labels are re-added, the NTO does not update its status.
Steps to Reproduce:
- Scale down the machine-config controller/operator deployments.
- Remove MCP labels.
- Delete the NTO operator pod.
- Check the PerformanceProfile status (a rough command sequence is sketched below).
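The command sketch referenced above (resource and pod names are assumed from a default SNO install and the outputs in this report; they may differ in other environments):

$ oc scale deployment/machine-config-operator -n openshift-machine-config-operator --replicas=0
$ oc scale deployment/machine-config-controller -n openshift-machine-config-operator --replicas=0
# Remove the label the PerformanceProfile binds to (trailing "-" deletes the label).
$ oc label mcp master pools.operator.machineconfiguration.openshift.io/master-
# Restart the NTO operator pod so it observes the unlabeled pool.
$ oc delete pod -n openshift-cluster-node-tuning-operator -l name=cluster-node-tuning-operator
# Re-add the label, then check whether the Degraded condition clears.
$ oc label mcp master pools.operator.machineconfiguration.openshift.io/master=
$ oc get performanceprofile master-profile -o jsonpath='{.status.conditions}'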
Actual results:
The PerformanceProfile shows the "BadMachineConfigLabels" error and does not recover until the NTO operator pod is restarted.
Expected results:
The PerformanceProfile should reconcile the configuration without needing to restart the NTO operator pod.
Additional info:
Reviewing the code, the controller only reconciles when the MCP Status.Conditions change; a label change does not modify the conditions, so the NTO does not reconcile the configuration:
https://github.com/openshift/cluster-node-tuning-operator/blob/dbb384039d22b64a080cb114df5cde7be1effb42/pkg/performanceprofile/controller/performanceprofile_controller.go#L129
mcpPredicates := predicate.Funcs{
    UpdateFunc: func(e event.UpdateEvent) bool {
        if !validateUpdateEvent(e.ObjectOld, e.ObjectNew) {
            return false
        }
        mcpOld := e.ObjectOld.(*mcov1.MachineConfigPool)
        mcpNew := e.ObjectNew.(*mcov1.MachineConfigPool)
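        // Reconcile only when the MCP status conditions change; a metadata-only
        // update (e.g. a label change) leaves the conditions equal, so the event is dropped.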
        return !reflect.DeepEqual(mcpOld.Status.Conditions, mcpNew.Status.Conditions)
    },
}
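For illustration only, a sketch of one possible direction (reusing the identifiers from the excerpt above; this is an assumption, not an actual patch): also compare the pool's labels so that a label-only update re-triggers reconciliation.

mcpPredicates := predicate.Funcs{
    UpdateFunc: func(e event.UpdateEvent) bool {
        if !validateUpdateEvent(e.ObjectOld, e.ObjectNew) {
            return false
        }
        mcpOld := e.ObjectOld.(*mcov1.MachineConfigPool)
        mcpNew := e.ObjectNew.(*mcov1.MachineConfigPool)
        // React to label changes as well, not only to status condition changes.
        return !reflect.DeepEqual(mcpOld.Status.Conditions, mcpNew.Status.Conditions) ||
            !reflect.DeepEqual(mcpOld.Labels, mcpNew.Labels)
    },
}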
Logs attached here: https://drive.google.com/drive/folders/187rqVwVBwFCowrPlpbyRD6Z0yiXNHRJZ?usp=drive_link