OCPBUGS-53153

MCO requires reboot on worker nodes when creating new MC (even for MCP with zero machine count)

      Description of problem:

When creating a new MCP whose machineConfigSelector is specified via matchExpressions, it appears that `worker` must be among the values of the label "machineconfiguration.openshift.io/role". When such an MCP is created, even with no nodes carrying the label specified in its nodeSelector, a new MC is created and a reboot is triggered on the worker nodes.

      Version-Release number of selected component (if applicable):

Seen on 4.18; other versions were not tested.

      How reproducible:

Always, on a fresh OCP cluster. Reapplying the reproduction steps with a completely new MC and MCP does not trigger the reboot after the first time.

      Steps to Reproduce:

    1. On a vanilla, fresh OCP cluster that has a worker pool, create an MCP like the one below, to which no node belongs and for which no MachineConfig exists (no node carries the node label below), i.e. a completely new MCP:
      apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfigPool
      metadata:
        name: my-test
        labels:
          machineconfiguration.openshift.io/role: my-test
      spec:
        machineConfigSelector:
          matchExpressions:
            - {
                 key: machineconfiguration.openshift.io/role,
                 operator: In,
                 values: [worker,my-test],
              }
        nodeSelector:
          matchLabels:
            node-role.kubernetes.io/my-test: ""
      
     2. This creates an empty MCP and a new MC, but it also triggers a reboot on the worker nodes.
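For illustration of the selector semantics involved (this MC is hypothetical and not part of the reproduction): because the MCP above selects role In [worker, my-test], any MachineConfig carrying either role value is matched by the my-test pool, including all pre-existing role=worker MCs. A minimal sketch of such an MC:

```yaml
# Hypothetical MachineConfig, for illustration only. With the MCP above,
# this MC (labeled role=my-test) and every existing role=worker MC would
# both be matched by the my-test pool when its config is rendered.
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-my-test-example   # hypothetical name
  labels:
    machineconfiguration.openshift.io/role: my-test
spec:
  config:
    ignition:
      version: 3.4.0         # empty Ignition payload; defines no change
```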

      Actual results:

    Worker nodes are rebooted and the rendered MC of the worker MCP changes.

      Expected results:

The MCP has a machine count of 0, so it is anticipated that no node is rebooted and the rendered MC of the worker MCP is unchanged.

      Additional info:

Deleting the added MCP again triggers a reboot on the worker nodes.
It also does not remove the corresponding MC, so if an MCP with the same machineConfigSelector is created again, no reboot happens on any node. It is not clear whether keeping the old MC is expected.

       

Comments:

Shereen Haj added a comment - djoshy Shouldn't it be an MCO warning/error that a node matches multiple MCPs?

            Jiri Mencak added a comment -

            Sounds like misconfiguration to me:

            profile cnfdr9.telco5g.eng.rdu2.redhat.com uses machineConfigLabels that match across multiple MCPs (my-test,worker,worker-cnf); this is not supported 


Yu Qi Zhang added a comment - Leaving the priority undefined until we can assess the conditions and impact.

Shereen Haj added a comment - MC and MCP yamls and the must-gather can be found at: https://drive.google.com/drive/folders/1JD8VHZlJO95-nLz1Miox409C6_Nukj4u?usp=drive_link

Team: Team NTO
Reporter: Shereen Haj
Assignee: Sergio Regidor de la Rosa