[OCPBUGS-5452] Taking much time to update node count for MCP - Red Hat Issue Tracker

Type: Bug
Resolution: Done-Errata
Priority: Normal
Fix Version/s: 4.16.0
Affects Version/s: 4.11
Component/s: Machine Config Operator
Labels:

Regression:
None
Epic Link:
Machine Config Node
Sprint:
MCO Sprint 247, MCO Sprint 248
sprint_count:
2
Blocked:
False
Blocked Reason:

None

Release Note Text:

* Previously, when a node was removed from a `MachineConfigPool`, the Machine Config Operator (MCO) did not report an error or the removal of the node. The MCO does not support managing nodes when they are not in a pool and there was no indication that node management ceased after the node was removed. With this release, if a node is removed from all pools, the MCO now logs an error. (link:https://issues.redhat.com/browse/OCPBUGS-5452[*OCPBUGS-5452*])

Release Note Type:
Bug Fix
Release Note Status:
Done
Target Version:

4.16.0

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Description of problem:

The MCO takes too long to update the node count of a MachineConfigPool (MCP) after the label that the MCP uses to match nodes is removed from a node.

Version-Release number of selected component (if applicable):

How reproducible:

100%

Steps to Reproduce:

1. Remove the `node-role.kubernetes.io/worker=` label from any worker node.
~~~
# oc label node worker-0.sharedocp4upi411ovn.lab.upshift.rdu2.redhat.com node-role.kubernetes.io/worker-
~~~
2. Check the worker MCP for the correct node count. The count still shows 3.
~~~
# oc get mcp  worker
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
worker   rendered-worker-6916abae250ad092875791f8297c13e1   True      False      False      3              3                   3                     0                      5d7h
~~~
3. Check again after 10-15 minutes.
~~~
# oc get mcp  worker
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
worker   rendered-worker-6916abae250ad092875791f8297c13e1   True      False      False      2              2                   2                     0                      5d7h
~~~
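
The manual check in steps 2 and 3 could also be timed with a small polling loop. The following is only a sketch and was not part of the original reproduction: it assumes `oc` is logged in to the cluster and that the pool's count can be read from `.status.machineCount`. The function names and the interval and timeout defaults are hypothetical, chosen for illustration.

```shell
#!/usr/bin/env bash
# Sketch: measure how long an MCP takes to report an expected machine count.
# Assumption: `oc` is on PATH and authenticated against the cluster.
# get_machine_count and wait_for_machine_count are illustrative names.

# Print the current machine count of the given pool.
get_machine_count() {
  oc get mcp "$1" -o jsonpath='{.status.machineCount}'
}

# Poll until the pool reports the expected count, then print the elapsed
# seconds. Returns 1 if the timeout expires before the count matches.
# Arguments: pool, expected count, poll interval (s), timeout (s).
wait_for_machine_count() {
  local pool=$1 expected=$2 interval=${3:-10} timeout=${4:-1200}
  local start=$SECONDS count
  while (( SECONDS - start < timeout )); do
    count=$(get_machine_count "$pool")
    if [[ "$count" == "$expected" ]]; then
      echo "$(( SECONDS - start ))"
      return 0
    fi
    sleep "$interval"
  done
  return 1
}
```

For the case in this report, running `wait_for_machine_count worker 2` right after removing the label in step 1 would print roughly how many seconds passed before the worker pool dropped from 3 to 2 machines.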

Actual results:

The MCP took 10-15 minutes to reflect the node removal.

Expected results:

The MCP should detect the node removal as soon as the matching label is removed from the node.

Additional info:

relates to

MCO-452 [tech-preview] Proper state reporting when the MCO changes state

Closed

links to

openshift/machine-config-operator#4097: OCPBUGS-5452: If node is not in pool, error

RHEA-2024:0041 OpenShift Container Platform 4.16.z bug fix update

Assignee:: Charles Doern

Reporter:: Divyam Pateriya

QA Contact:: Rio Liu

Votes:: 0

Watchers:: 7

Created:: 2023/01/06 11:37 AM

Updated:: 2024/06/27 11:33 AM

Resolved:: 2024/06/27 11:33 AM
