-
Bug
-
Resolution: Obsolete
-
Normal
-
None
-
4.11, 4.8
-
Low
-
No
-
False
-
Description of problem:
When a node has two or more custom roles, such as when performing a cluster update using the canary rollout strategy, it is not counted in either role's pool. This reduces the machine counts reported by the original machine config pool and gives the incorrect impression that no nodes match the additional pool, such as the canary pool.
Version-Release number of selected component (if applicable):
This behaviour was observed in both 4.8 and 4.11, but is probably also present in other 4.x versions.
How reproducible:
Easily reproducible.
Steps to Reproduce:
1. Create a 4.8 cluster with 3+ workers.
2. Customise the role of some workers to be something other than "worker", for example as is done when configuring some nodes with larger PID limits.
3. Create additional worker pools, as described by the official documentation for the canary rollout upgrade strategy: https://docs.openshift.com/container-platform/4.8/updating/update-using-custom-machine-config-pools.html
4. Label some nodes with the new role without removing their original role.
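Steps 3 and 4 can be sketched as follows, based on the linked documentation. The pool name (workerpool-canary), the pre-existing custom role (worker-pids), and the node name are illustrative only:

~~~
# Custom machine config pool for the canary, per the linked docs:
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: workerpool-canary
spec:
  machineConfigSelector:
    matchExpressions:
      - key: machineconfiguration.openshift.io/role
        operator: In
        values:
          - worker
          - workerpool-canary
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/workerpool-canary: ""
~~~

Then label a node that already carries a custom role (here hypothetically worker-pids) with the canary role, without removing the original label:

~~~
oc label node worker1.example.com node-role.kubernetes.io/workerpool-canary=
~~~

At this point the node carries two custom roles and disappears from both pools' machine counts.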
Actual results:
The machine counts in the machine configuration pool for the original role are decremented, but the new machine configuration pool still shows 0 machines.
Expected results:
The machine counts should correctly reflect the roles of the nodes present in the cluster.
Additional info:
The machine-config-controller logs show the following message for nodes with multiple custom roles:
~~~
2023-03-23T17:53:18.030255756Z W0323 17:53:18.030197 1 node_controller.go:798] can't get pool for node "worker1.example.com": node worker1.example.com belongs to 2 custom roles, cannot proceed with this Node
2023-03-23T17:58:05.259428149Z E0323 17:58:05.259321 1 node_controller.go:441] error finding pool for node: node worker1.example.com belongs to 2 custom roles, cannot proceed with this Node
~~~
Observed on 4.8 and 4.11 but probably present on other versions.
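For illustration, the kind of check that produces the logged error can be sketched like this. This is a simplified stand-in, not the actual machine-config-controller source: it treats "master" and "worker" as the built-in roles and anything else under the node-role label prefix as custom, and refuses to pick a pool when more than one custom role is present:

~~~go
package main

import (
	"fmt"
	"strings"
)

const rolePrefix = "node-role.kubernetes.io/"

// poolForNode is an illustrative sketch of pool selection: it returns the
// single custom role if one exists, falls back to master/worker otherwise,
// and errors out when a node carries more than one custom role.
func poolForNode(labels map[string]string) (string, error) {
	var custom []string
	master, worker := false, false
	for k := range labels {
		if !strings.HasPrefix(k, rolePrefix) {
			continue
		}
		switch role := strings.TrimPrefix(k, rolePrefix); role {
		case "master":
			master = true
		case "worker":
			worker = true
		default:
			custom = append(custom, role)
		}
	}
	if len(custom) > 1 {
		return "", fmt.Errorf("node belongs to %d custom roles, cannot proceed with this Node", len(custom))
	}
	if len(custom) == 1 {
		// A single custom role takes precedence over the worker role.
		return custom[0], nil
	}
	if master {
		return "master", nil
	}
	if worker {
		return "worker", nil
	}
	return "", fmt.Errorf("no node-role label found")
}

func main() {
	// A node holding both a pre-existing custom role and the canary role
	// (names are illustrative) hits the error path from this bug report.
	_, err := poolForNode(map[string]string{
		rolePrefix + "worker":            "",
		rolePrefix + "worker-pids":       "",
		rolePrefix + "workerpool-canary": "",
	})
	fmt.Println(err)
}
~~~

With only one custom role the node would be attributed to that role's pool; with two, it is counted in neither, which matches the decremented/zero counts observed above.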