-
Bug
-
Resolution: Not a Bug
-
Critical
-
None
-
4.14.z
-
None
-
Critical
-
None
-
False
-
After changing the instanceType of one of the master nodes of an AWS cluster running OpenShift 4.14.23 from r7i.xlarge to m7i.xlarge, the resulting MachineConfig rollout did not complete correctly.
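For context, an instanceType change like this is made in the providerSpec of the control-plane machine definition. A minimal, illustrative fragment of the AWS providerSpec involved (field names follow the AWSMachineProviderConfig schema; the surrounding Machine or ControlPlaneMachineSet metadata is omitted and the exact structure depends on how the control plane is managed in this cluster):

```yaml
# Illustrative fragment only: providerSpec of an AWS control-plane machine,
# with instanceType changed from r7i.xlarge to m7i.xlarge.
providerSpec:
  value:
    apiVersion: machine.openshift.io/v1beta1
    kind: AWSMachineProviderConfig
    instanceType: m7i.xlarge   # was r7i.xlarge
```

Such a change triggers replacement of the affected machine, after which the old node is expected to be drained and removed, which is what did not happen here.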
Situation of the master nodes (real names not shown):
NAME      STATUS                     ROLES                  AGE    VERSION
master1   Ready,SchedulingDisabled   control-plane,master   257d   v1.27.13+401bb48
master2   Ready                      control-plane,master   1h     v1.27.13+401bb48
master3   Ready                      control-plane,master   257d   v1.27.13+401bb48
master4   Ready                      control-plane,master   257d   v1.27.13+401bb48
master1 should have been removed, but it is still there, although it no longer seems to be part of the etcd cluster.
The issue shows up in the following event of the MachineConfig operator project:
26m Warning OperatorDegraded: RequiredPoolsFailed /machine-config Failed to resync 4.14.23 because: error during syncRequiredMachineConfigPools: [context deadline exceeded, failed to update clusteroperator: [client rate limiter Wait returned an error: context deadline exceeded, error required MachineConfigPool master is not ready, retrying. Status: (total: 4, ready 3, updated: 4, unavailable: 1, degraded: 0)]]
Some operators show up as degraded (see must-gather). Regarding the MachineConfigPool "master":
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-d7b33f294e86f5bd6e24bc5ca800a911   False     True       False      4              3                   4                     0                      257d
- lastTransitionTime: "2025-02-04T08:34:46Z"
  message: All nodes are updating to MachineConfig rendered-master-d7b33f294e86f5bd6e24bc5ca800a911
  reason: ""
  status: "True"
  type: Updating
OCPBUGS-20336 may be the same issue, although that one occurred during an upgrade; its fix was not backported to OpenShift 4.14.