Type: Bug
Resolution: Not a Bug
Priority: Critical
Fix Version/s: None
Affects Version/s: 4.14.z
Component/s: Cloud Compute / Machine API Providers
Labels: None
Severity: Critical
Regression: None
Blocked: False
Blocked Reason: None

After the instanceType of one of the master nodes in an AWS cluster running OpenShift 4.14.23 was changed from r7i.xlarge to m7i.xlarge, the "master" MachineConfigPool has not returned to a ready state.

Current state of the master nodes (real names redacted):

NAME      STATUS                     ROLES                  AGE    VERSION
master1   Ready,SchedulingDisabled   control-plane,master   257d   v1.27.13+401bb48
master2   Ready                      control-plane,master   1h     v1.27.13+401bb48
master3   Ready                      control-plane,master   257d   v1.27.13+401bb48
master4   Ready                      control-plane,master   257d   v1.27.13+401bb48
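
For longer listings, the node that is stuck can be picked out mechanically. The following is a minimal sketch, assuming the usual column order of a node listing (name, status, roles, age, version); the function name `cordoned_nodes` is hypothetical and the sample rows are copied from the listing above.

```python
# Sample rows copied from the listing above (node names are placeholders).
NODE_LISTING = """\
master1 Ready,SchedulingDisabled control-plane,master 257d v1.27.13+401bb48
master2 Ready control-plane,master 1h v1.27.13+401bb48
master3 Ready control-plane,master 257d v1.27.13+401bb48
master4 Ready control-plane,master 257d v1.27.13+401bb48
"""


def cordoned_nodes(listing: str) -> list[str]:
    """Return the names of nodes whose status includes SchedulingDisabled."""
    names = []
    for line in listing.splitlines():
        fields = line.split()
        # Skip blank lines and an optional header row.
        if len(fields) < 2 or fields[0] == "NAME":
            continue
        name, status = fields[0], fields[1]
        # The status column is a comma-separated list of states.
        if "SchedulingDisabled" in status.split(","):
            names.append(name)
    return names


print(cordoned_nodes(NODE_LISTING))
```

With the rows above, only master1 is reported, which matches the node that was expected to be removed.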

master1 should have been removed, but it is still present, although it no longer appears to be a member of the etcd cluster.

An event in the Machine Config Operator project reports the following:

26m         Warning   OperatorDegraded: RequiredPoolsFailed   /machine-config                                                         Failed to resync 4.14.23 because: error during syncRequiredMachineConfigPools: [context deadline exceeded, failed to update clusteroperator: [client rate limiter Wait returned an error: context deadline exceeded, error required MachineConfigPool master is not ready, retrying. Status: (total: 4, ready 3, updated: 4, unavailable: 1, degraded: 0)]]
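
The counters at the end of that event are easier to reason about once parsed. The sketch below is illustrative only: `parse_pool_status` and `blocking_reasons` are hypothetical helpers, and the checks are a simplified reading of the counters, not the Machine Config Operator's actual readiness logic.

```python
import re

# Status fragment copied from the event above.
EVENT_STATUS = "Status: (total: 4, ready 3, updated: 4, unavailable: 1, degraded: 0)"


def parse_pool_status(message: str) -> dict[str, int]:
    """Extract the pool counters from a 'Status: (...)' fragment."""
    match = re.search(r"Status: \(([^)]*)\)", message)
    if match is None:
        raise ValueError("no pool status found in message")
    counters = {}
    # The colon after the key is optional because the event prints "ready 3".
    for key, value in re.findall(r"(\w+):?\s+(\d+)", match.group(1)):
        counters[key] = int(value)
    return counters


def blocking_reasons(counters: dict[str, int]) -> list[str]:
    """List the counters that would keep the pool from looking settled
    (simplified assumption, not the operator's real rule set)."""
    reasons = []
    total = counters["total"]
    if counters["ready"] < total:
        reasons.append(f"{total - counters['ready']} machine(s) not ready")
    if counters["updated"] < total:
        reasons.append(f"{total - counters['updated']} machine(s) not updated")
    if counters["unavailable"] > 0:
        reasons.append(f"{counters['unavailable']} machine(s) unavailable")
    if counters["degraded"] > 0:
        reasons.append(f"{counters['degraded']} machine(s) degraded")
    return reasons


status = parse_pool_status(EVENT_STATUS)
for reason in blocking_reasons(status):
    print(reason)
```

For this event, all four machines count as updated and none as degraded; the only outstanding items are one machine that is not ready and one that is unavailable, which is likely the cordoned master1.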

Some cluster operators are reported as degraded (see the must-gather). The MachineConfigPool "master" shows the following status:

NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-d7b33f294e86f5bd6e24bc5ca800a911   False     True       False      4              3                   4                     0                      257d

- lastTransitionTime: "2025-02-04T08:34:46Z"
  message: All nodes are updating to MachineConfig rendered-master-d7b33f294e86f5bd6e24bc5ca800a911
  reason: ""
  status: "True"
  type: Updating

~~OCPBUGS-20336~~ may be the same issue, although it happened during an upgrade. Its fix was not backported to OpenShift 4.14.

links to

OpenShift node not being able to be deleted by Machine API in a cluster running on AWS

Assignee: Theo Barber-Bany

Reporter: Lucas López Montero

Votes: 0

Watchers: 3

Created: 2025/02/04 12:39 PM

Updated: 2025/02/06 12:14 PM

Resolved: 2025/02/04 3:38 PM
