OpenShift Bugs / OCPBUGS-49818

MachineConfigPool stays "updating" and several operators become degraded after changing instanceType of a master in an AWS cluster


    • Critical

      After the instanceType of one of the master nodes was changed from r7i.xlarge to m7i.xlarge in an AWS cluster running OpenShift 4.14.23, the "master" MachineConfigPool did not finish updating.

      Current state of the master nodes (real names redacted):

      master1   Ready,SchedulingDisabled   control-plane,master   257d   v1.27.13+401bb48
      master2   Ready                      control-plane,master   1h     v1.27.13+401bb48
      master3   Ready                      control-plane,master   257d   v1.27.13+401bb48
      master4   Ready                      control-plane,master   257d   v1.27.13+401bb48
      

      master1 should have been removed, but it is still present, although it no longer appears to be a member of the etcd cluster.

      The following event is reported in the Machine Config Operator project:

      26m         Warning   OperatorDegraded: RequiredPoolsFailed   /machine-config                                                         Failed to resync 4.14.23 because: error during syncRequiredMachineConfigPools: [context deadline exceeded, failed to update clusteroperator: [client rate limiter Wait returned an error: context deadline exceeded, error required MachineConfigPool master is not ready, retrying. Status: (total: 4, ready 3, updated: 4, unavailable: 1, degraded: 0)]]
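
The counters at the end of that event are the quickest way to see what the operator is waiting for. As a minimal sketch (the function name `parse_pool_status` is hypothetical, and the message layout is assumed to match the event quoted above), they can be extracted like this:

```python
import re

# Status fragment copied verbatim from the event quoted above.
EVENT_STATUS = "Status: (total: 4, ready 3, updated: 4, unavailable: 1, degraded: 0)"


def parse_pool_status(message: str) -> dict:
    """Extract the integer counters from a 'Status: (...)' fragment.

    Hypothetical helper: it assumes the layout seen in this report and is
    not part of any OpenShift tooling.
    """
    match = re.search(r"Status: \(([^)]*)\)", message)
    if match is None:
        raise ValueError("no 'Status: (...)' fragment found")
    # The colon is optional because the message prints 'ready 3' without one.
    pairs = re.findall(r"(\w+):?\s+(\d+)", match.group(1))
    return {name: int(value) for name, value in pairs}


counters = parse_pool_status(EVENT_STATUS)
print(counters)
```

For this event the counters show four machines in total, all four updated, but only three ready and one unavailable, which matches master1 being left in Ready,SchedulingDisabled.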
      

      Several operators are reported as degraded (see must-gather). The status of the "master" MachineConfigPool is:

      NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
      master   rendered-master-d7b33f294e86f5bd6e24bc5ca800a911   False     True       False      4              3                   4                     0                      257d
      
      - lastTransitionTime: "2025-02-04T08:34:46Z"
        message: All nodes are updating to MachineConfig rendered-master-d7b33f294e86f5bd6e24bc5ca800a911
        reason: ""
        status: "True"
        type: Updating
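
To make explicit why the pool stays in Updating even though UPDATEDMACHINECOUNT equals MACHINECOUNT, here is a rough sketch of the check. The rule it encodes (a pool counts as finished only when every machine is both updated and ready, and none is degraded) is an assumption about the Machine Config Operator's behaviour, not taken from its source, and `PoolCounts` and `blocking_reasons` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class PoolCounts:
    """Counters as shown in the MachineConfigPool listing above."""
    machine_count: int
    ready: int
    updated: int
    degraded: int


def blocking_reasons(pool: PoolCounts) -> list:
    """Return the reasons the pool would not be considered finished.

    Assumed rule (not confirmed against the operator's code): all machines
    must be updated and ready, and none may be degraded.
    """
    reasons = []
    if pool.updated < pool.machine_count:
        reasons.append(f"{pool.machine_count - pool.updated} machine(s) not updated")
    if pool.ready < pool.machine_count:
        reasons.append(f"{pool.machine_count - pool.ready} machine(s) not ready")
    if pool.degraded > 0:
        reasons.append(f"{pool.degraded} machine(s) degraded")
    return reasons


# Values from the "master" pool row: MACHINECOUNT 4, READY 3, UPDATED 4, DEGRADED 0.
master = PoolCounts(machine_count=4, ready=3, updated=4, degraded=0)
print(blocking_reasons(master))
```

Under that assumption, the only thing blocking the pool is the single machine that is not ready, which is consistent with the stale master1 still being counted as part of the pool.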
      

      OCPBUGS-20336 may be the same issue, although it happened during an upgrade. Its fix was not backported to OpenShift 4.14.

              rh-ee-tbarberb Theo Barber-Bany
              rhn-support-llopezmo Lucas López Montero