OpenShift Bugs / OCPBUGS-29459

Cluster Upgrade from 4.13.29 to 4.14.10 stuck due to degraded Machine Config Operator


      Description of problem:

          While upgrading a cluster from 4.13.29 to 4.14.10, the upgrade gets stuck at the Machine Config Operator. The operator is in a degraded state because the ControllerConfig never completes, i.e. waitForControllerConfigToBeCompleted fails. The machine-config-controller pod repeatedly logs "Malformed Cert, not syncing", which suggests that a certificate it processes is malformed.
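The report does not say which certificate is being rejected. As a rough aid for narrowing that down, the following sketch checks a PEM bundle block by block. It is an assumption-laden helper, not the check the controller itself performs: `check_pem_bundle` is a hypothetical name, the bundle is assumed to have already been exported to a string, and the check is purely structural (PEM framing, base64 validity, outer DER tag), not full X.509 parsing.

```python
import base64
import binascii
import re

# Matches one PEM CERTIFICATE block and captures its base64 body.
PEM_BLOCK = re.compile(
    r"-----BEGIN CERTIFICATE-----(.*?)-----END CERTIFICATE-----",
    re.DOTALL,
)


def check_pem_bundle(bundle: str) -> list[tuple[int, str]]:
    """Return (block_index, problem) pairs for blocks that look malformed.

    Hypothetical helper: structural checks only, no X.509 parsing.
    An index of -1 means the bundle contained no CERTIFICATE blocks.
    """
    problems = []
    blocks = PEM_BLOCK.findall(bundle)
    if not blocks:
        problems.append((-1, "no CERTIFICATE blocks found"))
    for index, body in enumerate(blocks):
        # PEM bodies are wrapped across lines; drop all whitespace first.
        compact = "".join(body.split())
        try:
            der = base64.b64decode(compact, validate=True)
        except (binascii.Error, ValueError) as exc:
            problems.append((index, f"invalid base64: {exc}"))
            continue
        # A DER-encoded certificate is an ASN.1 SEQUENCE (tag byte 0x30).
        if not der or der[0] != 0x30:
            problems.append((index, "payload does not start with a DER SEQUENCE"))
    return problems
```

An empty result only means that every block passed these shallow checks; a block can still fail real certificate parsing.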

      Version-Release number of selected component (if applicable):

          4.13.29

      How reproducible:

          Install a vSphere IPI 4.13.29 cluster and upgrade the cluster to 4.14.10

      Steps to Reproduce:

          1. Install a 4.13.29 cluster on vSphere using IPI
          2. Upgrade the cluster to 4.14.10
          3. The upgrade gets stuck at the Machine Config Operator
          

      Actual results:

          # Degraded Operator 
      
      NAME                                      VERSION  AVAILABLE  PROGRESSING  DEGRADED
      machine-config                            4.13.29  False      True         True
      
          # Logs from Machine-config-operator pod -
      
      2024-02-13T16:48:05.886202369Z I0213 16:48:05.886168       1 event.go:298] Event(v1.ObjectReference{Kind:"", Namespace:"openshift-machine-config-operator", Name:"machine-config", UID:"59e99d5a-4e8b-451d-9775-415c2665166f", APIVersion:"", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'MachineConfigControllerFailed' Cluster not available for [{operator 4.13.29}]: error during waitForControllerConfigToBeCompleted: [context deadline exceeded, controllerconfig is not completed: ControllerConfig has not completed: completed(false) running(true) failing(false)]
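The tail of the event above encodes the ControllerConfig state as `name(value)` pairs. When triaging many such events, a small parser can make the state easier to compare. This is a minimal sketch; the function name `parse_controllerconfig_status` is hypothetical and the pattern is inferred only from the single message shown in this report.

```python
import re

# Matches flag(value) pairs such as "completed(false)".
FLAG = re.compile(r"(\w+)\((true|false)\)")


def parse_controllerconfig_status(message: str) -> dict[str, bool]:
    """Turn 'completed(false) running(true) ...' into a dict of booleans."""
    return {name: value == "true" for name, value in FLAG.findall(message)}


status = parse_controllerconfig_status(
    "ControllerConfig has not completed: "
    "completed(false) running(true) failing(false)"
)
print(status)  # {'completed': False, 'running': True, 'failing': False}
```

Here the controller reports that it is still running and not failing, yet it never reaches the completed state.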
      
         # Logs from machine-config-controller pod - 
      
      2024-02-13T16:54:29.235597933Z I0213 16:54:29.235592       1 template_controller.go:500] Malformed Cert, not syncing
      2024-02-13T16:54:29.235668239Z I0213 16:54:29.235621       1 template_controller.go:500] Malformed Cert, not syncing
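To confirm how often the controller emits this message, the log lines can be tallied by severity, source location, and text. The sketch below assumes each line has the shape shown above: a leading RFC 3339 timestamp followed by a klog-style header. The names `KLOG` and `tally` are illustrative.

```python
import re
from collections import Counter

# Leading timestamp, then a klog header: Lmmdd hh:mm:ss.uuuuuu pid file:line] msg
KLOG = re.compile(
    r"^(?P<ts>\S+)\s+"
    r"(?P<level>[IWEF])\d{4}\s+"
    r"\d{2}:\d{2}:\d{2}\.\d+\s+"
    r"\d+\s+"
    r"(?P<source>[^\s:]+:\d+)\]\s+"
    r"(?P<message>.*)$"
)


def tally(lines):
    """Count log lines grouped by (level, source, message)."""
    counts = Counter()
    for line in lines:
        match = KLOG.match(line.strip())
        if match:  # Lines that do not fit the assumed shape are skipped.
            counts[(match["level"], match["source"], match["message"])] += 1
    return counts


sample = [
    "2024-02-13T16:54:29.235597933Z I0213 16:54:29.235592       1 "
    "template_controller.go:500] Malformed Cert, not syncing",
    "2024-02-13T16:54:29.235668239Z I0213 16:54:29.235621       1 "
    "template_controller.go:500] Malformed Cert, not syncing",
]
counts = tally(sample)
print(counts.most_common(1))
```

Note that the severity letter on these lines is `I` (info), so a search restricted to warning or error lines would miss them.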
      
      
      

      Expected results:

          Cluster should upgrade successfully to 4.14.10

      Additional info:

          - To mitigate the issue, we tried deleting the `controllerconfigs.machineconfiguration.openshift.io` resource as described in the KCS article (https://access.redhat.com/solutions/5098731), but the issue persists.
      
          - The cluster was originally installed at version 4.9.11 using vSphere IPI with the OVNKubernetes CNI.

       

              rhn-engineering-skumari Sinny Kumari
              rhn-support-adikulka Aditya Kulkarni
              Sergio Regidor de la Rosa Sergio Regidor de la Rosa
              Yu Qi Zhang