OpenShift Bugs / OCPBUGS-12741

Many SNOs failed to complete install with machine-config operator degraded (4.13.0-rc.5)

Details

    • No
    • MCO Sprint 242
    • 1
    • Rejected
    • False

    Description

      Description of problem:

      While attempting to install 2430 SNOs at 4.13.0-rc.5, 17 failed because the machine-config operator was in a degraded state. The operator appears to have failed to find a specific rendered machine-config. Another 6 clusters appear to have failed with the same condition while the etcd operator was also unavailable (https://issues.redhat.com/browse/OCPBUGS-12475), though the two issues appear to be entirely separate.

      Version-Release number of selected component (if applicable):

      Hub OCP - 4.12.10
      SNO OCP - 4.13.0-rc.5
      ACM - 2.8.0-DOWNSTREAM-2023-04-17-13-54-41

      How reproducible:

      About 0.5-1% of installs produced this issue (17 of 2430 is roughly 0.7%). It is the second-largest issue for SNO installs with 4.13.
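      The rate can be checked from the counts given in the description. This is only a quick sketch; the variable and function names are illustrative and do not come from any tool used in the test.

```python
# Counts taken from the description of this bug.
total_installs = 2430
mco_only_failures = 17      # machine-config operator degraded
mco_and_etcd_failures = 6   # machine-config degraded and etcd unavailable


def failure_rate(failures: int, total: int) -> float:
    """Return failures as a percentage of total installs."""
    return 100.0 * failures / total


rate_mco_only = failure_rate(mco_only_failures, total_installs)
rate_combined = failure_rate(
    mco_only_failures + mco_and_etcd_failures, total_installs
)

print(f"machine-config only: {rate_mco_only:.2f}%")
print(f"including etcd cases: {rate_combined:.2f}%")
```

      Both figures fall inside the 0.5-1% range quoted above.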

      Additional info:

      Clusterversion and clusteroperator output from 3 affected machines:

      vm00064 
      NAME                                         VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
      clusterversion.config.openshift.io/version             False       False         46h     Error while reconciling 4.13.0-rc.5: the cluster operator machine-config is degraded
      
      NAME                                                                           VERSION       AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
      clusteroperator.config.openshift.io/authentication                             4.13.0-rc.5   True        False         False      27h     
      clusteroperator.config.openshift.io/baremetal                                  4.13.0-rc.5   True        False         False      46h     
      clusteroperator.config.openshift.io/cloud-controller-manager                   4.13.0-rc.5   True        False         False      46h     
      clusteroperator.config.openshift.io/cloud-credential                           4.13.0-rc.5   True        False         False      46h     
      clusteroperator.config.openshift.io/cluster-autoscaler                         4.13.0-rc.5   True        False         False      46h     
      clusteroperator.config.openshift.io/config-operator                            4.13.0-rc.5   True        False         False      46h     
      clusteroperator.config.openshift.io/console                                    4.13.0-rc.5   True        False         False      46h     
      clusteroperator.config.openshift.io/control-plane-machine-set                  4.13.0-rc.5   True        False         False      46h     
      clusteroperator.config.openshift.io/csi-snapshot-controller                    4.13.0-rc.5   True        False         False      46h     
      clusteroperator.config.openshift.io/dns                                        4.13.0-rc.5   True        False         False      46h     
      clusteroperator.config.openshift.io/etcd                                       4.13.0-rc.5   True        False         False      46h     
      clusteroperator.config.openshift.io/image-registry                             4.13.0-rc.5   True        False         False      46h     
      clusteroperator.config.openshift.io/ingress                                    4.13.0-rc.5   True        False         False      46h     
      clusteroperator.config.openshift.io/insights                                   4.13.0-rc.5   True        False         False      46h     
      clusteroperator.config.openshift.io/kube-apiserver                             4.13.0-rc.5   True        False         False      46h     
      clusteroperator.config.openshift.io/kube-controller-manager                    4.13.0-rc.5   True        False         False      46h     
      clusteroperator.config.openshift.io/kube-scheduler                             4.13.0-rc.5   True        False         False      46h     
      clusteroperator.config.openshift.io/kube-storage-version-migrator              4.13.0-rc.5   True        False         False      46h     
      clusteroperator.config.openshift.io/machine-api                                4.13.0-rc.5   True        False         False      46h     
      clusteroperator.config.openshift.io/machine-approver                           4.13.0-rc.5   True        False         False      46h     
      clusteroperator.config.openshift.io/machine-config                                           True        True          True       46h     Unable to apply 4.13.0-rc.5: error during syncRequiredMachineConfigPools: [timed out waiting for the condition, error pool master is not ready, retrying. Status: (pool degraded: true total: 1, ready 0, updated: 0, unavailable: 1)]
      clusteroperator.config.openshift.io/marketplace                                4.13.0-rc.5   True        False         False      46h     
      clusteroperator.config.openshift.io/monitoring                                 4.13.0-rc.5   True        False         False      46h     
      clusteroperator.config.openshift.io/network                                    4.13.0-rc.5   True        False         False      46h     
      clusteroperator.config.openshift.io/node-tuning                                4.13.0-rc.5   True        False         False      46h     
      clusteroperator.config.openshift.io/openshift-apiserver                        4.13.0-rc.5   True        False         False      22h     
      clusteroperator.config.openshift.io/openshift-controller-manager               4.13.0-rc.5   True        False         False      22h     
      clusteroperator.config.openshift.io/openshift-samples                          4.13.0-rc.5   True        False         False      46h     
      clusteroperator.config.openshift.io/operator-lifecycle-manager                 4.13.0-rc.5   True        False         False      46h     
      clusteroperator.config.openshift.io/operator-lifecycle-manager-catalog         4.13.0-rc.5   True        False         False      46h     
      clusteroperator.config.openshift.io/operator-lifecycle-manager-packageserver   4.13.0-rc.5   True        False         False      46h     
      clusteroperator.config.openshift.io/service-ca                                 4.13.0-rc.5   True        False         False      46h     
      clusteroperator.config.openshift.io/storage                                    4.13.0-rc.5   True        False         False      46h

      vm00208
      NAME                                         VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
      clusterversion.config.openshift.io/version             False       False         47h     Error while reconciling 4.13.0-rc.5: the cluster operator machine-config is degraded
      
      NAME                                                                           VERSION       AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
      clusteroperator.config.openshift.io/authentication                             4.13.0-rc.5   True        False         False      28h     
      clusteroperator.config.openshift.io/baremetal                                  4.13.0-rc.5   True        False         False      47h     
      clusteroperator.config.openshift.io/cloud-controller-manager                   4.13.0-rc.5   True        False         False      47h     
      clusteroperator.config.openshift.io/cloud-credential                           4.13.0-rc.5   True        False         False      47h     
      clusteroperator.config.openshift.io/cluster-autoscaler                         4.13.0-rc.5   True        False         False      47h     
      clusteroperator.config.openshift.io/config-operator                            4.13.0-rc.5   True        False         False      47h     
      clusteroperator.config.openshift.io/console                                    4.13.0-rc.5   True        False         False      47h     
      clusteroperator.config.openshift.io/control-plane-machine-set                  4.13.0-rc.5   True        False         False      47h     
      clusteroperator.config.openshift.io/csi-snapshot-controller                    4.13.0-rc.5   True        False         False      47h     
      clusteroperator.config.openshift.io/dns                                        4.13.0-rc.5   True        False         False      47h     
      clusteroperator.config.openshift.io/etcd                                       4.13.0-rc.5   True        False         False      47h     
      clusteroperator.config.openshift.io/image-registry                             4.13.0-rc.5   True        False         False      47h     
      clusteroperator.config.openshift.io/ingress                                    4.13.0-rc.5   True        False         False      47h     
      clusteroperator.config.openshift.io/insights                                   4.13.0-rc.5   True        False         False      47h     
      clusteroperator.config.openshift.io/kube-apiserver                             4.13.0-rc.5   True        False         False      47h     
      clusteroperator.config.openshift.io/kube-controller-manager                    4.13.0-rc.5   True        False         False      47h     
      clusteroperator.config.openshift.io/kube-scheduler                             4.13.0-rc.5   True        False         False      47h     
      clusteroperator.config.openshift.io/kube-storage-version-migrator              4.13.0-rc.5   True        False         False      47h     
      clusteroperator.config.openshift.io/machine-api                                4.13.0-rc.5   True        False         False      47h     
      clusteroperator.config.openshift.io/machine-approver                           4.13.0-rc.5   True        False         False      47h     
      clusteroperator.config.openshift.io/machine-config                                           True        True          True       47h     Unable to apply 4.13.0-rc.5: error during syncRequiredMachineConfigPools: [timed out waiting for the condition, error pool master is not ready, retrying. Status: (pool degraded: true total: 1, ready 0, updated: 0, unavailable: 1)]
      clusteroperator.config.openshift.io/marketplace                                4.13.0-rc.5   True        False         False      47h     
      clusteroperator.config.openshift.io/monitoring                                 4.13.0-rc.5   True        False         False      47h     
      clusteroperator.config.openshift.io/network                                    4.13.0-rc.5   True        False         False      47h     
      clusteroperator.config.openshift.io/node-tuning                                4.13.0-rc.5   True        False         False      47h     
      clusteroperator.config.openshift.io/openshift-apiserver                        4.13.0-rc.5   True        False         False      23h     
      clusteroperator.config.openshift.io/openshift-controller-manager               4.13.0-rc.5   True        False         False      28h     
      clusteroperator.config.openshift.io/openshift-samples                          4.13.0-rc.5   True        False         False      47h     
      clusteroperator.config.openshift.io/operator-lifecycle-manager                 4.13.0-rc.5   True        False         False      47h     
      clusteroperator.config.openshift.io/operator-lifecycle-manager-catalog         4.13.0-rc.5   True        False         False      47h     
      clusteroperator.config.openshift.io/operator-lifecycle-manager-packageserver   4.13.0-rc.5   True        False         False      47h     
      clusteroperator.config.openshift.io/service-ca                                 4.13.0-rc.5   True        False         False      47h     
      clusteroperator.config.openshift.io/storage                                    4.13.0-rc.5   True        False         False      47h

      vm00244
      NAME                                         VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
      clusterversion.config.openshift.io/version             False       False         47h     Error while reconciling 4.13.0-rc.5: the cluster operator machine-config is degraded
      
      NAME                                                                           VERSION       AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
      clusteroperator.config.openshift.io/authentication                             4.13.0-rc.5   True        False         False      23h     
      clusteroperator.config.openshift.io/baremetal                                  4.13.0-rc.5   True        False         False      47h     
      clusteroperator.config.openshift.io/cloud-controller-manager                   4.13.0-rc.5   True        False         False      47h     
      clusteroperator.config.openshift.io/cloud-credential                           4.13.0-rc.5   True        False         False      47h     
      clusteroperator.config.openshift.io/cluster-autoscaler                         4.13.0-rc.5   True        False         False      47h     
      clusteroperator.config.openshift.io/config-operator                            4.13.0-rc.5   True        False         False      47h     
      clusteroperator.config.openshift.io/console                                    4.13.0-rc.5   True        False         False      47h     
      clusteroperator.config.openshift.io/control-plane-machine-set                  4.13.0-rc.5   True        False         False      47h     
      clusteroperator.config.openshift.io/csi-snapshot-controller                    4.13.0-rc.5   True        False         False      47h     
      clusteroperator.config.openshift.io/dns                                        4.13.0-rc.5   True        False         False      47h     
      clusteroperator.config.openshift.io/etcd                                       4.13.0-rc.5   True        False         False      47h     
      clusteroperator.config.openshift.io/image-registry                             4.13.0-rc.5   True        False         False      47h     
      clusteroperator.config.openshift.io/ingress                                    4.13.0-rc.5   True        False         False      47h     
      clusteroperator.config.openshift.io/insights                                   4.13.0-rc.5   True        False         False      47h     
      clusteroperator.config.openshift.io/kube-apiserver                             4.13.0-rc.5   True        False         False      47h     
      clusteroperator.config.openshift.io/kube-controller-manager                    4.13.0-rc.5   True        False         False      47h     
      clusteroperator.config.openshift.io/kube-scheduler                             4.13.0-rc.5   True        False         False      47h     
      clusteroperator.config.openshift.io/kube-storage-version-migrator              4.13.0-rc.5   True        False         False      47h     
      clusteroperator.config.openshift.io/machine-api                                4.13.0-rc.5   True        False         False      47h     
      clusteroperator.config.openshift.io/machine-approver                           4.13.0-rc.5   True        False         False      47h     
      clusteroperator.config.openshift.io/machine-config                                           True        True          True       47h     Unable to apply 4.13.0-rc.5: error during syncRequiredMachineConfigPools: [timed out waiting for the condition, error pool master is not ready, retrying. Status: (pool degraded: true total: 1, ready 0, updated: 0, unavailable: 1)]
      clusteroperator.config.openshift.io/marketplace                                4.13.0-rc.5   True        False         False      47h     
      clusteroperator.config.openshift.io/monitoring                                 4.13.0-rc.5   True        False         False      47h     
      clusteroperator.config.openshift.io/network                                    4.13.0-rc.5   True        False         False      47h     
      clusteroperator.config.openshift.io/node-tuning                                4.13.0-rc.5   True        False         False      47h     
      clusteroperator.config.openshift.io/openshift-apiserver                        4.13.0-rc.5   True        False         False      23h     
      clusteroperator.config.openshift.io/openshift-controller-manager               4.13.0-rc.5   True        False         False      28h     
      clusteroperator.config.openshift.io/openshift-samples                          4.13.0-rc.5   True        False         False      47h     
      clusteroperator.config.openshift.io/operator-lifecycle-manager                 4.13.0-rc.5   True        False         False      47h     
      clusteroperator.config.openshift.io/operator-lifecycle-manager-catalog         4.13.0-rc.5   True        False         False      47h     
      clusteroperator.config.openshift.io/operator-lifecycle-manager-packageserver   4.13.0-rc.5   True        False         False      47h     
      clusteroperator.config.openshift.io/service-ca                                 4.13.0-rc.5   True        False         False      47h     
      clusteroperator.config.openshift.io/storage                                    4.13.0-rc.5   True        False         False      47h     
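
      When triaging this many clusters, it can help to pull out only the degraded operators from saved output like the above. The following is a minimal sketch, assuming the text has the same column layout as shown here; `ROW`, `SAMPLE`, and `degraded_operators` are hypothetical names, not part of `oc` or any OpenShift tooling. The VERSION column is treated as optional because the degraded machine-config row leaves it blank.

```python
import re

# One clusteroperator row: name, optional version, three condition
# columns, the SINCE value, and an optional trailing message.
ROW = re.compile(
    r"^\s*(?P<name>\S+)\s+"
    r"(?:(?P<version>\S+)\s+)?"
    r"(?P<available>True|False|Unknown)\s+"
    r"(?P<progressing>True|False|Unknown)\s+"
    r"(?P<degraded>True|False|Unknown)\s+"
    r"(?P<since>\S+)\s*(?P<message>.*)$"
)

# Rows copied from the vm00064 output above (column padding shortened).
SAMPLE = """\
NAME                                                 VERSION       AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
clusteroperator.config.openshift.io/etcd             4.13.0-rc.5   True        False         False      46h
clusteroperator.config.openshift.io/machine-config                 True        True          True       46h     Unable to apply 4.13.0-rc.5: error during syncRequiredMachineConfigPools: [timed out waiting for the condition, error pool master is not ready, retrying. Status: (pool degraded: true total: 1, ready 0, updated: 0, unavailable: 1)]
clusteroperator.config.openshift.io/storage          4.13.0-rc.5   True        False         False      46h
"""


def degraded_operators(text: str) -> list[tuple[str, str]]:
    """Return (operator name, message) for every row with DEGRADED=True."""
    found = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match and match.group("degraded") == "True":
            # Drop the "clusteroperator.config.openshift.io/" prefix.
            short_name = match.group("name").rsplit("/", 1)[-1]
            found.append((short_name, match.group("message").strip()))
    return found


for name, message in degraded_operators(SAMPLE):
    print(f"{name}: {message}")
```

      Header lines and the clusterversion row do not match the pattern, so they are skipped rather than misreported.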

      Log snippets from the same 3 machines:

      vm00064 
      I0425 18:55:16.235557       1 container_runtime_config_controller.go:888] Applied ImageConfig cluster on MachineConfigPool master
      I0425 18:55:16.323538       1 container_runtime_config_controller.go:888] Applied ImageConfig cluster on MachineConfigPool worker
      I0425 18:55:16.414195       1 container_runtime_config_controller.go:888] Applied ImageConfig cluster on MachineConfigPool master
      I0425 18:55:16.538787       1 container_runtime_config_controller.go:888] Applied ImageConfig cluster on MachineConfigPool worker
      I0425 18:57:05.426173       1 render_controller.go:569] BaseOSContainerImage=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c0f2488c7d26dfe44966e870a8306e500639be45ff22a5b799192f01e2a84479
      I0425 18:57:05.426232       1 render_controller.go:569] BaseOSContainerImage=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c0f2488c7d26dfe44966e870a8306e500639be45ff22a5b799192f01e2a84479
      I0425 18:57:05.426291       1 status.go:108] Degraded Machine: vm00064 and Degraded Reason: machineconfig.machineconfiguration.openshift.io "rendered-master-02da82f63661253957f9008d0fb5d5ac" not found
      I0425 19:03:51.423738       1 template_controller.go:137] Re-syncing ControllerConfig due to secret pull-secret change
      I0425 19:11:37.513258       1 container_runtime_config_controller.go:888] Applied ImageConfig cluster on MachineConfigPool worker
      I0425 19:11:37.611012       1 container_runtime_config_controller.go:888] Applied ImageConfig cluster on MachineConfigPool master

      vm00208
      I0425 18:50:22.727969       1 container_runtime_config_controller.go:888] Applied ImageConfig cluster on MachineConfigPool worker
      I0425 18:53:38.267324       1 template_controller.go:137] Re-syncing ControllerConfig due to secret pull-secret change
      I0425 19:00:16.668388       1 status.go:108] Degraded Machine: vm00208 and Degraded Reason: machineconfig.machineconfiguration.openshift.io "rendered-master-de56dd6f50db030b1a30ad6d7eb85128" not found
      I0425 19:00:16.668415       1 render_controller.go:569] BaseOSContainerImage=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c0f2488c7d26dfe44966e870a8306e500639be45ff22a5b799192f01e2a84479
      I0425 19:00:16.668389       1 render_controller.go:569] BaseOSContainerImage=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c0f2488c7d26dfe44966e870a8306e500639be45ff22a5b799192f01e2a84479
      I0425 19:02:31.643352       1 container_runtime_config_controller.go:888] Applied ImageConfig cluster on MachineConfigPool master
      I0425 19:02:31.751885       1 container_runtime_config_controller.go:888] Applied ImageConfig cluster on MachineConfigPool worker
      I0425 19:13:57.352102       1 template_controller.go:137] Re-syncing ControllerConfig due to secret pull-secret change
      I0425 19:17:16.139579       1 container_runtime_config_controller.go:888] Applied ImageConfig cluster on MachineConfigPool master
      I0425 19:17:16.237173       1 container_runtime_config_controller.go:888] Applied ImageConfig cluster on MachineConfigPool worker

      vm00244
      I0425 18:53:54.586699       1 container_runtime_config_controller.go:888] Applied ImageConfig cluster on MachineConfigPool worker
      I0425 18:53:54.689809       1 container_runtime_config_controller.go:888] Applied ImageConfig cluster on MachineConfigPool master
      I0425 18:58:25.221205       1 container_runtime_config_controller.go:888] Applied ImageConfig cluster on MachineConfigPool worker
      I0425 18:58:25.409914       1 container_runtime_config_controller.go:888] Applied ImageConfig cluster on MachineConfigPool master
      I0425 18:59:47.286889       1 render_controller.go:569] BaseOSContainerImage=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c0f2488c7d26dfe44966e870a8306e500639be45ff22a5b799192f01e2a84479
      I0425 18:59:47.286897       1 render_controller.go:569] BaseOSContainerImage=quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c0f2488c7d26dfe44966e870a8306e500639be45ff22a5b799192f01e2a84479
      I0425 18:59:47.538265       1 status.go:108] Degraded Machine: vm00244 and Degraded Reason: machineconfig.machineconfiguration.openshift.io "rendered-master-16aaaf522c743020f7ca48697606c0a8" not found
      I0425 19:11:09.740626       1 template_controller.go:137] Re-syncing ControllerConfig due to secret pull-secret change
      I0425 19:16:09.597589       1 container_runtime_config_controller.go:888] Applied ImageConfig cluster on MachineConfigPool worker
      I0425 19:16:09.775052       1 container_runtime_config_controller.go:888] Applied ImageConfig cluster on MachineConfigPool master
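
      Each affected cluster reports a different missing rendered config name. To collect those names across many log files, something like the sketch below could be used. It assumes the log line format shown in the snippets above; `DEGRADED`, `SAMPLE_LOG`, and `missing_rendered_configs` are hypothetical names used for illustration.

```python
import re

# Matches the "Degraded Machine ... not found" lines shown above and
# captures the machine name and the missing rendered MachineConfig.
DEGRADED = re.compile(
    r"Degraded Machine: (?P<machine>\S+) and Degraded Reason: "
    r"machineconfig\.machineconfiguration\.openshift\.io "
    r'"(?P<config>rendered-[^"]+)" not found'
)

# Lines copied from the three log snippets above.
SAMPLE_LOG = """\
I0425 18:57:05.426291       1 status.go:108] Degraded Machine: vm00064 and Degraded Reason: machineconfig.machineconfiguration.openshift.io "rendered-master-02da82f63661253957f9008d0fb5d5ac" not found
I0425 19:03:51.423738       1 template_controller.go:137] Re-syncing ControllerConfig due to secret pull-secret change
I0425 19:00:16.668388       1 status.go:108] Degraded Machine: vm00208 and Degraded Reason: machineconfig.machineconfiguration.openshift.io "rendered-master-de56dd6f50db030b1a30ad6d7eb85128" not found
I0425 18:59:47.538265       1 status.go:108] Degraded Machine: vm00244 and Degraded Reason: machineconfig.machineconfiguration.openshift.io "rendered-master-16aaaf522c743020f7ca48697606c0a8" not found
"""


def missing_rendered_configs(log_text: str) -> dict[str, str]:
    """Map each degraded machine to the rendered config it could not find."""
    return {
        match.group("machine"): match.group("config")
        for match in DEGRADED.finditer(log_text)
    }


for machine, config in missing_rendered_configs(SAMPLE_LOG).items():
    print(f"{machine}: {config}")
```

      If a machine logs the message more than once, the last occurrence wins, which is enough for a per-cluster summary.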

          People

            jerzhang@redhat.com Yu Qi Zhang
            akrzos@redhat.com Alex Krzos
            Sergio Regidor de la Rosa