OpenShift Bugs / OCPBUGS-12262

4.13 RC2: MNO: Failed to resync 4.13.0-rc.2 because: Required MachineConfigPool 'master' is paused and can not sync until it is unpaused

    • Important
    • No
    • 2
    • NHE Sprint 237
    • 1
    • Rejected
    • False

      Description of problem:

      Failed to resync 4.13.0-rc.2 because: Required MachineConfigPool 'master' is paused and can not sync until it is unpaused

      Version-Release number of selected component (if applicable):

      How reproducible:

      Seen once, after the cluster had been up for 15 days.

      Steps to Reproduce:

      1. Install a 4.13 rc2 MCO (m/w/s) compact cluster for Nokia
      2. Apply operators/machine configs
      3. Leave the cluster idle. It was installed 15 days ago, and the issue was first seen today
      

      Actual results:

      authentication                             4.13.0-rc.2   True        False         True       14d     APIServerDeploymentDegraded: 1 of 3 requested instances are unavailable for apiserver.openshift-oauth-apiserver ()...
      baremetal                                  4.13.0-rc.2   True        False         False      15d
      cloud-controller-manager                   4.13.0-rc.2   True        False         False      15d
      cloud-credential                           4.13.0-rc.2   True        False         False      15d
      cluster-autoscaler                         4.13.0-rc.2   True        False         False      15d
      config-operator                            4.13.0-rc.2   True        False         False      15d
      console                                    4.13.0-rc.2   True        False         False      15d
      control-plane-machine-set                  4.13.0-rc.2   True        False         False      15d
      csi-snapshot-controller                    4.13.0-rc.2   True        False         False      15d
      dns                                        4.13.0-rc.2   True        False         False      15d
      etcd                                       4.13.0-rc.2   True        False         False      15d
      image-registry                             4.13.0-rc.2   True        False         False      3d21h
      ingress                                    4.13.0-rc.2   True        False         False      15d
      insights                                   4.13.0-rc.2   True        False         False      15d
      kube-apiserver                             4.13.0-rc.2   True        False         False      15d
      kube-controller-manager                    4.13.0-rc.2   True        False         False      15d
      kube-scheduler                             4.13.0-rc.2   True        False         False      15d
      kube-storage-version-migrator              4.13.0-rc.2   True        False         False      14d
      machine-api                                4.13.0-rc.2   True        False         False      15d
      machine-approver                           4.13.0-rc.2   True        False         False      15d
      machine-config                             4.13.0-rc.2   True        False         True       15d     Failed to resync 4.13.0-rc.2 because: Required MachineConfigPool 'master' is paused and can not sync until it is unpaused
      marketplace                                4.13.0-rc.2   True        False         False      15d
      monitoring                                 4.13.0-rc.2   True        False         False      15d
      network                                    4.13.0-rc.2   True        False         False      15d
      node-tuning                                4.13.0-rc.2   True        False         False      3d21h
      openshift-apiserver                        4.13.0-rc.2   True        False         True       15d     APIServerDeploymentDegraded: 1 of 3 requested instances are unavailable for apiserver.openshift-apiserver ()

      oc get nodes
      NAME                                         STATUS                     ROLES                         AGE   VERSION
      master-0.kni-qe-31.lab.eng.rdu2.redhat.com   Ready,SchedulingDisabled   control-plane,master,worker   15d   v1.26.2+dc93b13
      master-1.kni-qe-31.lab.eng.rdu2.redhat.com   Ready                      control-plane,master,worker   15d   v1.26.2+dc93b13
      master-2.kni-qe-31.lab.eng.rdu2.redhat.com   Ready                      control-plane,master,worker   15d   v1.26.2+dc93b13
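
For triage, the degraded operators can be picked out of a listing like the one above. This is a minimal sketch. It assumes the input has the column layout of `oc get clusteroperators --no-headers` (NAME, VERSION, AVAILABLE, PROGRESSING, DEGRADED, SINCE, MESSAGE). The function name `degraded_operators` is hypothetical and not part of `oc`.

```shell
# Hypothetical helper (not part of oc): print the name of every operator
# whose DEGRADED column (5th whitespace-separated field) is "True".
# Reads operator lines on stdin.
degraded_operators() {
  awk '$5 == "True" { print $1 }'
}

# Sample lines shortened from the listing above. On a live cluster one would
# pipe `oc get clusteroperators --no-headers` into the function instead.
printf '%s\n' \
  'etcd             4.13.0-rc.2   True   False   False   15d' \
  'machine-config   4.13.0-rc.2   True   False   True    15d   Failed to resync 4.13.0-rc.2' \
  | degraded_operators
```

Run against the full listing above, this would report authentication, machine-config, and openshift-apiserver.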
      

      Expected results:

      The pool should only be paused manually. I suspect the pause was possibly caused by SR-IOV.

      Additional info:

      I did not pause this pool. I ran "oc edit mcp master" and unpaused it, which now results in this error:
      
      machine-config                             4.13.0-rc.2   True        False         True       15d     Failed to resync 4.13.0-rc.2 because: error during syncRequiredMachineConfigPools: [timed out waiting for the condition, error required pool master is not ready, retrying. Status: (total: 3, ready 2, updated: 3, unavailable: 1, degraded: 0)]
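
The Status counters in this error say that all 3 machines are updated but only 2 are ready and 1 is unavailable, which appears consistent with master-0 still showing SchedulingDisabled. This is a minimal sketch for pulling those counters out of the message text. The helper name `pool_counter` is hypothetical, and the parsing assumes the message format shown above, including the missing colon after "ready".

```shell
# Hypothetical helper: extract one named counter from the "Status: (...)" text.
# Usage: pool_counter NAME   (message on stdin)
# The colon after the name is optional because the message prints "ready 2".
pool_counter() {
  sed -n "s/.*[( ]$1:\{0,1\} \([0-9][0-9]*\).*/\1/p"
}

msg='error required pool master is not ready, retrying. Status: (total: 3, ready 2, updated: 3, unavailable: 1, degraded: 0)'

# Print every counter as name=value.
for name in total ready updated unavailable degraded; do
  printf '%s=%s\n' "$name" "$(printf '%s\n' "$msg" | pool_counter "$name")"
done
```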
      
      ## must-gather
      http://10.1.101.1/4.13/must-gather/ocp413_rc2_mno_machine-config-paused-master-0.tar

      A couple of hours after the unpause, the problem appeared to have corrected itself.
      
      ## About two hours after the unpause, the system seemed to return to normal, but then the error appeared again:
      machine-config                             4.13.0-rc.2   True        False         True       15d     Failed to resync 4.13.0-rc.2 because: error during syncRequiredMachineConfigPools: [timed out waiting for the condition, error required pool master is not ready, retrying. Status: (total: 3, ready 2, updated: 3, unavailable: 1, degraded: 0)]
      
       [kni@registry.kni-qe-31 4.13.0-rc.2]$ oc get nodes
      NAME                                         STATUS                     ROLES                         AGE   VERSION
      master-0.kni-qe-31.lab.eng.rdu2.redhat.com   Ready,SchedulingDisabled   control-plane,master,worker   15d   v1.26.2+dc93b13
      master-1.kni-qe-31.lab.eng.rdu2.redhat.com   Ready                      control-plane,master,worker   15d   v1.26.2+dc93b13
      master-2.kni-qe-31.lab.eng.rdu2.redhat.com   Ready                      control-plane,master,worker   15d   v1.26.2+dc93b13
      
      
      ## must-gather
      http://10.1.101.1/4.13/must-gather/ocp413_rc2_mno_machine-config-erro_SsyncRequiredMachineConfigPools.tar
      
      Reprinting Cluster State:
      When opening a support case, bugzilla, or issue please include the following summary data along with any other requested information:
      ClusterID: c9722ce2-038c-4a71-98a5-3cc57bacb162
      ClusterVersion: Stable at "4.13.0-rc.2"
      ClusterOperators:
              clusteroperator/machine-config is degraded because Failed to resync 4.13.0-rc.2 because: error during syncRequiredMachineConfigPools: [timed out waiting for the condition, error required pool master is not ready, retrying. Status: (total: 3, ready 2, updated: 3, unavailable: 1, degraded: 0)]
      error: gather never finished for pod must-gather-wbdq7: pods "must-gather-wbdq7" not found

            bnemeth@redhat.com Balazs Nemeth
            mlammon@redhat.com Mike Lammon
            Sergio Regidor de la Rosa