- Bug
- Resolution: Won't Do
- Normal
- None
- 4.13.0
- Important
- No
- 2
- NHE Sprint 237
- 1
- Rejected
- False
Description of problem:
Failed to resync 4.13.0-rc.2 because: Required MachineConfigPool 'master' is paused and can not sync until it is unpaused
Version-Release number of selected component (if applicable):
How reproducible:
Seen once, after the cluster had been up for 15 days.
Steps to Reproduce:
1. Installed a 4.13 rc2 MCO (m/w/s) compact cluster for Nokia.
2. Applied operators and machine configs.
3. The cluster was installed 15 days ago and has been idle since. The issue was first seen today.
Actual results:
authentication                 4.13.0-rc.2   True   False   True    14d     APIServerDeploymentDegraded: 1 of 3 requested instances are unavailable for apiserver.openshift-oauth-apiserver ()...
baremetal                      4.13.0-rc.2   True   False   False   15d
cloud-controller-manager       4.13.0-rc.2   True   False   False   15d
cloud-credential               4.13.0-rc.2   True   False   False   15d
cluster-autoscaler             4.13.0-rc.2   True   False   False   15d
config-operator                4.13.0-rc.2   True   False   False   15d
console                        4.13.0-rc.2   True   False   False   15d
control-plane-machine-set      4.13.0-rc.2   True   False   False   15d
csi-snapshot-controller        4.13.0-rc.2   True   False   False   15d
dns                            4.13.0-rc.2   True   False   False   15d
etcd                           4.13.0-rc.2   True   False   False   15d
image-registry                 4.13.0-rc.2   True   False   False   3d21h
ingress                        4.13.0-rc.2   True   False   False   15d
insights                       4.13.0-rc.2   True   False   False   15d
kube-apiserver                 4.13.0-rc.2   True   False   False   15d
kube-controller-manager        4.13.0-rc.2   True   False   False   15d
kube-scheduler                 4.13.0-rc.2   True   False   False   15d
kube-storage-version-migrator  4.13.0-rc.2   True   False   False   14d
machine-api                    4.13.0-rc.2   True   False   False   15d
machine-approver               4.13.0-rc.2   True   False   False   15d
machine-config                 4.13.0-rc.2   True   False   True    15d     Failed to resync 4.13.0-rc.2 because: Required MachineConfigPool 'master' is paused and can not sync until it is unpaused
marketplace                    4.13.0-rc.2   True   False   False   15d
monitoring                     4.13.0-rc.2   True   False   False   15d
network                        4.13.0-rc.2   True   False   False   15d
node-tuning                    4.13.0-rc.2   True   False   False   3d21h
openshift-apiserver            4.13.0-rc.2   True   False   True    15d     APIServerDeploymentDegraded: 1 of 3 requested instances are unavailable for apiserver.openshift-apiserver ()

oc get nodes
NAME                                         STATUS                     ROLES                         AGE   VERSION
master-0.kni-qe-31.lab.eng.rdu2.redhat.com   Ready,SchedulingDisabled   control-plane,master,worker   15d   v1.26.2+dc93b13
master-1.kni-qe-31.lab.eng.rdu2.redhat.com   Ready                      control-plane,master,worker   15d   v1.26.2+dc93b13
master-2.kni-qe-31.lab.eng.rdu2.redhat.com   Ready                      control-plane,master,worker   15d   v1.26.2+dc93b13
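For triage, the node listing above can be scanned for cordoned nodes with a short script. This is only a sketch: the helper name `cordoned_nodes` is hypothetical, and it assumes the text is in the standard `oc get nodes` column layout (NAME, STATUS, ROLES, AGE, VERSION) as pasted in this report.

```python
# Rows copied from the "oc get nodes" output in this report.
NODES_OUTPUT = """\
NAME STATUS ROLES AGE VERSION
master-0.kni-qe-31.lab.eng.rdu2.redhat.com Ready,SchedulingDisabled control-plane,master,worker 15d v1.26.2+dc93b13
master-1.kni-qe-31.lab.eng.rdu2.redhat.com Ready control-plane,master,worker 15d v1.26.2+dc93b13
master-2.kni-qe-31.lab.eng.rdu2.redhat.com Ready control-plane,master,worker 15d v1.26.2+dc93b13
"""


def cordoned_nodes(text: str) -> list[str]:
    """Return names of nodes whose STATUS column includes SchedulingDisabled.

    Hypothetical helper for illustration; it only parses pasted text and
    does not talk to the cluster.
    """
    names = []
    for line in text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 2:
            continue  # ignore blank or malformed rows
        name, status = fields[0], fields[1]
        if "SchedulingDisabled" in status.split(","):
            names.append(name)
    return names


if __name__ == "__main__":
    print(cordoned_nodes(NODES_OUTPUT))
```

On the listing in this report, only master-0 is reported as cordoned.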
Expected results:
The pool should be paused only when someone pauses it manually. SR-IOV is a possible cause (suspected, not confirmed).
Additional info:
I did not pause the master pool, but I ran "oc edit mcp master" and unpaused it, which now results in this error:

machine-config 4.13.0-rc.2 True False True 15d Failed to resync 4.13.0-rc.2 because: error during syncRequiredMachineConfigPools: [timed out waiting for the condition, error required pool master is not ready, retrying. Status: (total: 3, ready 2, updated: 3, unavailable: 1, degraded: 0)]

## must-gather
http://10.1.101.1/4.13/must-gather/ocp413_rc2_mno_machine-config-paused-master-0.tar

After a couple of hours, the unpause appeared to have corrected the problem.

## About two hours after unpausing master-0, the system seemed to return to normal, but then the error appeared again

machine-config 4.13.0-rc.2 True False True 15d Failed to resync 4.13.0-rc.2 because: error during syncRequiredMachineConfigPools: [timed out waiting for the condition, error required pool master is not ready, retrying. Status: (total: 3, ready 2, updated: 3, unavailable: 1, degraded: 0)]

[kni@registry.kni-qe-31 4.13.0-rc.2]$ oc get nodes
NAME                                         STATUS                     ROLES                         AGE   VERSION
master-0.kni-qe-31.lab.eng.rdu2.redhat.com   Ready,SchedulingDisabled   control-plane,master,worker   15d   v1.26.2+dc93b13
master-1.kni-qe-31.lab.eng.rdu2.redhat.com   Ready                      control-plane,master,worker   15d   v1.26.2+dc93b13
master-2.kni-qe-31.lab.eng.rdu2.redhat.com   Ready                      control-plane,master,worker   15d   v1.26.2+dc93b13

## must-gather
http://10.1.101.1/4.13/must-gather/ocp413_rc2_mno_machine-config-erro_SsyncRequiredMachineConfigPools.tar

Reprinting Cluster State:
When opening a support case, bugzilla, or issue please include the following summary data along with any other requested information:
ClusterID: c9722ce2-038c-4a71-98a5-3cc57bacb162
ClusterVersion: Stable at "4.13.0-rc.2"
ClusterOperators:
clusteroperator/machine-config is degraded because Failed to resync 4.13.0-rc.2 because: error during syncRequiredMachineConfigPools: [timed out waiting for the condition, error required pool master is not ready, retrying. Status: (total: 3, ready 2, updated: 3, unavailable: 1, degraded: 0)]

error: gather never finished for pod must-gather-wbdq7: pods "must-gather-wbdq7" not found
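The pool counts embedded in the machine-config error can be pulled out for comparison across occurrences. The following is a minimal sketch, not part of the operator: `parse_pool_status` is a hypothetical helper, and it assumes the message keeps the `Status: (...)` shape quoted in this report, including the `ready 2` field that is printed without a colon.

```python
import re

# Error text copied from this report.
MESSAGE = (
    "error during syncRequiredMachineConfigPools: [timed out waiting for the "
    "condition, error required pool master is not ready, retrying. "
    "Status: (total: 3, ready 2, updated: 3, unavailable: 1, degraded: 0)]"
)


def parse_pool_status(message: str) -> dict[str, int]:
    """Extract the pool counters from a 'Status: (...)' fragment.

    Hypothetical helper for illustration only.
    """
    match = re.search(r"Status: \(([^)]*)\)", message)
    if match is None:
        raise ValueError("no pool status found in message")
    counts = {}
    for part in match.group(1).split(","):
        # Accept both "total: 3" and "ready 2" (the colon is optional).
        field = re.fullmatch(r"\s*(\w+):?\s+(\d+)\s*", part)
        if field is None:
            raise ValueError(f"unrecognized status field: {part!r}")
        counts[field.group(1)] = int(field.group(2))
    return counts


if __name__ == "__main__":
    print(parse_pool_status(MESSAGE))
```

For the message in this report the counters show all three machines updated but one unavailable, which would be consistent with master-0 still being in Ready,SchedulingDisabled state, though the report does not confirm that link.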