Type: Bug
Resolution: Won't Do
Priority: Normal
Fix Version/s: None
Affects Version/s: 4.13.0
Component/s: Networking / SR-IOV
Labels:

Severity: Important
Regression: No
Story Points: 2
Sprint: NHE Sprint 237
sprint_count: 1
Release Blocker: Rejected
Blocked: False
Blocked Reason: None
Internal Whiteboard:

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Description of problem:

Failed to resync 4.13.0-rc.2 because: Required MachineConfigPool 'master' is paused and can not sync until it is unpaused

Version-Release number of selected component (if applicable):

How reproducible:

Seen once, after the cluster had been up for 15 days.

Steps to Reproduce:

1. Install a 4.13.0-rc.2 MCO (m/w/s) compact cluster for Nokia.
2. Apply operators and machine configs.
3. Leave the cluster idle. It was installed 15 days ago, and the issue was first seen today.

Actual results:

authentication                             4.13.0-rc.2   True        False         True       14d     APIServerDeploymentDegraded: 1 of 3 requested instances are unavailable for apiserver.openshift-oauth-apiserver ()...
baremetal                                  4.13.0-rc.2   True        False         False      15d
cloud-controller-manager                   4.13.0-rc.2   True        False         False      15d
cloud-credential                           4.13.0-rc.2   True        False         False      15d
cluster-autoscaler                         4.13.0-rc.2   True        False         False      15d
config-operator                            4.13.0-rc.2   True        False         False      15d
console                                    4.13.0-rc.2   True        False         False      15d
control-plane-machine-set                  4.13.0-rc.2   True        False         False      15d
csi-snapshot-controller                    4.13.0-rc.2   True        False         False      15d
dns                                        4.13.0-rc.2   True        False         False      15d
etcd                                       4.13.0-rc.2   True        False         False      15d
image-registry                             4.13.0-rc.2   True        False         False      3d21h
ingress                                    4.13.0-rc.2   True        False         False      15d
insights                                   4.13.0-rc.2   True        False         False      15d
kube-apiserver                             4.13.0-rc.2   True        False         False      15d
kube-controller-manager                    4.13.0-rc.2   True        False         False      15d
kube-scheduler                             4.13.0-rc.2   True        False         False      15d
kube-storage-version-migrator              4.13.0-rc.2   True        False         False      14d
machine-api                                4.13.0-rc.2   True        False         False      15d
machine-approver                           4.13.0-rc.2   True        False         False      15d
machine-config                             4.13.0-rc.2   True        False         True       15d     Failed to resync 4.13.0-rc.2 because: Required MachineConfigPool 'master' is paused and can not sync until it is unpaused
marketplace                                4.13.0-rc.2   True        False         False      15d
monitoring                                 4.13.0-rc.2   True        False         False      15d
network                                    4.13.0-rc.2   True        False         False      15d
node-tuning                                4.13.0-rc.2   True        False         False      3d21h
openshift-apiserver                        4.13.0-rc.2   True        False         True       15d     APIServerDeploymentDegraded: 1 of 3 requested instances are unavailable for apiserver.openshift-apiserver ()

oc get nodes
NAME                                         STATUS                     ROLES                         AGE   VERSION
master-0.kni-qe-31.lab.eng.rdu2.redhat.com   Ready,SchedulingDisabled   control-plane,master,worker   15d   v1.26.2+dc93b13
master-1.kni-qe-31.lab.eng.rdu2.redhat.com   Ready                      control-plane,master,worker   15d   v1.26.2+dc93b13
master-2.kni-qe-31.lab.eng.rdu2.redhat.com   Ready                      control-plane,master,worker   15d   v1.26.2+dc93b13
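
For triage, the degraded operators in the listing above can be picked out mechanically. The following is only a minimal sketch: it assumes the usual `oc get clusteroperators` column order (NAME, VERSION, AVAILABLE, PROGRESSING, DEGRADED, SINCE, MESSAGE), since the header row was not included in the paste. The function name `degraded_operators` and the `SAMPLE` text are illustrative only.

```python
def degraded_operators(listing: str) -> dict[str, str]:
    """Return {operator name: message} for rows whose DEGRADED column is True.

    Assumes columns: NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE.
    """
    result = {}
    for line in listing.splitlines():
        # At most 6 splits, so the free-text MESSAGE column stays in one piece.
        fields = line.split(None, 6)
        if len(fields) < 5 or fields[0] == "NAME":
            continue  # skip blank lines, short lines, and a header row if present
        if fields[4] == "True":
            result[fields[0]] = fields[6].strip() if len(fields) > 6 else ""
    return result


# Two rows copied from the listing in this report.
SAMPLE = """\
etcd                                       4.13.0-rc.2   True        False         False      15d
machine-config                             4.13.0-rc.2   True        False         True       15d     Failed to resync 4.13.0-rc.2 because: Required MachineConfigPool 'master' is paused and can not sync until it is unpaused
"""

for name, message in degraded_operators(SAMPLE).items():
    print(f"{name}: {message}")
```

Run against the full listing, this should report authentication, machine-config, and openshift-apiserver, which are the three rows with True in the DEGRADED column.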

Expected results:

The pool should only become paused when someone pauses it manually. SR-IOV is a possible suspect.

Additional info:

I did not pause this pool myself. I ran "oc edit mcp master" and unpaused it, which now results in this error:

machine-config                             4.13.0-rc.2   True        False         True       15d     Failed to resync 4.13.0-rc.2 because: error during syncRequiredMachineConfigPools: [timed out waiting for the condition, error required pool master is not ready, retrying. Status: (total: 3, ready 2, updated: 3, unavailable: 1, degraded: 0)]
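
The counters at the end of that message are easier to compare across runs once extracted. A minimal sketch, assuming the message format stays exactly as pasted above (note that "ready 2" has no colon, so the pattern treats the colon as optional). The function name `parse_pool_status` is hypothetical.

```python
import re


def parse_pool_status(message: str) -> dict[str, int]:
    """Extract the counters from the trailing 'Status: (...)' part of the message."""
    match = re.search(r"Status: \(([^)]*)\)", message)
    if match is None:
        return {}
    # The colon is optional because the message prints 'ready 2' without one.
    pairs = re.findall(r"(\w+):?\s+(\d+)", match.group(1))
    return {key: int(value) for key, value in pairs}


MESSAGE = (
    "Failed to resync 4.13.0-rc.2 because: error during syncRequiredMachineConfigPools: "
    "[timed out waiting for the condition, error required pool master is not ready, retrying. "
    "Status: (total: 3, ready 2, updated: 3, unavailable: 1, degraded: 0)]"
)

counts = parse_pool_status(MESSAGE)
print(counts)
print("nodes not ready:", counts["total"] - counts["ready"])
```

Here all three nodes are counted as updated but only two as ready, with one unavailable, which would be consistent with master-0 showing Ready,SchedulingDisabled in the node listing.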

## must-gather
http://10.1.101.1/4.13/must-gather/ocp413_rc2_mno_machine-config-paused-master-0.tar

After a couple of hours, the unpause appeared to have corrected the problem.

## About two hours after unpausing master-0, the system seemed to return to normal, but then the error returned

machine-config                             4.13.0-rc.2   True        False         True       15d     Failed to resync 4.13.0-rc.2 because: error during syncRequiredMachineConfigPools: [timed out waiting for the condition, error required pool master is not ready, retrying. Status: (total: 3, ready 2, updated: 3, unavailable: 1, degraded: 0)]

 [kni@registry.kni-qe-31 4.13.0-rc.2]$ oc get nodes
NAME                                         STATUS                     ROLES                         AGE   VERSION
master-0.kni-qe-31.lab.eng.rdu2.redhat.com   Ready,SchedulingDisabled   control-plane,master,worker   15d   v1.26.2+dc93b13
master-1.kni-qe-31.lab.eng.rdu2.redhat.com   Ready                      control-plane,master,worker   15d   v1.26.2+dc93b13
master-2.kni-qe-31.lab.eng.rdu2.redhat.com   Ready                      control-plane,master,worker   15d   v1.26.2+dc93b13


## must-gather
http://10.1.101.1/4.13/must-gather/ocp413_rc2_mno_machine-config-erro_SsyncRequiredMachineConfigPools.tar

Reprinting Cluster State:
When opening a support case, bugzilla, or issue please include the following summary data along with any other requested information:
ClusterID: c9722ce2-038c-4a71-98a5-3cc57bacb162
ClusterVersion: Stable at "4.13.0-rc.2"
ClusterOperators:
        clusteroperator/machine-config is degraded because Failed to resync 4.13.0-rc.2 because: error during syncRequiredMachineConfigPools: [timed out waiting for the condition, error required pool master is not ready, retrying. Status: (total: 3, ready 2, updated: 3, unavailable: 1, degraded: 0)]
error: gather never finished for pod must-gather-wbdq7: pods "must-gather-wbdq7" not found

Assignee: Balazs Nemeth
Reporter: Mike Lammon (Inactive)
QA Contact: Sergio Regidor de la Rosa
Votes: 0
Watchers: 6
Created: 2023/04/21 3:36 PM
Updated: 2023/06/14 8:11 AM
Resolved: 2023/06/14 8:11 AM
