- Bug
- Resolution: Cannot Reproduce
- Critical
- None
- 4.14.0
- None
- Critical
- Yes
- Rejected
- False
Description of problem:
IPI installation failed with one master node NotReady
Version-Release number of selected component (if applicable):
4.13.0-0.nightly-2023-03-22-165711
How reproducible:
Not every time, but at least 60%.
Steps to Reproduce:
1. "create install-config", then insert "credentialsMode: Manual" into install-config.yaml
2. "create manifests"
3. manually create the required credentials
4. "create cluster"
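For repeated reproduction attempts, the edit in step 1 can be scripted. The following is only a minimal sketch: the helper name `set_credentials_mode` is hypothetical, it treats install-config.yaml as plain text, and it assumes `credentialsMode` is a top-level key that can be appended safely at the end of the file.

```python
def set_credentials_mode(install_config_text: str, mode: str = "Manual") -> str:
    """Return install-config.yaml text with a top-level credentialsMode key.

    Hypothetical helper: works on plain text and does not parse YAML.
    """
    lines = install_config_text.splitlines()
    for i, line in enumerate(lines):
        # A top-level key starts at column 0 (no indentation).
        if line.startswith("credentialsMode:"):
            lines[i] = f"credentialsMode: {mode}"
            break
    else:
        # Key not present: append it as a new top-level entry.
        lines.append(f"credentialsMode: {mode}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Illustrative fragment only, not a complete install-config.yaml.
    sample = "apiVersion: v1\nmetadata:\n  name: demo\n"
    print(set_credentials_mode(sample), end="")
```

The function is idempotent, so running it again on an already edited file leaves the text unchanged.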
Actual results:
1. Installation failed with one master node stuck in NotReady. In addition, kube-apiserver appears to be available only on the NotReady master node.
2. "oc adm must-gather" cannot finish due to the error below:
[must-gather-kz68s] POD 2023-03-23T08:05:28.094281416Z E0323 08:05:28.094237 559 memcache.go:106] couldn't get resource list for metrics.k8s.io/v1beta1: the server is currently unable to handle the request
Expected results:
Installation should succeed.
Additional info:
$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version             False       True          3h44m   Unable to apply 4.13.0-0.nightly-2023-03-22-165711: some cluster operators are not available

$ oc get nodes
NAME                                        STATUS     ROLES                  AGE     VERSION
jiwei-0323b-sbfhb-master-0                  Ready      control-plane,master   3h38m   v1.26.2+dc93b13
jiwei-0323b-sbfhb-master-1                  NotReady   control-plane,master   3h39m   v1.26.2+dc93b13
jiwei-0323b-sbfhb-master-2                  Ready      control-plane,master   3h38m   v1.26.2+dc93b13
jiwei-0323b-sbfhb-worker-us-east-1a-rm6gz   Ready      worker                 3h14m   v1.26.2+dc93b13
jiwei-0323b-sbfhb-worker-us-east-1b-d2vlp   Ready      worker                 3h16m   v1.26.2+dc93b13

$ oc get co | grep -v 'True False False'
NAME                          VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                4.13.0-0.nightly-2023-03-22-165711   False       True          True       3h34m   WellKnownAvailable: The well-known endpoint is not yet available: kube-apiserver oauth endpoint https://10.0.161.254:6443/.well-known/oauth-authorization-server is not yet served and authentication operator keeps waiting (check kube-apiserver operator, and check that instances roll out successfully, which can take several minutes per instance)
console                       4.13.0-0.nightly-2023-03-22-165711   False       True          False      3h17m   DeploymentAvailable: 0 replicas available for console deployment...
dns                           4.13.0-0.nightly-2023-03-22-165711   True        True          False      3h31m   DNS "default" reports Progressing=True: "Have 4 available node-resolver pods, want 5."
etcd                          4.13.0-0.nightly-2023-03-22-165711   True        True          True       3h26m   NodeControllerDegraded: The master nodes not ready: node "jiwei-0323b-sbfhb-master-1" not ready since 2023-03-23 04:39:07 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.)
image-registry                4.13.0-0.nightly-2023-03-22-165711   True        True          False      3h13m   Progressing: The registry is ready...
kube-apiserver                4.13.0-0.nightly-2023-03-22-165711   False       True          True       3h31m   StaticPodsAvailable: 0 nodes are active; 3 nodes are at revision 0; 0 nodes have achieved new revision 8
kube-controller-manager       4.13.0-0.nightly-2023-03-22-165711   True        True          True       3h27m   NodeControllerDegraded: The master nodes not ready: node "jiwei-0323b-sbfhb-master-1" not ready since 2023-03-23 04:39:07 +0000 UTC because NodeStatusUnknown (Kubelet stopped posting node status.)
kube-scheduler                4.13.0-0.nightly-2023-03-22-165711   True        True          True       3h27m   InstallerPodContainerWaitingDegraded: Pod "installer-8-jiwei-0323b-sbfhb-master-1" on node "jiwei-0323b-sbfhb-master-1" container "installer" is waiting since 2023-03-23 04:34:55 +0000 UTC because ContainerCreating...
machine-config                4.13.0-0.nightly-2023-03-22-165711   False       False         True       3h1m    Cluster not available for [{operator 4.13.0-0.nightly-2023-03-22-165711}]: failed to apply machine config daemon manifests: error during waitForDaemonsetRollout: [timed out waiting for the condition, daemonset machine-config-daemon is not ready. status: (desired: 5, updated: 5, ready: 4, unavailable: 1)]
network                       4.13.0-0.nightly-2023-03-22-165711   True        True          False      3h27m   DaemonSet "/openshift-multus/network-metrics-daemon" is not available (awaiting 1 nodes)...
openshift-apiserver           4.13.0-0.nightly-2023-03-22-165711   True        True          True       3h21m   APIServerDeploymentDegraded: 1 of 3 requested instances are unavailable for apiserver.openshift-apiserver ()
openshift-controller-manager                                       True        True          False      3h22m   Progressing: deployment/controller-manager: updated replicas is 1, desired replicas is 3...
storage                       4.13.0-0.nightly-2023-03-22-165711   True        True          False      3h30m   AlibabaDiskCSIDriverOperatorCRProgressing: AlibabaCloudDriverNodeServiceControllerProgressing: Waiting for DaemonSet to deploy node pods

$ oc get mc
NAME                                               GENERATEDBYCONTROLLER                      IGNITIONVERSION   AGE
00-master                                          40575b862f7bd42a2c40c8e6b7203cd4c29b0021   3.2.0             3h35m
00-worker                                          40575b862f7bd42a2c40c8e6b7203cd4c29b0021   3.2.0             3h35m
01-master-container-runtime                        40575b862f7bd42a2c40c8e6b7203cd4c29b0021   3.2.0             3h35m
01-master-kubelet                                  40575b862f7bd42a2c40c8e6b7203cd4c29b0021   3.2.0             3h35m
01-worker-container-runtime                        40575b862f7bd42a2c40c8e6b7203cd4c29b0021   3.2.0             3h35m
01-worker-kubelet                                  40575b862f7bd42a2c40c8e6b7203cd4c29b0021   3.2.0             3h35m
99-master-generated-registries                     40575b862f7bd42a2c40c8e6b7203cd4c29b0021   3.2.0             3h35m
99-master-ssh                                                                                 3.2.0             3h46m
99-worker-generated-registries                     40575b862f7bd42a2c40c8e6b7203cd4c29b0021   3.2.0             3h35m
99-worker-ssh                                                                                 3.2.0             3h46m
rendered-master-9e0818c061f7631f68edf9b2ba5e99a3   40575b862f7bd42a2c40c8e6b7203cd4c29b0021   3.2.0             3h23m
rendered-master-cae5598b9b13fb23fcd137194dd792a2   40575b862f7bd42a2c40c8e6b7203cd4c29b0021   3.2.0             3h35m
rendered-worker-00835171ebcd7e1659f374a933dec318   40575b862f7bd42a2c40c8e6b7203cd4c29b0021   3.2.0             3h35m
rendered-worker-26e6e9e3ac817c53ec3e6fa304c93334   40575b862f7bd42a2c40c8e6b7203cd4c29b0021   3.2.0             3h23m

$ oc get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-cae5598b9b13fb23fcd137194dd792a2   False     True       False      3              1                   1                     0                      3h36m
worker   rendered-worker-26e6e9e3ac817c53ec3e6fa304c93334   True      False      False      2              2                   2                      0                      3h36m
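Since the failure is intermittent, it may help to check the node listing automatically after each attempt. The sketch below parses saved "oc get nodes" text and returns the nodes whose STATUS does not include Ready. The function name `not_ready_nodes` is hypothetical, and the sample text is the node listing from this report; the code assumes the default column layout with NAME and STATUS headers.

```python
from typing import List

# Node listing copied from this report.
SAMPLE = """\
NAME STATUS ROLES AGE VERSION
jiwei-0323b-sbfhb-master-0 Ready control-plane,master 3h38m v1.26.2+dc93b13
jiwei-0323b-sbfhb-master-1 NotReady control-plane,master 3h39m v1.26.2+dc93b13
jiwei-0323b-sbfhb-master-2 Ready control-plane,master 3h38m v1.26.2+dc93b13
jiwei-0323b-sbfhb-worker-us-east-1a-rm6gz Ready worker 3h14m v1.26.2+dc93b13
jiwei-0323b-sbfhb-worker-us-east-1b-d2vlp Ready worker 3h16m v1.26.2+dc93b13
"""


def not_ready_nodes(oc_get_nodes_output: str) -> List[str]:
    """Return names of nodes whose STATUS column does not contain Ready."""
    lines = [ln for ln in oc_get_nodes_output.splitlines() if ln.strip()]
    if not lines:
        return []
    header = lines[0].split()
    name_idx = header.index("NAME")
    status_idx = header.index("STATUS")
    result = []
    for line in lines[1:]:
        fields = line.split()
        # STATUS can be a comma-separated list, e.g. "Ready,SchedulingDisabled".
        conditions = fields[status_idx].split(",")
        if "Ready" not in conditions:
            result.append(fields[name_idx])
    return result


if __name__ == "__main__":
    for node in not_ready_nodes(SAMPLE):
        print(node)
```

Splitting on whitespace is sufficient here because none of the NAME, STATUS, or ROLES values contain spaces.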
- clones
- OCPBUGS-10768 [alibabacloud] IPI installation failed with one master node NotReady
- Closed