Bug
Resolution: Done
Critical
Quality / Stability / Reliability
CNV QE DevOps Sprint 284
Summary
OpenShift deployment on bare-metal cluster bm15a-tlv2 fails during the "Deploy OCP" stage: the agent-based install never reaches bootstrap-complete. Three agent hosts (cnvqe-085, cnvqe-086, cnvqe-087) remain in the "insufficient" state because of NTP synchronization and inter-host connectivity failures, and the job eventually hits the stage timeout. The reporter notes that this happens only on this particular BM cluster.
Actual result
- "Deploy OCP" runs openshift-install agent wait-for bootstrap-complete (90m timeout) with one retry; both attempts (1/2 and 2/2) fail with exit code 5.
- It then runs openshift-install agent wait-for install-complete (90m timeout), which also times out.
- Final pipeline error: "Timeout has been exceeded while trying to Deploy OCP" → job FAILURE.
- All later stages (Add CatalogSources, Deploy OCS, Deploy CNV, etc.) are skipped.
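The retry sequence above can be modeled as follows. This is a hypothetical sketch, not the job's actual wrapper: run_with_retries and bootstrap_wait_times_out are illustrative names, and only the message format is taken from the "[WARNING] Attempt 1/2 ..." lines in the console. It makes the attempt count explicit ("2/2" means one retry, not two) and adds up the waits: two 90-minute bootstrap-complete waits plus one 90-minute install-complete wait come to roughly 270 minutes of waiting in the stage.

```python
from typing import Callable, List, Tuple

WAIT_TIMEOUT_MIN = 90  # per-wait timeout shown in the job log


def run_with_retries(run_once: Callable[[], int], max_attempts: int,
                     log: List[str]) -> Tuple[int, int]:
    """Run a command up to max_attempts times; return (last exit code, attempts used).

    Hypothetical model of the retry wrapper seen in the console output.
    """
    code = 0
    for attempt in range(1, max_attempts + 1):
        code = run_once()
        if code == 0:
            return code, attempt
        msg = f"[WARNING] Attempt {attempt}/{max_attempts} failed with code {code}."
        if attempt < max_attempts:
            msg += " Retrying..."
        log.append(msg)
    return code, max_attempts


def bootstrap_wait_times_out() -> int:
    # Stand-in for "openshift-install agent wait-for bootstrap-complete",
    # which exited with code 5 on both attempts in build #26
    return 5


log: List[str] = []
code, attempts = run_with_retries(bootstrap_wait_times_out, max_attempts=2, log=log)
# Two bootstrap-complete waits, then one install-complete wait
total_wait_min = attempts * WAIT_TIMEOUT_MIN + WAIT_TIMEOUT_MIN
print(code, attempts, total_wait_min)  # 5 2 270
```

Raising max_attempts only adds another 90 minutes per attempt; it cannot help while the hosts stay "insufficient".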
Root cause (from assisted-service / agent validations)
Agent hosts cnvqe-085, cnvqe-086, cnvqe-087 are stuck in status "insufficient" with failing validations:
- "Host could not synchronize with any NTP server"
- "No connectivity to the majority of hosts in the cluster"
Because these hosts never reach the "known" (ready) state, the cluster never has 3 dedicated control plane nodes ready and bootstrap never completes. After the wait fails, the installer also reports:
- dial tcp [2620:52:0:2ef8::215]:6443: connect: no route to host when gathering ClusterOperator status (API VIP unreachable, as bootstrap did not complete).
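The causal chain above (failing validation, host stays "insufficient", too few ready control plane hosts, no bootstrap) can be sketched as a simplified model. This is an illustration under stated assumptions, not assisted-service code: AgentHost and can_start_install are hypothetical names, the validation keys are the messages quoted from the log, and the "master" role for the three hosts is assumed for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, List

NTP = "Host could not synchronize with any NTP server"
MAJORITY = "No connectivity to the majority of hosts in the cluster"


@dataclass
class AgentHost:
    name: str
    role: str  # hypothetical role label, e.g. "master" or "worker"
    validations: Dict[str, bool] = field(default_factory=dict)

    @property
    def status(self) -> str:
        # Simplified rule: a single failing validation keeps the host "insufficient"
        return "known" if all(self.validations.values()) else "insufficient"

    def failing(self) -> List[str]:
        return [name for name, ok in self.validations.items() if not ok]


def can_start_install(hosts: List[AgentHost], control_plane_count: int = 3) -> bool:
    """True once enough control plane hosts are "known" (simplified model)."""
    ready = [h for h in hosts if h.role == "master" and h.status == "known"]
    return len(ready) >= control_plane_count


# State observed in build #26 (roles assumed for illustration)
hosts = [
    AgentHost(f"cnvqe-{n}", "master", {NTP: False, MAJORITY: False})
    for n in ("085", "086", "087")
]
print([h.status for h in hosts], can_start_install(hosts))
# ['insufficient', 'insufficient', 'insufficient'] False
```

In this model, fixing only NTP is not enough: both validations must pass on all three hosts before install can start.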
Evidence (Jenkins console)
- Job: deploy-ocp-bare-metal-cluster-with-abi-cnv-4.22 #26
- Cluster: bm15a-tlv2 (CLUSTER_DOMAIN=abi.cnv-qe.rhood.us)
- API VIP (from logs): 10.46.255.215 / 2620:52:0:2ef8::215
- RendezvousIP: 10.46.248.147
Relevant log excerpts:
level=debug msg=Host cnvqe-085.lab.eng.tlv2.redhat.com: updated status from discovering to insufficient (Host cannot be installed due to following failing validation(s): Host could not synchronize with any NTP server ; No connectivity to the majority of hosts in the cluster)
level=debug msg=Host cnvqe-086.lab.eng.tlv2.redhat.com: ... (same)
level=debug msg=Host cnvqe-087.lab.eng.tlv2.redhat.com: ... (same)
level=error msg=Attempted to gather ClusterOperator status after wait failure: ... dial tcp [2620:52:0:2ef8::215]:6443: connect: no route to host
level=error msg=Bootstrap failed to complete: : bootstrap process timed out: context deadline exceeded
[WARNING] Attempt 1/2 failed with code 5. Retrying...
... (same for attempt 2) ...
Timeout has been exceeded while trying to Deploy OCP
ERROR: Timeout has been exceeded while trying to Deploy OCP
Finished: FAILURE
The Ansible play also reports a fatal "Assertion failed" for cnvqe-085, cnvqe-086 and cnvqe-087 in a later step, consistent with these hosts not meeting the pre-install/install requirements.
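To triage similar failures across builds, the host status transitions in the log excerpt can be pulled out with a small parser. This is a sketch that assumes the message format shown in this log stays stable; STATUS_RE, StatusChange and parse_status_changes are illustrative names, not part of any existing tool. finditer is used so it also works on console output that has been flattened onto a single line.

```python
import re
from typing import List, NamedTuple

# Matches the status-change message format seen in this job's log
STATUS_RE = re.compile(
    r"Host (?P<host>[^\s:]+): updated status from (?P<old>[\w-]+) to (?P<new>[\w-]+)"
    r"(?: \(Host cannot be installed due to following failing validation\(s\): (?P<why>[^)]*)\))?"
)


class StatusChange(NamedTuple):
    host: str
    old: str
    new: str
    failing: List[str]


def parse_status_changes(text: str) -> List[StatusChange]:
    """Extract host status transitions and their failing validations."""
    changes = []
    for m in STATUS_RE.finditer(text):
        why = m.group("why") or ""
        # Validations are separated by " ; " in the message
        failing = [v.strip() for v in why.split(";") if v.strip()]
        changes.append(StatusChange(m.group("host"), m.group("old"), m.group("new"), failing))
    return changes


sample = (
    "level=debug msg=Host cnvqe-085.lab.eng.tlv2.redhat.com: updated status from "
    "discovering to insufficient (Host cannot be installed due to following failing "
    "validation(s): Host could not synchronize with any NTP server ; No connectivity "
    "to the majority of hosts in the cluster)"
)
for change in parse_status_changes(sample):
    print(change.host, change.new, change.failing)
```

Lines that were abbreviated in the excerpt (such as the "... (same)" entries for 086 and 087) do not match and are simply skipped.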
Steps to reproduce
- Run job deploy-ocp-bare-metal-cluster-with-abi-cnv-4.22 with CLUSTER_NAME=bm15a-tlv2 (and parameters that trigger OCP deploy on this BM cluster).
- Wait for "Deploy OCP" stage: openshift-install agent wait-for bootstrap-complete runs from cnvqe-030.
- Observe: agent hosts 085, 086, 087 stay insufficient (NTP + connectivity).
- After 90m, attempt 1 fails with code 5, and the retry (attempt 2) fails the same way. Then wait-for install-complete runs and times out, and the pipeline timeout finally aborts the stage.
Expected result
Agent hosts synchronize with an NTP server and have connectivity to the majority of hosts in the cluster, so they reach the "known" state and the cluster proceeds to bootstrap-complete and install-complete within the job timeout.
Environment / notes
- Reproduced only on this particular BM cluster (bm15a-tlv2); other BM clusters in the same job do not show this behavior.
- This points to an environment-specific issue on bm15a-tlv2, e.g. NTP servers not reachable from cnvqe-085/086/087, or network segmentation or a firewall blocking required traffic between these hosts and/or to the rest of the cluster.
- Jenkins: jenkins-csb-cnvqe-main.dno.corp.redhat.com
- Job: deploy-ocp-bare-metal-cluster-with-abi-cnv-4.22, Build: 26
- OCP version target: 4.22, FIPS enabled, dual stack (IP_STACK=dual)
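The two hypotheses in the notes above can be checked directly from a machine on the same network segment as the affected hosts. The sketch below is a minimal SNTP client (client mode, version 4, reading the server's transmit timestamp) plus a plain TCP connect check; the function names are illustrative, and the NTP server to query is not named in this ticket, so it has to come from the cluster's configuration. The TCP check does not reproduce the installer's majority-connectivity validation; it only confirms reachability of one endpoint, such as the API VIP that returned "no route to host".

```python
import socket
import struct
import time

NTP_EPOCH_OFFSET = 2208988800  # seconds between 1900-01-01 (NTP) and 1970-01-01 (Unix)


def build_sntp_request() -> bytes:
    # LI=0, VN=4, Mode=3 (client) -> first byte 0b00_100_011 = 0x23; rest zeroed
    return b"\x23" + 47 * b"\x00"


def parse_transmit_time(packet: bytes) -> float:
    """Return the server's transmit timestamp (bytes 40-47) as Unix time."""
    if len(packet) < 48:
        raise ValueError("short NTP packet")
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return seconds - NTP_EPOCH_OFFSET + fraction / 2**32


def query_ntp(server: str, timeout: float = 3.0) -> float:
    """Rough clock offset (server minus local, seconds); ignores round-trip delay."""
    family, socktype, proto, _, addr = socket.getaddrinfo(
        server, 123, type=socket.SOCK_DGRAM)[0]
    with socket.socket(family, socktype, proto) as sock:
        sock.settimeout(timeout)  # a timeout here means NTP is unreachable
        sock.sendto(build_sntp_request(), addr)
        data, _ = sock.recvfrom(512)
    return parse_transmit_time(data) - time.time()


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """True if a TCP connection succeeds; False on refusal, timeout or no route."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, query_ntp with the cluster's configured NTP server raising socket.timeout would match the "Host could not synchronize with any NTP server" validation, and tcp_reachable("2620:52:0:2ef8::215", 6443) returning False reproduces the API VIP symptom (expected while bootstrap has not completed).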
Relates to: OCPBUGS-74504 Installation cannot continue after host discovery for impossibility to change NTP configuration (New)