Bug
Resolution: Cannot Reproduce
Undefined
4.16.z
Quality / Stability / Reliability
False
Important
Ready to Pick, Metal Platform 277
2
contract-priority
Customer Escalated
Description of problem:
Two BM 4.16.24 clusters are being deployed using ZTP (advanced-cluster-management.v2.11.3, openshift-gitops-operator.v1.14.1). A lab cluster (19 nodes) deployed successfully, but the same procedure on 2 other sites (20 nodes each) with the exact same HW fails: random nodes fail to deploy with the error:
"ostree-prepare-root: Couldn't find specified OSTree root '/sysroot//ostree/boot.0/rhcos/xxx.../0': No such file or directory"
The role of the failing node does not matter. Sometimes it is a master, which halts the whole deployment; other times it is a storage, gateway or worker node. They have been able to deploy by role: all 3 masters first, then 2 gateways, then 4 storage nodes. When they then tried to deploy the 11 workers, one of them failed.
The failing nodes go into emergency mode. We checked /sysroot there and it was empty. Our suspicion was that something in their HW settings changes the order of the disks, because after a reboot the node is unable to find the boot disk unless it is rebooted several times, after which Ignition starts again. However, the last logs provided show everything working as expected.
The related case contains must-gathers, sosreports, site-config files, deployment logs and the log from the failing RHCOS deployment. If anything else is needed, please let us know in #npss.
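For reference, the check done from the emergency shell can be sketched roughly as below. This is only a minimal sketch under assumptions: the helper names (ostree_arg, check_ostree_root, disk_snapshot) are hypothetical, it assumes the deployment path is passed through the ostree= kernel argument, and it assumes bash and lsblk are available in the emergency environment, which may not hold on every node.

```shell
#!/bin/bash
# Hypothetical helpers for inspecting a node stuck in emergency mode.

# Print the value of the ostree= kernel argument from a cmdline file
# (defaults to /proc/cmdline). Prints nothing if the argument is absent.
ostree_arg() {
    local cmdline_file="${1:-/proc/cmdline}"
    local line tok
    read -r line < "$cmdline_file"
    for tok in $line; do
        case "$tok" in
            ostree=*) echo "${tok#ostree=}"; return 0 ;;
        esac
    done
    return 0
}

# Report whether the path named by ostree= exists under the given sysroot
# (defaults to /sysroot). Returns 0 if it exists, 1 otherwise.
check_ostree_root() {
    local sysroot="${1:-/sysroot}"
    local cmdline_file="${2:-/proc/cmdline}"
    local target
    target="$(ostree_arg "$cmdline_file")"
    if [ -z "$target" ]; then
        echo "no ostree= argument found in $cmdline_file"
        return 1
    fi
    if [ -e "${sysroot}${target}" ]; then
        echo "found: ${sysroot}${target}"
        return 0
    fi
    echo "missing: ${sysroot}${target}"
    return 1
}

# Snapshot of disk identity, to compare between a good and a failed boot
# of the same node when a change in disk order is suspected.
disk_snapshot() {
    lsblk -o NAME,SIZE,TYPE,SERIAL,WWN,MOUNTPOINT
    ls -l /dev/disk/by-path/ 2>/dev/null
}
```

On an affected node this would be run as `check_ostree_root` followed by `disk_snapshot`. Comparing the serial numbers and by-path links against a boot that worked would show whether the kernel enumerated the disks in a different order.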
Version-Release number of selected component (if applicable):
OCP 4.16.24
Red Hat Enterprise Linux CoreOS 416.94.202411201433-0
advanced-cluster-management.v2.11.3
openshift-gitops-operator.v1.14.1
How reproducible:
Not possible at this time with the resources at our disposal
Steps to Reproduce:
Not possible at this time with the resources at our disposal
Actual results:
1 or 2 nodes fail to deploy.
Expected results:
The whole 20-node cluster deploys successfully using ZTP.
Additional info: