OpenShift Bugs / OCPBUGS-56857

"ostree-prepare-root: Couldn't find specified OSTree root '/sysroot//ostree/boot.0/rhcos/xxx.../0': No such file or directory" during ZTP


    • Quality / Stability / Reliability
    • False
    • Important
    • Ready to Pick, Metal Platform 277
    • 2
    • contract-priority
    • Customer Escalated

      Description of problem:

      We are deploying 2 bare-metal (BM) 4.16.24 clusters using ZTP (advanced-cluster-management.v2.11.3, openshift-gitops-operator.v1.14.1).
      A lab cluster (19 nodes) deployed successfully, but the same procedure on 2 other sites (20 nodes each) with the exact same HW fails: random nodes fail to deploy with the error "ostree-prepare-root: Couldn't find specified OSTree root '/sysroot//ostree/boot.0/rhcos/xxx.../0': No such file or directory"
      The role of the failing node does not matter. Sometimes it is a master, which halts the whole deployment; other times it is a storage, gateway or worker node.
      They were able to deploy role by role: first all 3 masters, then 2 gateways, then 4 storage nodes. When they then tried to deploy the 11 workers, one of them failed.
      The failing nodes drop into emergency mode. We checked /sysroot there and it was empty.
      We suspected that something in their HW settings changes the order of the disks, because after a reboot the node cannot find the boot disk until it has been rebooted several times. Then ignition starts again.
      However, the latest logs provided show everything working as expected.
      The related case contains must-gathers, sosreports, site-config files, deployment logs and the log from the failing RHCOS deployment. If anything else is needed, please let us know in #npss.
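
      For triage from the emergency shell, the check described above (is the requested OSTree root actually present under /sysroot?) could be sketched as follows. This is only an illustrative sketch and is not taken from the case data. It assumes ostree-prepare-root takes the deployment path from the ostree= kernel argument and resolves it under /sysroot, which matches the path in the error message. The function names ostree_karg and check_ostree_root are hypothetical helpers, not part of any shipped tool.

```shell
#!/bin/bash
# Hypothetical triage helpers; names and layout are assumptions for illustration.

# Print the value of the ostree= kernel argument found in a command-line string.
# Returns 1 if no ostree= argument is present.
ostree_karg() {
    local arg
    local -a args
    # Split on whitespace without triggering glob expansion.
    read -r -a args <<< "$1"
    for arg in "${args[@]}"; do
        case "$arg" in
            ostree=*)
                printf '%s\n' "${arg#ostree=}"
                return 0
                ;;
        esac
    done
    return 1
}

# Report whether the path named by ostree= exists under the given sysroot.
# $1 = sysroot directory (e.g. /sysroot), $2 = kernel command line string.
# Returns 0 if found, 1 if missing, 2 if there is no ostree= argument.
check_ostree_root() {
    local sysroot=$1 cmdline=$2 target
    if ! target=$(ostree_karg "$cmdline"); then
        echo "no ostree= argument found"
        return 2
    fi
    # -e follows symlinks, so a dangling deployment link also counts as missing.
    if [ -e "${sysroot}${target}" ]; then
        echo "found: ${sysroot}${target}"
        return 0
    fi
    echo "missing: ${sysroot}${target}"
    return 1
}
```

      On an affected node this would be invoked as check_ostree_root /sysroot "$(cat /proc/cmdline)". A "missing" result together with an empty /sysroot would be consistent with the wrong disk (or no disk) having been mounted as the root, which is what the disk-ordering suspicion predicts. Comparing the entries under /dev/disk/by-id or /dev/disk/by-path between a good boot and a failed boot may help confirm that.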

      Version-Release number of selected component (if applicable):

      OCP 4.16.24
      Red Hat Enterprise Linux CoreOS 416.94.202411201433-0 416.94.202411201433-0
      advanced-cluster-management.v2.11.3
      openshift-gitops-operator.v1.14.1  

      How reproducible:

      Not possible at this time with the resources at our disposal

      Steps to Reproduce:

      Not possible at this time with the resources at our disposal     

      Actual results:

      1 or 2 nodes per cluster fail to deploy

      Expected results:

      All 20 nodes of the cluster deploy successfully using ZTP

      Additional info:

          

              rhn-engineering-dtantsur Dmitry Tantsur
              rhn-support-jveiraca1 Joaquin Veira
              Joaquin Veira
              None
              Michael Nguyen Michael Nguyen
              None