  1. OpenShift Bugs
  2. OCPBUGS-10959

Baremetal servers fail to boot from disk


Details

    • Important
    • No
    • 8
    • Metal Platform 234
    • 1
    • Rejected
    • False

    Description

      Description of problem:

      We've been investigating persistent errors in our baremetal lab.
      After the image is written to disk by the ironic python agent (using coreos-installer), hosts fail to reboot from the hard drive.
      During boot they attempt to boot from the hard drive but quickly move on to the next entry in the boot order, with no error displayed.
      
      Our first assumption was that some servers in our BM environment had a hardware failure, but after running some tests on
      one of our baremetal environments we noticed that this seems to be a regression, possibly in coreos-installer.

      Version-Release number of selected component (if applicable):

      coreos-installer 0.16.1 on CentOS Stream CoreOS release 4.13 (see Additional info). Results per OpenShift version are listed under How reproducible.
      

      How reproducible:

      4.14 nightlies consistently show the error on at least one baremetal node in the cluster across multiple retries (not usually all of them; 1 or 2 masters usually boot fine)
      4.13: tried once, which reproduced the problem
      4.12: 1 attempt, ran fine
      4.11 nightlies are fine; I provisioned 6 runs on the same environment without a single failure

      Steps to Reproduce:

      Provision a cluster with baremetal IPI on baremetal hardware.
      

      Actual results:

      Once masters go active, some fail to boot and get stuck in a POST reboot loop.

      Expected results:

      All master nodes should boot

      Additional info:

      I'm using Dell PowerEdge R340 servers, each with 3 x SSDSC2KG240G8R
      BIOS Version: 2.12.2 (also reproduced on older version)
      iDRAC Firmware Version: 6.10.30.00 (also reproduced on older version)
      
      [root@host3 core]# /usr/bin/coreos-installer -V
      coreos-installer 0.16.1
      [root@host3 core]# cat /etc/redhat-release 
      CentOS Stream CoreOS release 4.13
      
      
      I'll attach the full IPA ramdisk log, but coreos-installer seems to complete without failure:
      
      2023-03-28 09:23:17.313 1 INFO ironic_coreos_install [-] Executing CoreOS installer: ['chroot', '/mnt/coreos', 'coreos-installer', 'install', '--preserve-on-error', '--ignition-file', '/tmp/ironic.ign', '--offline', '--append-karg', 'ip=dhcp', '/dev/sda']
      2023-03-28 09:23:17.328 1 DEBUG ironic_coreos_install [-] coreos-installer: Installing CentOS Stream CoreOS 413.92.202303011445-0 (Plow) x86_64 (512-byte sectors) _run_install /usr/lib/python3.9/site-packages/ironic_coreos_install.py:193
      ...
      2023-03-28 09:23:45.121 1 DEBUG ironic_coreos_install [-] coreos-installer: Read disk 3.5 GiB/3.5 GiB (100%) _run_install /usr/lib/python3.9/site-packages/ironic_coreos_install.py:193
      2023-03-28 09:23:45.121 1 DEBUG ironic_coreos_install [-] coreos-installer: Read disk 3.5 GiB/3.5 GiB (100%) _run_install /usr/lib/python3.9/site-packages/ironic_coreos_install.py:193
      2023-03-28 09:23:46.389 1 DEBUG ironic_coreos_install [-] coreos-installer: Writing Ignition config _run_install /usr/lib/python3.9/site-packages/ironic_coreos_install.py:193
      2023-03-28 09:23:46.389 1 DEBUG ironic_coreos_install [-] coreos-installer: Modifying kernel arguments _run_install /usr/lib/python3.9/site-packages/ironic_coreos_install.py:193
      2023-03-28 09:23:46.822 1 DEBUG ironic_coreos_install [-] coreos-installer: Install complete. _run_install /usr/lib/python3.9/site-packages/ironic_coreos_install.py:193
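
To check the same thing across many nodes, the excerpt above can be summarized mechanically. This is a minimal sketch that assumes every line follows the format shown in the excerpt (timestamp, pid, level, logger, `[-]`, message). The names `summarize`, `LINE_RE` and `TRAILER_RE` are hypothetical helpers, not part of ironic or coreos-installer. It recovers the installer command line and reports whether the last installer message was "Install complete.".

```python
import ast
import re

# Assumed line format, taken from the excerpt:
# "<date> <time> <pid> <LEVEL> <logger> [-] <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"\d+ (?P<level>[A-Z]+) (?P<logger>\S+) \[-\] (?P<msg>.*)$"
)
# DEBUG lines end with "<function> <path>.py:<lineno>"; strip that suffix.
TRAILER_RE = re.compile(r" \S+ /\S+\.py:\d+$")
INSTALLER_PREFIX = "coreos-installer: "
EXEC_PREFIX = "Executing CoreOS installer: "


def summarize(lines):
    """Return (argv, messages, completed) for ironic_coreos_install log lines."""
    argv, messages = None, []
    for raw in lines:
        m = LINE_RE.match(raw.strip())
        if not m or m["logger"] != "ironic_coreos_install":
            continue  # skip elided ("...") or unrelated lines
        msg = TRAILER_RE.sub("", m["msg"])
        if msg.startswith(EXEC_PREFIX):
            # The command is logged as a Python list literal.
            argv = ast.literal_eval(msg[len(EXEC_PREFIX):])
        elif msg.startswith(INSTALLER_PREFIX):
            messages.append(msg[len(INSTALLER_PREFIX):])
    completed = bool(messages) and messages[-1] == "Install complete."
    return argv, messages, completed


if __name__ == "__main__":
    import sys

    # Usage: python summarize_ipa_log.py < ramdisk.log
    argv, messages, completed = summarize(sys.stdin)
    print("command:", argv)
    print("installer messages:", len(messages))
    print("install completed:", completed)
```

A `completed` value of `True` only shows that the installer reported success. As described above, the affected nodes still fail to boot from disk afterwards, so this check rules out an installer error but does not confirm a bootable disk.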

       


          People

            dhiggins@redhat.com Derek Higgins
            dhiggins@redhat.com Derek Higgins
            Michael Nguyen Michael Nguyen