OCPBUGS-57573

RHCOS SNO does not boot after install; wrong device uuid in GRUB


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • 4.17
    • RHCOS
    • Quality / Stability / Reliability
    • Ready to Pick, CoreOS West - 276, CoreOS West - Sprint 277, CoreOS West - Sprint 278

      Description of problem:

          On Klas hardware, an OCP 4.17 single-node cluster fails to boot on the first reboot after the initial install.

      Version-Release number of selected component (if applicable):

          Installer, single-node OCP cluster

      How reproducible:

          Every time: install OCP 4.17 on Klas hardware, reboot, and the node fails to boot.

      Steps to Reproduce:

      I am installing a single node OpenShift UPI cluster. After the initial install of the master node, subsequent reboots fail. These are the last logs before failure:
      
      [  237.048326] dracut-initqueue[764]:     [ -e "/dev/disk/by-uuid/63ebc06d-5423-4ad4-80cc-9a4839651d83" ]
      [  236.884693] dracut-initqueue[764]:     [ -e "/dev/disk/by-uuid/63ebc06d-5423-4ad4-80cc-9a4839651d83" ]
      [  237.050154] dracut-initqueue[764]: fi"
      [  236.886041] dracut-initqueue[764]: fi"
      [  237.050938] dracut-initqueue[764]: Warning: dracut-initqueue: starting timeout scripts
      [  236.887182] dracut-initqueue[764]: Warning: dracut-initqueue: starting timeout scripts
      [  237.052496] dracut-initqueue[764]: Warning: Could not boot.
      [  236.888549] dracut-initqueue[764]: Warning: Could not boot.
      [  237.096828] systemd[1]: Stopped target Subsequent (Not Ignition) boot complete.
      [  OK  ] Stopped target Subsequent (Not Ignition) boot complete.
      [  237.098240] systemd[1]: Stopped target Ignition Subsequent Boot Disk Setup.
      [  OK  ] Stopped target Ignition Subsequent Boot Disk Setup.
      [  237.105891] systemd[1]: Starting Dracut Emergency Shell...
               Starting Dracut Emergency Shell...
      Warning: /dev/disk/by-uuid/63ebc06d-5423-4ad4-80cc-9a4839651d83 does not exist
      
      Indeed, this UUID does not exist:
      
      dracut:/# lsblk
      NAME        MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINTS
      sr0          11:0    1  1024M  0 rom
      nvme0n1     259:0    0 953.9G  0 disk
      |-nvme0n1p1 259:1    0     1M  0 part
      |-nvme0n1p2 259:2    0   127M  0 part
      |-nvme0n1p3 259:3    0   384M  0 part
      `-nvme0n1p4 259:4    0 953.4G  0 part
        `-root    253:0    0 953.4G  0 crypt
      dracut:/# ls -lha /dev/disk/by-uuid/
      total 0
      drwxr-xr-x 2 root root 120 Jun 10 21:13 .
      drwxr-xr-x 9 root root 180 Jun 10 21:13 ..
      lrwxrwxrwx 1 root root  15 Jun 10 21:13 63e4e615-6dbf-4da6-abf2-59e87c8a17d0 -> ../../nvme0n1p3
      lrwxrwxrwx 1 root root  15 Jun 10 21:13 7B77-95E7 -> ../../nvme0n1p2
      lrwxrwxrwx 1 root root  15 Jun 10 21:13 89db4bea-646e-40b0-8692-e6142af87040 -> ../../nvme0n1p4
      lrwxrwxrwx 1 root root  10 Jun 10 21:13 e8d40515-08a3-4a97-a2e5-618966f033cb -> ../../dm-0
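
The comparison above can also be done mechanically. The following is a minimal Python sketch, not part of the original report: it parses the `ls -lha /dev/disk/by-uuid/` output quoted above and checks whether the UUID that dracut-initqueue is waiting for is present. The names `parse_by_uuid`, `LS_OUTPUT`, and `WANTED` are hypothetical and chosen only for illustration.

```python
import re

# Symlink lines copied from the `ls -lha /dev/disk/by-uuid/` output above.
LS_OUTPUT = """\
lrwxrwxrwx 1 root root  15 Jun 10 21:13 63e4e615-6dbf-4da6-abf2-59e87c8a17d0 -> ../../nvme0n1p3
lrwxrwxrwx 1 root root  15 Jun 10 21:13 7B77-95E7 -> ../../nvme0n1p2
lrwxrwxrwx 1 root root  15 Jun 10 21:13 89db4bea-646e-40b0-8692-e6142af87040 -> ../../nvme0n1p4
lrwxrwxrwx 1 root root  10 Jun 10 21:13 e8d40515-08a3-4a97-a2e5-618966f033cb -> ../../dm-0
"""

# UUID from the dracut-initqueue warning in the console log above.
WANTED = "63ebc06d-5423-4ad4-80cc-9a4839651d83"


def parse_by_uuid(ls_output):
    """Map each by-uuid symlink name to the device name it points at."""
    mapping = {}
    for line in ls_output.splitlines():
        # Capture "<uuid> -> ../../<device>" at the end of the line.
        match = re.search(r"(\S+) -> (?:\.\./)*(\S+)$", line)
        if match:
            mapping[match.group(1)] = match.group(2)
    return mapping


present = parse_by_uuid(LS_OUTPUT)
# None means the UUID the initramfs waits for is not on this system.
wanted_device = present.get(WANTED)
```

With the listing above, `wanted_device` is `None`: none of the four entries matches the UUID the initramfs is waiting for.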
      
      This appears to be a regression in OpenShift 4.17; installing OpenShift 4.16 on the same hardware does not have this problem.
      
      After reproducing this a second time, I found these logs:
      
      Jun 11 17:58:01 localhost systemd[1]: Finished Ignition (disks).
      Jun 11 17:58:01 localhost systemd[1]: Reached target Initrd Root Device.
      Jun 11 17:58:02 localhost systemd[1]: Starting CoreOS Ignition Ensure Unique Boot Filesystem...
      Jun 11 17:58:02 localhost systemd[1]: Finished CoreOS Ignition Ensure Unique Boot Filesystem.
      Jun 11 17:58:02 localhost systemd[1]: Ignition OSTree: Regenerate Filesystem UUID (root) was skipped because of an unmet condition check (ConditionPathExists=!/run/ignition-ostree-transposefs).
      Jun 11 17:58:02 localhost systemd[1]: Starting Ignition OSTree: Grow Root Filesystem...
      Jun 11 17:58:02 localhost systemd[1]: Afterburn (Check In - from the initramfs) was skipped because of an unmet condition check (ConditionKernelCommandLine=ignition.platform.id=azure).
      Jun 11 17:58:02 localhost systemd-journald[379]: Missed 44 kernel messages
      Jun 11 17:58:02 localhost kernel: XFS (dm-0): Mounting V5 Filesystem 8893b769-b533-4a39-87a0-656c87d73569
      Jun 11 17:58:02 localhost kernel: XFS (dm-0): Ending clean mount
      Jun 11 17:58:04 localhost ignition-ostree-growfs[2515]: CHANGED: partition=4 start=1050624 old: size=6197248 end=7247871 new: size=1999358607 end=2000409230
      Jun 11 17:58:05 localhost ignition-ostree-growfs[2620]: 0
      Jun 11 17:58:07 localhost systemd-journald[379]: Missed 2 kernel messages
      Jun 11 17:58:07 localhost kernel: async_tx: api initialized (async)
      Jun 11 17:58:08 localhost ignition-ostree-growfs[2881]: meta-data=/dev/mapper/root       isize=512    agcount=4, agsize=192640 blks
      Jun 11 17:58:08 localhost ignition-ostree-growfs[2881]:          =                       sectsz=512   attr=2, projid32bit=1
      Jun 11 17:58:08 localhost ignition-ostree-growfs[2881]:          =                       crc=1        finobt=1, sparse=1, rmapbt=0
      Jun 11 17:58:08 localhost ignition-ostree-growfs[2881]:          =                       reflink=1    bigtime=1 inobtcount=1 nrext64=0
      Jun 11 17:58:08 localhost ignition-ostree-growfs[2881]: data     =                       bsize=4096   blocks=770560, imaxpct=25
      Jun 11 17:58:08 localhost ignition-ostree-growfs[2881]:          =                       sunit=0      swidth=0 blks
      Jun 11 17:58:08 localhost ignition-ostree-growfs[2881]: naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
      Jun 11 17:58:08 localhost ignition-ostree-growfs[2881]: log      =internal log           bsize=4096   blocks=16384, version=2
      Jun 11 17:58:08 localhost ignition-ostree-growfs[2881]:          =                       sectsz=512   sunit=0 blks, lazy-count=1
      Jun 11 17:58:08 localhost ignition-ostree-growfs[2881]: realtime =none                   extsz=4096   blocks=0, rtextents=0
      Jun 11 17:58:08 localhost ignition-ostree-growfs[2881]: data blocks changed from 770560 to 249915729
      Jun 11 17:58:08 localhost systemd-journald[379]: Missed 11 kernel messages
      Jun 11 17:58:08 localhost kernel: XFS (dm-0): Unmounting Filesystem 8893b769-b533-4a39-87a0-656c87d73569
      Jun 11 17:58:08 localhost systemd[1]: Finished Ignition OSTree: Grow Root Filesystem.
      Jun 11 17:58:08 localhost systemd[1]: Starting Ignition OSTree: Autosave XFS Rootfs Partition...
      Jun 11 17:58:08 localhost ignition-ostree-transposefs[2912]: autosave-xfs: /dev/disk/by-label/root agcount=1298 meets threshold=400
      Jun 11 17:58:09 localhost ignition-ostree-transposefs[2925]: meta-data=/dev/disk/by-label/root isize=512    agcount=4, agsize=62478933 blks
      Jun 11 17:58:09 localhost ignition-ostree-transposefs[2925]:          =                       sectsz=512   attr=2, projid32bit=1
      Jun 11 17:58:09 localhost ignition-ostree-transposefs[2925]:          =                       crc=1        finobt=1, sparse=1, rmapbt=0
      Jun 11 17:58:09 localhost ignition-ostree-transposefs[2925]:          =                       reflink=1    bigtime=1 inobtcount=1 nrext64=0
      Jun 11 17:58:09 localhost ignition-ostree-transposefs[2925]: data     =                       bsize=4096   blocks=249915729, imaxpct=25
      Jun 11 17:58:09 localhost ignition-ostree-transposefs[2925]:          =                       sunit=0      swidth=0 blks
      Jun 11 17:58:09 localhost ignition-ostree-transposefs[2925]: naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
      Jun 11 17:58:09 localhost ignition-ostree-transposefs[2925]: log      =internal log           bsize=4096   blocks=122029, version=2
      Jun 11 17:58:09 localhost ignition-ostree-transposefs[2925]:          =                       sectsz=512   sunit=0 blks, lazy-count=1
      Jun 11 17:58:09 localhost ignition-ostree-transposefs[2925]: realtime =none                   extsz=4096   blocks=0, rtextents=0
      Jun 11 17:58:09 localhost systemd[1]: Finished Ignition OSTree: Autosave XFS Rootfs Partition.
      Jun 11 17:58:09 localhost systemd[1]: Starting Ignition OSTree: Restore Partitions...
      Jun 11 17:58:09 localhost ignition-ostree-transposefs[2939]: Restoring rootfs from RAM...
      Jun 11 17:58:09 localhost ignition-ostree-transposefs[2939]: Mounting /dev/disk/by-label/root rw (/dev/dm-0) to /sysroot
      Jun 11 17:58:09 localhost systemd-journald[379]: Missed 17 kernel messages
      Jun 11 17:58:09 localhost kernel: XFS (dm-0): Mounting V5 Filesystem a01718cd-3990-4dfb-8b55-cb5f6124f077
      Jun 11 17:58:09 localhost kernel: XFS (dm-0): Ending clean mount
      Jun 11 17:58:13 localhost ignition-ostree-transposefs[2977]: changing security context of '/sysroot'
      Jun 11 17:58:15 localhost systemd-journald[379]: Missed 1 kernel messages
      Jun 11 17:58:15 localhost kernel: XFS (dm-0): Unmounting Filesystem a01718cd-3990-4dfb-8b55-cb5f6124f077
      Jun 11 17:58:15 localhost systemd[1]: Finished Ignition OSTree: Restore Partitions.
      Jun 11 17:58:15 localhost systemd[1]: Starting Determine root FS mount option flags...
      Jun 11 17:58:15 localhost systemd[1]: Finished Determine root FS mount option flags.
      Jun 11 17:58:15 localhost systemd[1]: Mounting /sysroot...
      Jun 11 17:58:15 localhost systemd-journald[379]: Missed 4 kernel messages
      Jun 11 17:58:15 localhost kernel: XFS (dm-0): Mounting V5 Filesystem a01718cd-3990-4dfb-8b55-cb5f6124f077
      Jun 11 17:58:15 localhost kernel: XFS (dm-0): Ending clean mount
      Jun 11 17:58:15 localhost kernel: XFS (dm-0): Quotacheck needed: Please wait.
      Jun 11 17:58:15 localhost kernel: XFS (dm-0): Quotacheck: Done.
      Jun 11 17:58:15 localhost systemd[1]: Mounted /sysroot.
      Jun 11 17:58:15 localhost systemd[1]: Remount /sysroot read-write for Ignition was skipped because of an unmet condition check (ConditionPathIsReadWrite=!/sysroot).
      Jun 11 17:58:15 localhost systemd[1]: Starting OSTree Prepare OS/...
      Jun 11 17:58:15 localhost ostree-prepare-root[2997]: Resolved OSTree target to: /sysroot/ostree/deploy/rhcos/deploy/b3c808aa05843e2d623d2febe7b8d739868ec696ab5f338eef2a415e84509911.0
      Jun 11 17:58:15 localhost ostree-prepare-root[2997]: Found legacy sysroot.readonly flag, not configured in ostree/prepare-root.conf
      Jun 11 17:58:15 localhost ostree-prepare-root[2997]: sysroot.readonly configuration value: 1 (fs writable: 1)
      Jun 11 17:58:15 localhost ostree-prepare-root[2997]: composefs: No image present
      Jun 11 17:58:15 localhost ostree-prepare-root[2997]: Using legacy ostree bind mount for /
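
The allocation group counts in the log above can be reproduced with simple arithmetic. This is a hedged sketch, not taken from the report: it assumes that the number of XFS allocation groups is the data block count divided by the allocation group size, rounded up, which matches the figures printed by ignition-ostree-growfs and ignition-ostree-transposefs. The function name `ag_count` is hypothetical; the threshold of 400 is the value shown in the autosave-xfs log line.

```python
def ag_count(data_blocks, ag_size_blocks):
    """Allocation groups needed to cover data_blocks (integer ceiling division)."""
    return -(-data_blocks // ag_size_blocks)


# Values copied from the journal above.
AUTOSAVE_THRESHOLD = 400      # "meets threshold=400"
ORIGINAL_BLOCKS = 770560      # blocks before growfs
GROWN_BLOCKS = 249915729      # "data blocks changed from 770560 to 249915729"
ORIGINAL_AGSIZE = 192640      # agsize before and during growfs
NEW_AGSIZE = 62478933         # agsize printed by ignition-ostree-transposefs

before_grow = ag_count(ORIGINAL_BLOCKS, ORIGINAL_AGSIZE)   # 4
after_grow = ag_count(GROWN_BLOCKS, ORIGINAL_AGSIZE)       # 1298
after_recreate = ag_count(GROWN_BLOCKS, NEW_AGSIZE)        # 4
needs_autosave = after_grow >= AUTOSAVE_THRESHOLD          # True
```

Growing the filesystem at the original agsize gives 1298 allocation groups, which is the agcount reported by autosave-xfs. The later output with agcount=4, followed by "Restoring rootfs from RAM...", is consistent with the root filesystem being recreated at that point, although the log does not say so explicitly.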
      
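      So at some point during Ignition the filesystem UUID on dm-0 changes: in the logs above it goes from 8893b769-b533-4a39-87a0-656c87d73569 to a01718cd-3990-4dfb-8b55-cb5f6124f077. However, that change does not appear to be carried over to GRUB.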
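
To make the UUID change easier to spot in a long journal, the kernel's XFS mount messages can be filtered per device. This is a minimal sketch under the assumption that the messages have the same shape as in the journal above; `uuid_history` and `MOUNT_RE` are hypothetical names, and the sample lines are copied from that journal.

```python
import re

# Kernel mount messages copied from the journal above.
JOURNAL = """\
Jun 11 17:58:02 localhost kernel: XFS (dm-0): Mounting V5 Filesystem 8893b769-b533-4a39-87a0-656c87d73569
Jun 11 17:58:09 localhost kernel: XFS (dm-0): Mounting V5 Filesystem a01718cd-3990-4dfb-8b55-cb5f6124f077
Jun 11 17:58:15 localhost kernel: XFS (dm-0): Mounting V5 Filesystem a01718cd-3990-4dfb-8b55-cb5f6124f077
"""

MOUNT_RE = re.compile(
    r"XFS \((?P<dev>[^)]+)\): Mounting V5 Filesystem (?P<uuid>[0-9a-f-]{36})"
)


def uuid_history(journal, device):
    """Return the UUIDs mounted on `device` in order, collapsing consecutive repeats."""
    history = []
    for match in MOUNT_RE.finditer(journal):
        if match.group("dev") != device:
            continue
        uuid = match.group("uuid")
        if not history or history[-1] != uuid:
            history.append(uuid)
    return history


history = uuid_history(JOURNAL, "dm-0")
uuid_changed = len(history) > 1
```

For the journal above, `history` holds two entries, the UUID before the "Restore Partitions" step and the one after it, so `uuid_changed` is true.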
      
      Describe the impact to you or the business
      We are unable to install new OpenShift clusters, which can cause us to miss SLAs for delivering installations.
      
      In what environment are you experiencing this behavior?
      The problem was introduced in OpenShift 4.17 and occurs when installing on a Klas VM4 (https://www.klasgroup.com/products/voyager-vm-4-0/). Other hardware, e.g. Dell, does not seem to experience this problem.
      
      How frequently does this behavior occur? Does it occur repeatedly or at certain times?
      Every time.

      Actual results:

         OCP SNO does not boot after the initial install and reboot.

      Expected results:

         OCP SNO should boot after the initial install and reboot.
       

      Additional info:

          

              rh-ee-ydesouza Yasmin de Souza
              rhn-support-nchoudhu Novonil Choudhuri
              Neil Hamza Neil Hamza