Bug
Resolution: Unresolved
Major
4.17
Quality / Stability / Reliability
Ready to Pick, CoreOS West - 276, CoreOS West - Sprint 277, CoreOS West - Sprint 278
Description of problem:
On OCP 4.17, a single-node cluster fails to boot on any reboot after the initial install on Klas hardware.
Version-Release number of selected component (if applicable):
Installer, single-node OCP cluster (4.17)
How reproducible:
Install OCP 4.17 on Klas hardware and reboot; the node fails to boot every time.
Steps to Reproduce:
I am installing a single-node OpenShift UPI cluster. After the initial install of the master node, subsequent reboots fail. These are the last logs before failure:

[ 237.048326] dracut-initqueue[764]: [ -e "/dev/disk/by-uuid/63ebc06d-5423-4ad4-80cc-9a4839651d83" ]
[ 236.884693] dracut-initqueue[764]: [ -e "/dev/disk/by-uuid/63ebc06d-5423-4ad4-80cc-9a4839651d83" ]
[ 237.050154] dracut-initqueue[764]: fi"
[ 236.886041] dracut-initqueue[764]: fi"
[ 237.050938] dracut-initqueue[764]: Warning: dracut-initqueue: starting timeout scripts
[ 236.887182] dracut-initqueue[764]: Warning: dracut-initqueue: starting timeout scripts
[ 237.052496] dracut-initqueue[764]: Warning: Could not boot.
[ 236.888549] dracut-initqueue[764]: Warning: Could not boot.
[ 237.096828] systemd[1]: Stopped target Subsequent (Not Ignition) boot complete.
[ OK ] Stopped target Subsequent (Not Ignition) boot complete.
[ 237.098240] systemd[1]: Stopped target Ignition Subsequent Boot Disk Setup.
[ OK ] Stopped target Ignition Subsequent Boot Disk Setup.
[ 237.105891] systemd[1]: Starting Dracut Emergency Shell...
Starting Dracut Emergency Shell...
Warning: /dev/disk/by-uuid/63ebc06d-5423-4ad4-80cc-9a4839651d83 does not exist

Indeed, this UUID does not exist:

dracut:/# lsblk
NAME        MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINTS
sr0          11:0    1  1024M  0 rom
nvme0n1     259:0    0 953.9G  0 disk
|-nvme0n1p1 259:1    0     1M  0 part
|-nvme0n1p2 259:2    0   127M  0 part
|-nvme0n1p3 259:3    0   384M  0 part
`-nvme0n1p4 259:4    0 953.4G  0 part
  `-root    253:0    0 953.4G  0 crypt
dracut:/# ls -lha /dev/disk/by-uuid/
total 0
drwxr-xr-x 2 root root 120 Jun 10 21:13 .
drwxr-xr-x 9 root root 180 Jun 10 21:13 ..
lrwxrwxrwx 1 root root 15 Jun 10 21:13 63e4e615-6dbf-4da6-abf2-59e87c8a17d0 -> ../../nvme0n1p3
lrwxrwxrwx 1 root root 15 Jun 10 21:13 7B77-95E7 -> ../../nvme0n1p2
lrwxrwxrwx 1 root root 15 Jun 10 21:13 89db4bea-646e-40b0-8692-e6142af87040 -> ../../nvme0n1p4
lrwxrwxrwx 1 root root 10 Jun 10 21:13 e8d40515-08a3-4a97-a2e5-618966f033cb -> ../../dm-0

This appears to be a regression in OpenShift 4.17; installing OpenShift 4.16 on the same hardware does not have this problem.

After reproducing this a second time, I found these logs:

Jun 11 17:58:01 localhost systemd[1]: Finished Ignition (disks).
Jun 11 17:58:01 localhost systemd[1]: Reached target Initrd Root Device.
Jun 11 17:58:02 localhost systemd[1]: Starting CoreOS Ignition Ensure Unique Boot Filesystem...
Jun 11 17:58:02 localhost systemd[1]: Finished CoreOS Ignition Ensure Unique Boot Filesystem.
Jun 11 17:58:02 localhost systemd[1]: Ignition OSTree: Regenerate Filesystem UUID (root) was skipped because of an unmet condition check (ConditionPathExists=!/run/ignition-ostree-transposefs).
Jun 11 17:58:02 localhost systemd[1]: Starting Ignition OSTree: Grow Root Filesystem...
Jun 11 17:58:02 localhost systemd[1]: Afterburn (Check In - from the initramfs) was skipped because of an unmet condition check (ConditionKernelCommandLine=ignition.platform.id=azure).
Jun 11 17:58:02 localhost systemd-journald[379]: Missed 44 kernel messages
Jun 11 17:58:02 localhost kernel: XFS (dm-0): Mounting V5 Filesystem 8893b769-b533-4a39-87a0-656c87d73569
Jun 11 17:58:02 localhost kernel: XFS (dm-0): Ending clean mount
Jun 11 17:58:04 localhost ignition-ostree-growfs[2515]: CHANGED: partition=4 start=1050624 old: size=6197248 end=7247871 new: size=1999358607 end=2000409230
Jun 11 17:58:05 localhost ignition-ostree-growfs[2620]: 0
Jun 11 17:58:07 localhost systemd-journald[379]: Missed 2 kernel messages
Jun 11 17:58:07 localhost kernel: async_tx: api initialized (async)
Jun 11 17:58:08 localhost ignition-ostree-growfs[2881]: meta-data=/dev/mapper/root isize=512 agcount=4, agsize=192640 blks
Jun 11 17:58:08 localhost ignition-ostree-growfs[2881]: = sectsz=512 attr=2, projid32bit=1
Jun 11 17:58:08 localhost ignition-ostree-growfs[2881]: = crc=1 finobt=1, sparse=1, rmapbt=0
Jun 11 17:58:08 localhost ignition-ostree-growfs[2881]: = reflink=1 bigtime=1 inobtcount=1 nrext64=0
Jun 11 17:58:08 localhost ignition-ostree-growfs[2881]: data = bsize=4096 blocks=770560, imaxpct=25
Jun 11 17:58:08 localhost ignition-ostree-growfs[2881]: = sunit=0 swidth=0 blks
Jun 11 17:58:08 localhost ignition-ostree-growfs[2881]: naming =version 2 bsize=4096 ascii-ci=0, ftype=1
Jun 11 17:58:08 localhost ignition-ostree-growfs[2881]: log =internal log bsize=4096 blocks=16384, version=2
Jun 11 17:58:08 localhost ignition-ostree-growfs[2881]: = sectsz=512 sunit=0 blks, lazy-count=1
Jun 11 17:58:08 localhost ignition-ostree-growfs[2881]: realtime =none extsz=4096 blocks=0, rtextents=0
Jun 11 17:58:08 localhost ignition-ostree-growfs[2881]: data blocks changed from 770560 to 249915729
Jun 11 17:58:08 localhost systemd-journald[379]: Missed 11 kernel messages
Jun 11 17:58:08 localhost kernel: XFS (dm-0): Unmounting Filesystem 8893b769-b533-4a39-87a0-656c87d73569
Jun 11 17:58:08 localhost systemd[1]: Finished Ignition OSTree: Grow Root Filesystem.
Jun 11 17:58:08 localhost systemd[1]: Starting Ignition OSTree: Autosave XFS Rootfs Partition...
Jun 11 17:58:08 localhost ignition-ostree-transposefs[2912]: autosave-xfs: /dev/disk/by-label/root agcount=1298 meets threshold=400
Jun 11 17:58:09 localhost ignition-ostree-transposefs[2925]: meta-data=/dev/disk/by-label/root isize=512 agcount=4, agsize=62478933 blks
Jun 11 17:58:09 localhost ignition-ostree-transposefs[2925]: = sectsz=512 attr=2, projid32bit=1
Jun 11 17:58:09 localhost ignition-ostree-transposefs[2925]: = crc=1 finobt=1, sparse=1, rmapbt=0
Jun 11 17:58:09 localhost ignition-ostree-transposefs[2925]: = reflink=1 bigtime=1 inobtcount=1 nrext64=0
Jun 11 17:58:09 localhost ignition-ostree-transposefs[2925]: data = bsize=4096 blocks=249915729, imaxpct=25
Jun 11 17:58:09 localhost ignition-ostree-transposefs[2925]: = sunit=0 swidth=0 blks
Jun 11 17:58:09 localhost ignition-ostree-transposefs[2925]: naming =version 2 bsize=4096 ascii-ci=0, ftype=1
Jun 11 17:58:09 localhost ignition-ostree-transposefs[2925]: log =internal log bsize=4096 blocks=122029, version=2
Jun 11 17:58:09 localhost ignition-ostree-transposefs[2925]: = sectsz=512 sunit=0 blks, lazy-count=1
Jun 11 17:58:09 localhost ignition-ostree-transposefs[2925]: realtime =none extsz=4096 blocks=0, rtextents=0
Jun 11 17:58:09 localhost systemd[1]: Finished Ignition OSTree: Autosave XFS Rootfs Partition.
Jun 11 17:58:09 localhost systemd[1]: Starting Ignition OSTree: Restore Partitions...
Jun 11 17:58:09 localhost ignition-ostree-transposefs[2939]: Restoring rootfs from RAM...
Jun 11 17:58:09 localhost ignition-ostree-transposefs[2939]: Mounting /dev/disk/by-label/root rw (/dev/dm-0) to /sysroot
Jun 11 17:58:09 localhost systemd-journald[379]: Missed 17 kernel messages
Jun 11 17:58:09 localhost kernel: XFS (dm-0): Mounting V5 Filesystem a01718cd-3990-4dfb-8b55-cb5f6124f077
Jun 11 17:58:09 localhost kernel: XFS (dm-0): Ending clean mount
Jun 11 17:58:13 localhost ignition-ostree-transposefs[2977]: changing security context of '/sysroot'
Jun 11 17:58:15 localhost systemd-journald[379]: Missed 1 kernel messages
Jun 11 17:58:15 localhost kernel: XFS (dm-0): Unmounting Filesystem a01718cd-3990-4dfb-8b55-cb5f6124f077
Jun 11 17:58:15 localhost systemd[1]: Finished Ignition OSTree: Restore Partitions.
Jun 11 17:58:15 localhost systemd[1]: Starting Determine root FS mount option flags...
Jun 11 17:58:15 localhost systemd[1]: Finished Determine root FS mount option flags.
Jun 11 17:58:15 localhost systemd[1]: Mounting /sysroot...
Jun 11 17:58:15 localhost systemd-journald[379]: Missed 4 kernel messages
Jun 11 17:58:15 localhost kernel: XFS (dm-0): Mounting V5 Filesystem a01718cd-3990-4dfb-8b55-cb5f6124f077
Jun 11 17:58:15 localhost kernel: XFS (dm-0): Ending clean mount
Jun 11 17:58:15 localhost kernel: XFS (dm-0): Quotacheck needed: Please wait.
Jun 11 17:58:15 localhost kernel: XFS (dm-0): Quotacheck: Done.
Jun 11 17:58:15 localhost systemd[1]: Mounted /sysroot.
Jun 11 17:58:15 localhost systemd[1]: Remount /sysroot read-write for Ignition was skipped because of an unmet condition check (ConditionPathIsReadWrite=!/sysroot).
Jun 11 17:58:15 localhost systemd[1]: Starting OSTree Prepare OS/...
Jun 11 17:58:15 localhost ostree-prepare-root[2997]: Resolved OSTree target to: /sysroot/ostree/deploy/rhcos/deploy/b3c808aa05843e2d623d2febe7b8d739868ec696ab5f338eef2a415e84509911.0
Jun 11 17:58:15 localhost ostree-prepare-root[2997]: Found legacy sysroot.readonly flag, not configured in ostree/prepare-root.conf
Jun 11 17:58:15 localhost ostree-prepare-root[2997]: sysroot.readonly configuration value: 1 (fs writable: 1)
Jun 11 17:58:15 localhost ostree-prepare-root[2997]: composefs: No image present
Jun 11 17:58:15 localhost ostree-prepare-root[2997]: Using legacy ostree bind mount for /

So at some point during Ignition the UUID of the filesystem on the root device changes (8893b769-b533-4a39-87a0-656c87d73569 before the "Autosave XFS Rootfs Partition" step, a01718cd-3990-4dfb-8b55-cb5f6124f077 after "Restore Partitions"). However, that change does not seem to be persisted in GRUB.

Describe the impact to you or the business
Unable to install new OpenShift clusters. This can cause us to miss SLAs for delivering installations.

In what environment are you experiencing this behavior?
The problem was introduced in OpenShift 4.17 and occurs when installing on a Klas VM4 (https://www.klasgroup.com/products/voyager-vm-4-0/). Other hardware, e.g. Dell, does not seem to experience this problem.

How frequently does this behavior occur? Does it occur repeatedly or at certain times?
Every time.
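To make the UUID change in the journal excerpt above easier to spot on other reproductions, the following is a minimal sketch that scans saved journal text for the kernel's "XFS (<device>): Mounting V5 Filesystem <uuid>" lines and reports devices that were mounted with more than one filesystem UUID. The function names are hypothetical helpers, not part of any CoreOS tooling, and the sketch assumes the journal has been exported to plain text in the same format as the lines above.

```python
import re

# Matches kernel lines of the form seen in the journal above, e.g.
#   kernel: XFS (dm-0): Mounting V5 Filesystem 8893b769-b533-4a39-87a0-656c87d73569
_MOUNT_RE = re.compile(
    r"XFS \((?P<dev>[^)]+)\): Mounting V5 Filesystem (?P<uuid>[0-9a-fA-F-]{36})"
)


def xfs_uuids_by_device(journal_text):
    """Return {device: [uuid, ...]}, UUIDs in first-seen order without repeats."""
    seen = {}
    for match in _MOUNT_RE.finditer(journal_text):
        uuids = seen.setdefault(match.group("dev"), [])
        if match.group("uuid") not in uuids:
            uuids.append(match.group("uuid"))
    return seen


def devices_with_uuid_change(journal_text):
    """Return only the devices that were mounted with more than one UUID."""
    return {
        dev: uuids
        for dev, uuids in xfs_uuids_by_device(journal_text).items()
        if len(uuids) > 1
    }
```

Run against the journal from the second reproduction, this should list dm-0 with the two UUIDs quoted above; a device that keeps a single UUID across the boot is not reported.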
Actual results:
OCP SNO does not boot after the initial install and reboot.
Expected results:
OCP SNO should boot after the initial install and reboot.
Additional info:
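A possible triage aid, offered as a sketch under assumptions: the script below compares the UUIDs referenced by kernel arguments with the entries actually present under /dev/disk/by-uuid. This report does not establish which kernel argument carries the missing UUID, so the sketch simply collects every `...UUID=` style argument (skipping `PARTUUID=`); the helper names are hypothetical. It works on captured text, since the dracut emergency shell is not expected to have Python available.

```python
import re

# Finds values of arguments such as root=UUID=<id>, boot=UUID=<id> or
# rd.luks.uuid=<id>. The lookbehind skips PARTUUID=, which refers to
# /dev/disk/by-partuuid rather than /dev/disk/by-uuid.
_UUID_ARG_RE = re.compile(r"(?<![a-z])uuid=(?:luks-)?([0-9a-f-]+)", re.IGNORECASE)


def uuids_in_cmdline(cmdline):
    """Return the UUID values referenced on a kernel command line, in order."""
    return _UUID_ARG_RE.findall(cmdline)


def missing_uuids(cmdline, present):
    """Return referenced UUIDs that are absent from `present`.

    `present` is an iterable of names from /dev/disk/by-uuid, for example
    copied from `ls /dev/disk/by-uuid/` output. Comparison ignores case.
    """
    present_lower = {name.lower() for name in present}
    return [u for u in uuids_in_cmdline(cmdline) if u.lower() not in present_lower]
```

For use on a captured boot, pass the saved contents of /proc/cmdline as `cmdline` and the by-uuid names from the `ls` output above as `present`; any UUID returned is one the initramfs would wait for without finding.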