-
Bug
-
Resolution: Unresolved
-
Major
-
None
-
4.18
-
None
-
Quality / Stability / Reliability
-
False
-
-
3
Description of problem:
On OCP 4.17 single-node OpenShift (SNO), reboots after the initial install fail on Klas hardware.
Version-Release number of selected component (if applicable):
installer single node ocp cluster
How reproducible:
Install OCP 4.17 on Klas hardware and reboot; it fails to boot every time.
Steps to Reproduce:
I am installing a single node OpenShift UPI cluster. After the initial install of the master node, subsequent reboots fail. These are the last logs before failure:
[ 237.048326] dracut-initqueue[764]: [ -e "/dev/disk/by-uuid/63ebc06d-5423-4ad4-80cc-9a4839651d83" ]
[ 236.884693] dracut-initqueue[764]: [ -e "/dev/disk/by-uuid/63ebc06d-5423-4ad4-80cc-9a4839651d83" ]
[ 237.050154] dracut-initqueue[764]: fi"
[ 236.886041] dracut-initqueue[764]: fi"
[ 237.050938] dracut-initqueue[764]: Warning: dracut-initqueue: starting timeout scripts
[ 236.887182] dracut-initqueue[764]: Warning: dracut-initqueue: starting timeout scripts
[ 237.052496] dracut-initqueue[764]: Warning: Could not boot.
[ 236.888549] dracut-initqueue[764]: Warning: Could not boot.
[ 237.096828] systemd[1]: Stopped target Subsequent (Not Ignition) boot complete.
[ OK ] Stopped target Subsequent (Not Ignition) boot complete.
[ 237.098240] systemd[1]: Stopped target Ignition Subsequent Boot Disk Setup.
[ OK ] Stopped target Ignition Subsequent Boot Disk Setup.
[ 237.105891] systemd[1]: Starting Dracut Emergency Shell...
Starting Dracut Emergency Shell...
Warning: /dev/disk/by-uuid/63ebc06d-5423-4ad4-80cc-9a4839651d83 does not exist
Indeed, this UUID does not exist:
dracut:/# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINTS
sr0 11:0 1 1024M 0 rom
nvme0n1 259:0 0 953.9G 0 disk
|-nvme0n1p1 259:1 0 1M 0 part
|-nvme0n1p2 259:2 0 127M 0 part
|-nvme0n1p3 259:3 0 384M 0 part
`-nvme0n1p4 259:4 0 953.4G 0 part
`-root 253:0 0 953.4G 0 crypt
dracut:/# ls -lha /dev/disk/by-uuid/
total 0
drwxr-xr-x 2 root root 120 Jun 10 21:13 .
drwxr-xr-x 9 root root 180 Jun 10 21:13 ..
lrwxrwxrwx 1 root root 15 Jun 10 21:13 63e4e615-6dbf-4da6-abf2-59e87c8a17d0 -> ../../nvme0n1p3
lrwxrwxrwx 1 root root 15 Jun 10 21:13 7B77-95E7 -> ../../nvme0n1p2
lrwxrwxrwx 1 root root 15 Jun 10 21:13 89db4bea-646e-40b0-8692-e6142af87040 -> ../../nvme0n1p4
lrwxrwxrwx 1 root root 10 Jun 10 21:13 e8d40515-08a3-4a97-a2e5-618966f033cb -> ../../dm-0
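The comparison above can also be run against a saved copy of the listing. This is a minimal sketch, assuming the `ls -l` output has been captured as text (Python is unlikely to be available in the dracut emergency shell itself). The helper name `parse_by_uuid_listing` is made up for illustration and is not part of any RHCOS tooling.

```python
import re

# Matches the "name -> target" tail of an `ls -l` symlink line.
LINK_RE = re.compile(r"(\S+) -> (\S+)$")


def parse_by_uuid_listing(listing: str) -> dict:
    """Map each UUID in a captured by-uuid listing to the device it points at."""
    links = {}
    for line in listing.splitlines():
        match = LINK_RE.search(line.strip())
        if match:
            uuid, target = match.groups()
            # "../../nvme0n1p3" -> "nvme0n1p3"
            links[uuid] = target.rsplit("/", 1)[-1]
    return links


# Symlink lines copied from the listing in this report.
LISTING = """\
lrwxrwxrwx 1 root root 15 Jun 10 21:13 63e4e615-6dbf-4da6-abf2-59e87c8a17d0 -> ../../nvme0n1p3
lrwxrwxrwx 1 root root 15 Jun 10 21:13 7B77-95E7 -> ../../nvme0n1p2
lrwxrwxrwx 1 root root 15 Jun 10 21:13 89db4bea-646e-40b0-8692-e6142af87040 -> ../../nvme0n1p4
lrwxrwxrwx 1 root root 10 Jun 10 21:13 e8d40515-08a3-4a97-a2e5-618966f033cb -> ../../dm-0
"""

# UUID that dracut-initqueue was waiting for in the console log.
WANTED = "63ebc06d-5423-4ad4-80cc-9a4839651d83"

links = parse_by_uuid_listing(LISTING)
print("wanted UUID present:", WANTED in links)
for uuid, device in sorted(links.items()):
    print(device, uuid)
```

The UUID that dracut waits for is not among the four entries, which matches the "does not exist" warning.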
This appears to be a regression in OpenShift 4.17; installing OpenShift 4.16 on the same hardware does not have this problem.
After reproducing this a second time, I found these logs:
Jun 11 17:58:01 localhost systemd[1]: Finished Ignition (disks).
Jun 11 17:58:01 localhost systemd[1]: Reached target Initrd Root Device.
Jun 11 17:58:02 localhost systemd[1]: Starting CoreOS Ignition Ensure Unique Boot Filesystem...
Jun 11 17:58:02 localhost systemd[1]: Finished CoreOS Ignition Ensure Unique Boot Filesystem.
Jun 11 17:58:02 localhost systemd[1]: Ignition OSTree: Regenerate Filesystem UUID (root) was skipped because of an unmet condition check (ConditionPathExists=!/run/ignition-ostree-transposefs).
Jun 11 17:58:02 localhost systemd[1]: Starting Ignition OSTree: Grow Root Filesystem...
Jun 11 17:58:02 localhost systemd[1]: Afterburn (Check In - from the initramfs) was skipped because of an unmet condition check (ConditionKernelCommandLine=ignition.platform.id=azure).
Jun 11 17:58:02 localhost systemd-journald[379]: Missed 44 kernel messages
Jun 11 17:58:02 localhost kernel: XFS (dm-0): Mounting V5 Filesystem 8893b769-b533-4a39-87a0-656c87d73569
Jun 11 17:58:02 localhost kernel: XFS (dm-0): Ending clean mount
Jun 11 17:58:04 localhost ignition-ostree-growfs[2515]: CHANGED: partition=4 start=1050624 old: size=6197248 end=7247871 new: size=1999358607 end=2000409230
Jun 11 17:58:05 localhost ignition-ostree-growfs[2620]: 0
Jun 11 17:58:07 localhost systemd-journald[379]: Missed 2 kernel messages
Jun 11 17:58:07 localhost kernel: async_tx: api initialized (async)
Jun 11 17:58:08 localhost ignition-ostree-growfs[2881]: meta-data=/dev/mapper/root isize=512 agcount=4, agsize=192640 blks
Jun 11 17:58:08 localhost ignition-ostree-growfs[2881]: = sectsz=512 attr=2, projid32bit=1
Jun 11 17:58:08 localhost ignition-ostree-growfs[2881]: = crc=1 finobt=1, sparse=1, rmapbt=0
Jun 11 17:58:08 localhost ignition-ostree-growfs[2881]: = reflink=1 bigtime=1 inobtcount=1 nrext64=0
Jun 11 17:58:08 localhost ignition-ostree-growfs[2881]: data = bsize=4096 blocks=770560, imaxpct=25
Jun 11 17:58:08 localhost ignition-ostree-growfs[2881]: = sunit=0 swidth=0 blks
Jun 11 17:58:08 localhost ignition-ostree-growfs[2881]: naming =version 2 bsize=4096 ascii-ci=0, ftype=1
Jun 11 17:58:08 localhost ignition-ostree-growfs[2881]: log =internal log bsize=4096 blocks=16384, version=2
Jun 11 17:58:08 localhost ignition-ostree-growfs[2881]: = sectsz=512 sunit=0 blks, lazy-count=1
Jun 11 17:58:08 localhost ignition-ostree-growfs[2881]: realtime =none extsz=4096 blocks=0, rtextents=0
Jun 11 17:58:08 localhost ignition-ostree-growfs[2881]: data blocks changed from 770560 to 249915729
Jun 11 17:58:08 localhost systemd-journald[379]: Missed 11 kernel messages
Jun 11 17:58:08 localhost kernel: XFS (dm-0): Unmounting Filesystem 8893b769-b533-4a39-87a0-656c87d73569
Jun 11 17:58:08 localhost systemd[1]: Finished Ignition OSTree: Grow Root Filesystem.
Jun 11 17:58:08 localhost systemd[1]: Starting Ignition OSTree: Autosave XFS Rootfs Partition...
Jun 11 17:58:08 localhost ignition-ostree-transposefs[2912]: autosave-xfs: /dev/disk/by-label/root agcount=1298 meets threshold=400
Jun 11 17:58:09 localhost ignition-ostree-transposefs[2925]: meta-data=/dev/disk/by-label/root isize=512 agcount=4, agsize=62478933 blks
Jun 11 17:58:09 localhost ignition-ostree-transposefs[2925]: = sectsz=512 attr=2, projid32bit=1
Jun 11 17:58:09 localhost ignition-ostree-transposefs[2925]: = crc=1 finobt=1, sparse=1, rmapbt=0
Jun 11 17:58:09 localhost ignition-ostree-transposefs[2925]: = reflink=1 bigtime=1 inobtcount=1 nrext64=0
Jun 11 17:58:09 localhost ignition-ostree-transposefs[2925]: data = bsize=4096 blocks=249915729, imaxpct=25
Jun 11 17:58:09 localhost ignition-ostree-transposefs[2925]: = sunit=0 swidth=0 blks
Jun 11 17:58:09 localhost ignition-ostree-transposefs[2925]: naming =version 2 bsize=4096 ascii-ci=0, ftype=1
Jun 11 17:58:09 localhost ignition-ostree-transposefs[2925]: log =internal log bsize=4096 blocks=122029, version=2
Jun 11 17:58:09 localhost ignition-ostree-transposefs[2925]: = sectsz=512 sunit=0 blks, lazy-count=1
Jun 11 17:58:09 localhost ignition-ostree-transposefs[2925]: realtime =none extsz=4096 blocks=0, rtextents=0
Jun 11 17:58:09 localhost systemd[1]: Finished Ignition OSTree: Autosave XFS Rootfs Partition.
Jun 11 17:58:09 localhost systemd[1]: Starting Ignition OSTree: Restore Partitions...
Jun 11 17:58:09 localhost ignition-ostree-transposefs[2939]: Restoring rootfs from RAM...
Jun 11 17:58:09 localhost ignition-ostree-transposefs[2939]: Mounting /dev/disk/by-label/root rw (/dev/dm-0) to /sysroot
Jun 11 17:58:09 localhost systemd-journald[379]: Missed 17 kernel messages
Jun 11 17:58:09 localhost kernel: XFS (dm-0): Mounting V5 Filesystem a01718cd-3990-4dfb-8b55-cb5f6124f077
Jun 11 17:58:09 localhost kernel: XFS (dm-0): Ending clean mount
Jun 11 17:58:13 localhost ignition-ostree-transposefs[2977]: changing security context of '/sysroot'
Jun 11 17:58:15 localhost systemd-journald[379]: Missed 1 kernel messages
Jun 11 17:58:15 localhost kernel: XFS (dm-0): Unmounting Filesystem a01718cd-3990-4dfb-8b55-cb5f6124f077
Jun 11 17:58:15 localhost systemd[1]: Finished Ignition OSTree: Restore Partitions.
Jun 11 17:58:15 localhost systemd[1]: Starting Determine root FS mount option flags...
Jun 11 17:58:15 localhost systemd[1]: Finished Determine root FS mount option flags.
Jun 11 17:58:15 localhost systemd[1]: Mounting /sysroot...
Jun 11 17:58:15 localhost systemd-journald[379]: Missed 4 kernel messages
Jun 11 17:58:15 localhost kernel: XFS (dm-0): Mounting V5 Filesystem a01718cd-3990-4dfb-8b55-cb5f6124f077
Jun 11 17:58:15 localhost kernel: XFS (dm-0): Ending clean mount
Jun 11 17:58:15 localhost kernel: XFS (dm-0): Quotacheck needed: Please wait.
Jun 11 17:58:15 localhost kernel: XFS (dm-0): Quotacheck: Done.
Jun 11 17:58:15 localhost systemd[1]: Mounted /sysroot.
Jun 11 17:58:15 localhost systemd[1]: Remount /sysroot read-write for Ignition was skipped because of an unmet condition check (ConditionPathIsReadWrite=!/sysroot).
Jun 11 17:58:15 localhost systemd[1]: Starting OSTree Prepare OS/...
Jun 11 17:58:15 localhost ostree-prepare-root[2997]: Resolved OSTree target to: /sysroot/ostree/deploy/rhcos/deploy/b3c808aa05843e2d623d2febe7b8d739868ec696ab5f338eef2a415e84509911.0
Jun 11 17:58:15 localhost ostree-prepare-root[2997]: Found legacy sysroot.readonly flag, not configured in ostree/prepare-root.conf
Jun 11 17:58:15 localhost ostree-prepare-root[2997]: sysroot.readonly configuration value: 1 (fs writable: 1)
Jun 11 17:58:15 localhost ostree-prepare-root[2997]: composefs: No image present
Jun 11 17:58:15 localhost ostree-prepare-root[2997]: Using legacy ostree bind mount for /
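So I can see that at some point during Ignition the filesystem UUID on dm-0 changes: it is mounted as 8893b769-b533-4a39-87a0-656c87d73569 before the autosave-xfs step and as a01718cd-3990-4dfb-8b55-cb5f6124f077 after the rootfs is restored. However, that change is apparently not persisted in GRUB.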
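To spot this kind of UUID change in a longer journal, the kernel's XFS mount messages can be filtered per device. This is a minimal sketch, assuming the journal has been saved as text and that the message format matches the lines quoted in this report. The function name `uuid_history` is made up for illustration.

```python
import re

# Kernel message format as it appears in the journal excerpt in this report.
MOUNT_RE = re.compile(
    r"XFS \((?P<dev>[^)]+)\): Mounting V5 Filesystem (?P<uuid>[0-9a-f-]{36})"
)


def uuid_history(journal_text: str) -> dict:
    """Return, per device, the sequence of distinct UUIDs seen at mount time."""
    history = {}
    for line in journal_text.splitlines():
        match = MOUNT_RE.search(line)
        if not match:
            continue
        seen = history.setdefault(match["dev"], [])
        # Record a UUID only when it differs from the previous mount.
        if not seen or seen[-1] != match["uuid"]:
            seen.append(match["uuid"])
    return history


# Mount lines copied from the journal excerpt in this report.
SAMPLE = """\
Jun 11 17:58:02 localhost kernel: XFS (dm-0): Mounting V5 Filesystem 8893b769-b533-4a39-87a0-656c87d73569
Jun 11 17:58:09 localhost kernel: XFS (dm-0): Mounting V5 Filesystem a01718cd-3990-4dfb-8b55-cb5f6124f077
Jun 11 17:58:15 localhost kernel: XFS (dm-0): Mounting V5 Filesystem a01718cd-3990-4dfb-8b55-cb5f6124f077
"""

history = uuid_history(SAMPLE)
for device, uuids in history.items():
    print(device, " -> ".join(uuids))
```

A device with more than one entry in its list had its filesystem UUID change between mounts. Here that is dm-0, between the growfs mount and the mount after "Restore Partitions".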
Describe the impact to you or the business
Unable to install new OpenShift clusters - can cause us to miss SLAs for delivering installations.
In what environment are you experiencing this behavior?
The problem was introduced in OpenShift 4.17 and occurs when installing on a Klas VM4 (https://www.klasgroup.com/products/voyager-vm-4-0/). Other hardware, e.g. Dell, does not seem to experience this problem.
How frequently does this behavior occur? Does it occur repeatedly or at certain times?
Every time.
Actual results:
OCP SNO does not boot after the initial install and reboot.
Expected results:
OCP SNO should boot after the initial install and reboot.
Additional info:
- clones OCPBUGS-57573 RHCOS SNO does not boot after install; wrong device uuid in GRUB (Closed)