-
Bug
-
Resolution: Done
-
Critical
-
None
-
4.16
-
No
-
3
-
NHE Sprint 256
-
1
-
False
-
(Sorry for the not-well crafted bug report. I don't understand well what goes wrong, who to assign to or which information to provide).
We have a tool for deploying openshift (https://github.com/bn222/cluster-deployment-automation).
it can successfully deploy for example "version: 4.14.0-nightly" or "version: 4.15.0-nightly". The tool uses assisted installer via aicli python library.
When deploying 4.16.0-nightly (or other 4.16 that I tried), the worker nodes on bare metal don't properly install.
In `aicli list events clustername` I see as last message:
> ost worker-246: updated status from added-to-existing-cluster to installing-pending-user-action (Expected the host to boot from disk, but it booted the installation image - please reboot and fix boot order to boot from disk PERC_H755_Front 0090e68f19968f0a2b005bf93e80e04e (sda, /dev/disk/by-id/wwn-0x6f4ee0803ef95b002b0a8f96198fe690))
Note that the boot entry `efibootmgr` output looks correct (you can also see it in the attached logs, grep for "Boot.*Red Hat"). Also, the system is inconfigured to boot from the disk, we use idrac/redfish to do a single boot into the CoreOS installer disk. Any subsequent (re)boot should boot the from disk again.
Also, entering the boot menu and selecting "Red Hat Linux" boot entry says that the entry cannot be booted. It would then fallback to boot the installer ISO again.
To me, it would seem as if the installed system is not bootable, which is the cause for the issue.
(note that the exact same setup works with previous OCP versions).
The failure happens every time, it's thus easy to reproduce.
- is blocked by
-
OCPBUGS-36398 4.16 Control Plane nodes fail to install with OCP 4.16
-
- Closed
-