OpenShift Bugs / OCPBUGS-7810

Disk label timeout error while installing cluster in Baremetal UPI


    • Moderate
    • No
    • Rejected
    • False

      Description of problem:

      When PXE-booting a node to install a cluster, the customer hits a timeout error:
      
      dev-disk-by\x2dlabel-boot.device: Job dev-disk-by\x2dlabel-boot.device/start timed out
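
The unit name in this error is a systemd-escaped device path. As a minimal sketch (the helper name `unit_to_path` is hypothetical, and the function only handles the simple path-style escaping seen here), the name can be decoded to show which device node the initrd is waiting for:

```python
import re

def unit_to_path(unit: str) -> str:
    """Decode a systemd path-style unit name back into a filesystem path.

    Hypothetical helper for illustration: '-' separates path components
    and '\\xNN' encodes a literal byte (for example '\\x2d' is '-').
    """
    # Drop the unit type suffix (".device", ".mount", ...).
    name = unit.rsplit(".", 1)[0]
    # Unescaped dashes stand for path separators.
    with_slashes = name.replace("-", "/")
    # Decode \xNN sequences only after the dash replacement, so that an
    # escaped dash is not mistaken for a separator.
    decoded = re.sub(
        r"\\x([0-9a-fA-F]{2})",
        lambda m: chr(int(m.group(1), 16)),
        with_slashes,
    )
    return "/" + decoded

print(unit_to_path(r"dev-disk-by\x2dlabel-boot.device"))
```

Under these assumptions the unit corresponds to `/dev/disk/by-label/boot`, that is, a filesystem labeled `boot` that never appeared before the timeout.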
      
      RHEL 8 boots on this hardware, but RHCOS does not. The error appears specifically on the AMD chipset; Intel systems work fine. The issue occurs on OpenShift 4.11.24 and 4.11.25, but 4.12.1 works. The installation method is bare-metal UPI.
      
      Hardware used:
      Dell PowerEdge R7625
      
      

      Version-Release number of selected component (if applicable):

      openshift-4.11.25

      How reproducible:

      100%

      Steps to Reproduce:

      1. Create a cluster.
      2. Add a worker node on Dell PowerEdge R7625 hardware using PXE boot [1].
      [1] https://docs.openshift.com/container-platform/4.11/post_installation_configuration/node-tasks.html#machine-user-infra-machines-pxe_post-install-node-tasks
      3. Boot the node in UEFI mode with the GRUB configuration (grub.cfg) in place.
      

      Actual results:

      1. From the serial console logs
      
      [   10.221325] rtc_cmos 00:02: setting system clock to 2023-02-21 05:09:11 UTC (1676956151)
      [   10.242096] Freeing unused decrypted memory: 2036K
      [   10.257829] Freeing unused kernel image (initmem) memory: 2524K
      [   10.279012] Write protecting the kernel read-only data: 24576k
      [   10.294811] Freeing unused kernel image (text/rodata gap) memory: 2016K
      [   10.310951] Freeing unused kernel image (rodata/data gap) memory: 1948K
      [   10.371614] systemd[1]: systemd 239 (239-58.el8_6.7) running in system mode. (+PAM +AUDIT +SELINUX +IMA -APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD +IDN2 -IDN +PCRE2 default-hierarchy=legacy)
      [   10.440028] systemd[1]: Detected architecture x86-64.
      [   10.458457] systemd[1]: Running in initial RAM disk.
      [   10.887018] usb 3-1.4.1: new high-speed USB device number 7 using xhci_hcd
      [   10.995241] usb 3-1.4.1: New USB device found, idVendor=413c, idProduct=0001, bcdDevice= 0.00
      [   11.016067] usb 3-1.4.1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
      [   11.032306] usb 3-1.4.1: Product: iDRAC VIRTUAL MEDIA
      [   11.046126] usb 3-1.4.1: Manufacturer: DELL INC.
      [   11.060204] usb 3-1.4.1: SerialNumber: 1028  123456
      [   40.927097] systemd[1]: No hostname configured.
      [   40.940278] systemd[1]: Set hostname to <localhost>.
      [   40.953777] systemd[1]: Initializing machine ID from random generator.
      [   41.018510] systemd[1]: Reached target Swap.
      [   71.135169] systemd[1]: Reached target Slices.
      [  101.343097] systemd[1]: Reached target Timers.
      [  131.551209] systemd[1]: Started Forward Password Requests to Clevis Directory Watch.
      [  161.759141] systemd[1]: Started Dispatch Password Requests to Console Directory Watch.
      [  191.967182] systemd[1]: Reached target Local Encrypted Volumes.
      [  222.175131] systemd[1]: Reached target Paths.
      [  252.383217] systemd[1]: Listening on Journal Socket.
      [  282.592153] systemd[1]: Starting Afterburn Initrd Setup Network Kernel Arguments...
      [  312.799661] systemd[1]: Started Memstrack Anylazing Service.
      [  343.007850] systemd[1]: Starting Setup Virtual Console...
      [  373.215250] systemd[1]: Listening on Journal Socket (/dev/log).
      [  403.423269] systemd[1]: Listening on Open-iSCSI iscsiuio Socket.
      [  433.631822] systemd[1]: Starting Journal Service...
      [  433.734198] afterburn[1370]: Feb 21 05:13:43.880 WARN kernel cmdline already specifies network arguments, skipping
      [  463.839804] systemd[1]: Starting Create list of required static device nodes for the current kernel...
      [  494.047783] systemd[1]: Starting Load Kernel Modules...
      [  494.082860] fuse: init (API version 7.33)
      [  494.095202] systemd-modules-load[1392]: Inserted module 'fuse'
      [  494.109038] IPMI message handler: version 39.2
      [  494.109105] systemd-modules-load[1392]: Module 'msr' is builtin
      [  494.135001] ipmi device interface
      [  494.145927] systemd-modules-load[1392]: Inserted module 'ipmi_devintf'
      [  524.255310] systemd[1]: Listening on Open-iSCSI iscsid Socket.
      [  554.463767] systemd[1]: Starting iSCSI UserSpace I/O driver...
      [  554.497838] Loading iSCSI transport class v2.0-870.
      [  584.671229] systemd[1]: Listening on udev Kernel Socket.
      [  614.879822] systemd[1]: Starting CoreOS: Touch /run/agetty.reload...
      [  645.087294] systemd[1]: Listening on udev Control Socket.
      [  675.295176] systemd[1]: Reached target Sockets.
      [  705.503230] systemd[1]: Started CoreOS Tear Down Initramfs.
      [  735.713424] systemd[1]: Started Journal Service.
      [  765.919626] systemd[1]: Started iSCSI UserSpace I/O driver.
      [  796.127802] systemd[1]: Started Afterburn Initrd Setup Network Kernel Arguments.
      [  826.335637] systemd[1]: memstrack.service: Succeeded.
      [  826.348168] systemd[1]: systemd-vconsole-setup.service: Succeeded.
      [  826.362676] systemd[1]: Started Setup Virtual Console.
      [  856.543651] systemd[1]: Started Create list of required static device nodes for the current kernel.
      [  886.751729] systemd[1]: Started Load Kernel Modules.
      [  916.959673] systemd[1]: Started CoreOS: Touch /run/agetty.reload.
      [  947.167797] systemd[1]: dev-disk-by\x2dlabel-boot.device: Job dev-disk-by\x2dlabel-boot.device/start timed out.
      [  947.184991] systemd[1]: Timed out waiting for device dev-disk-by\x2dlabel-boot.device.
      [  977.375364] systemd[1]: Dependency failed for Check for FIPS mode.
      [ 1007.583395] systemd[1]: Dependency failed for Ignition Boot Disk Setup.
      [ 1037.791342] systemd[1]: Dependency failed for Ignition Complete.
      [ 1067.999402] systemd[1]: Dependency failed for Initrd Default Target.
      [ 1098.207363] systemd[1]: initrd.target: Job initrd.target/start failed with result 'dependency'.
      [ 1098.224577] systemd[1]: initrd.target: Triggering OnFailure= dependencies.
      [ 1098.238601] systemd[1]: ignition-complete.target: Job ignition-complete.target/start failed with result 'dependency'.
      [ 1098.256451] systemd[1]: ignition-diskful.target: Job ignition-diskful.target/start failed with result 'dependency'.
      [ 1098.274589] systemd[1]: rhcos-fips.service: Job rhcos-fips.service/start failed with result 'dependency'.
      [ 1098.292007] systemd[1]: rhcos-fips.service: Triggering OnFailure= dependencies.
      [ 1098.306776] systemd[1]: dev-disk-by\x2dlabel-boot.device: Job dev-disk-by\x2dlabel-boot.device/start failed with result 'timeout'.
      [ 1098.334381] systemd[1]: Stopped target Local Encrypted Volumes.
      [ 1128.415322] systemd[1]: Stopped target Slices.
      [ 1158.623452] systemd[1]: clevis-luks-askpass.path: Succeeded.
      [ 1158.638062] systemd[1]: Stopped Forward Password Requests to Clevis Directory Watch.
      [ 1188.831342] systemd[1]: Stopped target Paths.
      [ 1219.039387] systemd[1]: systemd-ask-password-console.path: Succeeded.
      [ 1219.053782] systemd[1]: Stopped Dispatch Password Requests to Console Directory Watch.
      [ 1249.247355] systemd[1]: Stopped target Timers.
      [ 1279.455362] systemd[1]: Stopped target Swap.
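
The timestamps in the log above show that most consecutive systemd messages are roughly 30 seconds apart. The following is a rough sketch for quantifying that from a saved console log; the function name `find_gaps` and the 25-second threshold are assumptions chosen for illustration, not part of any tool.

```python
import re

# Matches kernel-style log lines such as "[   41.018510] systemd[1]: ...".
LINE = re.compile(r"^\s*\[\s*(\d+\.\d+)\]\s+(.*)$")

def find_gaps(log_text, threshold=25.0):
    """Return (gap_seconds, message) for each line that follows a long pause.

    threshold is an arbitrary cut-off picked for this example.
    """
    gaps = []
    previous = None
    for line in log_text.splitlines():
        match = LINE.match(line)
        if not match:
            continue  # skip lines without a timestamp
        timestamp = float(match.group(1))
        if previous is not None and timestamp - previous >= threshold:
            gaps.append((round(timestamp - previous, 3), match.group(2)))
        previous = timestamp
    return gaps

# A few lines copied from the console log in this report.
sample = """
[   40.953777] systemd[1]: Initializing machine ID from random generator.
[   41.018510] systemd[1]: Reached target Swap.
[   71.135169] systemd[1]: Reached target Slices.
[  101.343097] systemd[1]: Reached target Timers.
"""

for gap, message in find_gaps(sample):
    print(f"{gap:8.3f}s before: {message}")
```

Running it over the full log would list every message that was preceded by a long stall, which makes the pattern easier to compare between the failing 4.11 boots and a working 4.12.1 boot.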
      
      
      2. PXE configuration
      # cat grub.cfg-01-b4-83-51-02-73-9e
      
      set timeout=3
      menuentry 'etcd16g-0' --class fedora --class gnu-linux --class gnu --class os {
        linuxefi rhcos/4.11/rhcos-live-kernel-x86_64 nomodeset rd.neednet=1 coreos.inst.insecure coreos.live.rootfs_url=http://192.168.x.x:8080/rhcos/4.11/rhcos-live-rootfs.x86_64.img coreos.inst=yes coreos.inst.install_dev=/dev/nvme0n1 coreos.inst.ignition_url=http://192.168.x.x:8080/ignition/master.ign ip=192.168.x.x::192.168.x.x:255.255.255.0:etcd16g-0.openshift411.dcws.lab:bond0:none bond=bond0:ens3f0,ens3f1:mode=active-backup nameserver=192.168.x.x initcall_debug log_buf_len=10M systemd.log_level=debug
        initrdefi rhcos/4.11/rhcos-live-initramfs.x86_64.img
      }
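
The `linuxefi` line above carries many kernel arguments, which makes it hard to review by eye. As a small sketch (the helper `parse_kernel_args` is hypothetical and does no validation of individual options), the arguments can be split into a dictionary to check values such as the install device and the bond definition:

```python
def parse_kernel_args(cmdline: str) -> dict:
    """Split a kernel command line into {key: value} pairs.

    Flags without '=' map to True. If a key repeats, the last one wins,
    which is a simplification made for this example.
    """
    args = {}
    for token in cmdline.split():
        # Split at the first '=' only, so values like "mode=active-backup"
        # inside the bond= argument stay intact.
        key, sep, value = token.partition("=")
        args[key] = value if sep else True
    return args

# A shortened copy of the arguments from the grub.cfg in this report.
cmdline = (
    "nomodeset rd.neednet=1 coreos.inst.insecure "
    "coreos.inst.install_dev=/dev/nvme0n1 "
    "bond=bond0:ens3f0,ens3f1:mode=active-backup "
    "systemd.log_level=debug"
)

args = parse_kernel_args(cmdline)
print(args["coreos.inst.install_dev"])
print(args["bond"].split(":"))
```

This only restates what is already on the command line; it does not check whether `/dev/nvme0n1` or the bonded interfaces actually exist on the node.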

      Expected results:

      Nodes should be added successfully to the cluster

      Additional info:

      - We tried to gather debug logs, but the boot hangs for hours in the dracut scripts. With no verbosity options supplied at boot, it at least reaches the error.

            rhn-coreos-bgilbert Benjamin Gilbert (Inactive)
            rhn-support-cbj Chandrasekhar Chandrasekhar (Inactive)
            Michael Nguyen Michael Nguyen
            Chandrasekhar Chandrasekhar (Inactive)