  1. OpenShift Bugs
  2. OCPBUGS-35988

After upgrading from 4.12 to 4.13, one of the worker nodes went into emergency mode.


    • Critical
    • No
    • 255 - Integration & Delivery
    • 1
    • False
      * Previously, the GrowPart tool locked a device. This prevented Linux Unified Key Setup-on-disk-format (LUKS) devices from being opened and caused the operating system to boot into emergency mode. With this release, the call to the GrowPart tool is removed, so that LUKS devices are not unintentionally locked and the operating system can boot successfully. (link:https://issues.redhat.com/browse/OCPBUGS-35988[*OCPBUGS-35988*])
    • Bug Fix
    • Done

      This is a clone of issue OCPBUGS-33124. The following is the description of the original issue:

      Description of problem:

      After upgrading the cluster from 4.12 to 4.13, nodes boot into emergency mode with the following error:
      `blockdev: cannot open /dev/dasda2: No such file or directory`
      
      The sosreport collected after a successful boot shows the following symlinks set up in /dev/disk/by-label:
      
      
      	[sosreport]$ less sos_commands/block/ls_-lanR_.dev 
      	[...]
      	/dev/disk/by-label:
      	total 0
      	drwxr-xr-x. 2 0 0 100 Apr 24 10:09 .
      	drwxr-xr-x. 7 0 0 140 Apr 24 10:09 ..
      	lrwxrwxrwx. 1 0 0  12 Apr 24 10:09 boot -> ../../dasda1
      	lrwxrwxrwx. 1 0 0  12 Apr 24 10:09 crypt_rootfs -> ../../dasda2
      	lrwxrwxrwx. 1 0 0  10 Apr 24 10:09 root -> ../../dm-0                   <<----------
      
      
      The command output collected in emergency mode, during the failed boot, shows that the "root -> ../../dm-0" symlink was not set up in the by-label directory. However, the /dev/dm-0 device had already been set up by the time the boot failed:
      
      Command outputs from emergency mode:
      
      
      	[Console logs]$ less 0200-worker-3-emergency-mode.txt 
      	[...]
      	11:56:41 ls -l /dev/disk/by-label/                                              
      	11:56:42 total 0
      	11:56:42 lrwxrwxrwx 1 root root 12 Apr 26 08:11 boot -> ../../dasda1            
      	11:56:42 lrwxrwxrwx 1 root root 12 Apr 26 08:11 crypt_rootfs -> ../../dasda2    		<<---------- "root -> ../../dm-0" symlink is missing
      
      
      After multiple retries, the node boots successfully.
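
The comparison above can be automated when triaging further nodes. The following is a minimal sketch, not part of the original report or of the fix: it parses two `ls -l` listings of /dev/disk/by-label and reports which label symlinks are present in the healthy boot but absent in the failed one. The function names `parse_by_label` and `missing_labels` are hypothetical, and the sample text is copied from the two listings above with the annotations and terminal debris removed.

```python
import re

# Matches the first "name -> target" pair on a symlink line of `ls -l` output.
SYMLINK_RE = re.compile(r"(\S+) -> (\S+)")


def parse_by_label(listing):
    """Return {label: target} for every symlink line in an `ls -l` listing.

    Lines may carry a leading console timestamp, so the mode string
    ("lrwxrwxrwx") is looked for in any whitespace-separated field.
    """
    links = {}
    for line in listing.splitlines():
        if not any(field.startswith("lrwx") for field in line.split()):
            continue  # skip "total 0", directory entries, and prompts
        match = SYMLINK_RE.search(line)
        if match:
            links[match.group(1)] = match.group(2)
    return links


def missing_labels(healthy, failed):
    """Labels present in the healthy listing but absent in the failed one."""
    return sorted(set(healthy) - set(failed))


# Listing from the sosreport taken after a successful boot.
HEALTHY = """\
drwxr-xr-x. 2 0 0 100 Apr 24 10:09 .
drwxr-xr-x. 7 0 0 140 Apr 24 10:09 ..
lrwxrwxrwx. 1 0 0  12 Apr 24 10:09 boot -> ../../dasda1
lrwxrwxrwx. 1 0 0  12 Apr 24 10:09 crypt_rootfs -> ../../dasda2
lrwxrwxrwx. 1 0 0  10 Apr 24 10:09 root -> ../../dm-0
"""

# Listing captured on the console in emergency mode.
EMERGENCY = """\
11:56:42 total 0
11:56:42 lrwxrwxrwx 1 root root 12 Apr 26 08:11 boot -> ../../dasda1
11:56:42 lrwxrwxrwx 1 root root 12 Apr 26 08:11 crypt_rootfs -> ../../dasda2
"""

if __name__ == "__main__":
    missing = missing_labels(parse_by_label(HEALTHY), parse_by_label(EMERGENCY))
    print("labels missing in emergency mode:", missing)
```

With the two listings from this report, the only label reported as missing is `root`, which matches the observation that the by-label symlink for /dev/dm-0 was never created during the failed boot.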

      Version-Release number of selected component (if applicable):

      4.13.36

      How reproducible:

      NA

      Steps to Reproduce:

          1. Upgrade cluster to 4.13
          2. Check whether the master and worker nodes boot.
          3. Observe whether the nodes boot into emergency mode, and collect the console logs.
          

      Actual results:

      Node went into emergency mode

      Expected results:

      Node should boot successfully without any issue.

      Additional info:

      The customer uses z/VM for VM provisioning.

              mapillai Madhu Pillai
              openshift-crt-jira-prow OpenShift Prow Bot
              Michael Nguyen Michael Nguyen