  1. OpenShift Bugs
  2. OCPBUGS-33124

After upgrading from 4.12 to 4.13, one of the worker nodes went into emergency mode.


    • Critical
    • No
    • 5
    • 254 - Integration & Delivery, 255 - Integration & Delivery
    • 2
    • False
    • Previously, a bug in the `growpart` utility could leave a LUKS device locked so that it could not be opened. As a result, the system failed to boot and entered emergency mode. With this release, the call to the `growpart` utility is removed and the system boots successfully. (link:https://issues.redhat.com/browse/OCPBUGS-33124[*OCPBUGS-33124*])
    • Bug Fix
    • Done

      Description of problem:

      After upgrading the cluster from 4.12 to 4.13, nodes boot into emergency mode with the following error:
      `blockdev: cannot open /dev/dasda2: No such file or directory` 
      
      The sosreport collected after a successful boot shows the following symlinks set up in /dev/disk/by-label:
      
      
      	[sosreport]$ less sos_commands/block/ls_-lanR_.dev 
      	[...]
      	/dev/disk/by-label:
      	total 0
      	drwxr-xr-x. 2 0 0 100 Apr 24 10:09 .
      	drwxr-xr-x. 7 0 0 140 Apr 24 10:09 ..
      	lrwxrwxrwx. 1 0 0  12 Apr 24 10:09 boot -> ../../dasda1
      	lrwxrwxrwx. 1 0 0  12 Apr 24 10:09 crypt_rootfs -> ../../dasda2
      	lrwxrwxrwx. 1 0 0  10 Apr 24 10:09 root -> ../../dm-0                   <<----------
      
      
      The command output collected from emergency mode, during the failed boot, shows that the "root -> ../../dm-0" symlink was not set up in the by-label directory. However, the /dev/dm-0 device had already been set up by the time the boot process failed:
      
      Command outputs from emergency mode:
      
      
      	[Console logs]$ less 0200-worker-3-emergency-mode.txt 
      	[...]
      	11:56:41 ls -l /dev/disk/by-label/                                              
      	11:56:42 total 0
      	11:56:42 lrwxrwxrwx 1 root root 12 Apr 26 08:11 boot -> ../../dasda1            
      	11:56:42 lrwxrwxrwx 1 root root 12 Apr 26 08:11 crypt_rootfs -> ../../dasda2    		<<---------- "root -> ../../dm-0" symlink is missing
      
      
      After multiple retries, the node boots successfully.
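
      The comparison above between the by-label listing from the successful boot and the one from emergency mode can be scripted when many console logs need checking. The following is a minimal sketch, not part of the original report: the helper names `parse_symlinks` and `missing_labels` are hypothetical, and it assumes the input is plain `ls -l` style text such as the excerpts above (optionally prefixed with console timestamps).

```python
import re

# Matches the "name -> target" part of an `ls -l` symlink line.
SYMLINK_RE = re.compile(r"(\S+) -> (\S+)")
# A symlink line carries a mode field such as "lrwxrwxrwx".
MODE_RE = re.compile(r"(^|\s)l[rwx-]{9}")


def parse_symlinks(ls_output):
    """Return {label: target} for every symlink line in `ls -l` style text."""
    links = {}
    for line in ls_output.splitlines():
        if not MODE_RE.search(line):
            continue  # skip "total 0", directory entries, prompts, etc.
        match = SYMLINK_RE.search(line)  # first "name -> target" on the line
        if match:
            links[match.group(1)] = match.group(2)
    return links


def missing_labels(healthy, failed):
    """Labels present in the healthy listing but absent from the failed one."""
    return set(parse_symlinks(healthy)) - set(parse_symlinks(failed))


# Excerpts taken from the sosreport and the emergency-mode console log.
HEALTHY = """\
lrwxrwxrwx. 1 0 0  12 Apr 24 10:09 boot -> ../../dasda1
lrwxrwxrwx. 1 0 0  12 Apr 24 10:09 crypt_rootfs -> ../../dasda2
lrwxrwxrwx. 1 0 0  10 Apr 24 10:09 root -> ../../dm-0
"""
FAILED = """\
11:56:42 total 0
11:56:42 lrwxrwxrwx 1 root root 12 Apr 26 08:11 boot -> ../../dasda1
11:56:42 lrwxrwxrwx 1 root root 12 Apr 26 08:11 crypt_rootfs -> ../../dasda2
"""

print(sorted(missing_labels(HEALTHY, FAILED)))  # prints ['root']
```

      For the two excerpts in this report, the only label missing from the failed boot is `root`, which matches the observation above.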

      Version-Release number of selected component (if applicable):

      4.13.36

      How reproducible:

      NA

      Steps to Reproduce:

          1. Upgrade the cluster to 4.13.
          2. Check that the master and worker nodes boot.
          3. Observe whether any node boots into emergency mode, and collect the console logs.
          

      Actual results:

      Node went into emergency mode

      Expected results:

      Node should boot successfully without any issue.

      Additional info:

      The customer uses z/VM for VM provisioning.

              mapillai Madhu Pillai
              rhn-support-soujain Sourav Jain
              Michael Nguyen Michael Nguyen