  1. OpenShift Bugs
  2. OCPBUGS-35973

After upgrading from 4.12 to 4.13, one of the worker nodes went into emergency mode.


    • No
    • 255 - Integration & Delivery
    • 1
    • False
    • Cause: A bug in growpart caused the device to be locked, which prevented the LUKS device from being opened.

      Consequence: The system was unable to boot and landed in emergency mode.

      Fix: The call to growpart is removed, as it is not necessary.

      Result: The system no longer breaks and boots successfully.
    • Bug Fix
    • In Progress

      This is a clone of issue OCPBUGS-33124. The following is the description of the original issue:

      Description of problem:

      After upgrading the cluster from 4.12 to 4.13, nodes boot into emergency mode because of the following error:
      `blockdev: cannot open /dev/dasda2: No such file or directory`
      
      The sosreport collected after a successful boot shows that the following symlinks were set up in /dev/disk/by-label:
      
      
      	[sosreport]$ less sos_commands/block/ls_-lanR_.dev 
      	[...]
      	/dev/disk/by-label:
      	total 0
      	drwxr-xr-x. 2 0 0 100 Apr 24 10:09 .
      	drwxr-xr-x. 7 0 0 140 Apr 24 10:09 ..
      	lrwxrwxrwx. 1 0 0  12 Apr 24 10:09 boot -> ../../dasda1
      	lrwxrwxrwx. 1 0 0  12 Apr 24 10:09 crypt_rootfs -> ../../dasda2
      	lrwxrwxrwx. 1 0 0  10 Apr 24 10:09 root -> ../../dm-0                   <<----------
      
      
      The command output collected from emergency mode, during the failed boot, shows that the "root -> ../../dm-0" link had not been set up in the by-label directory, even though the /dev/dm-0 device already existed by the time the boot failed:
      
      Command outputs from emergency mode:
      
      
      	[Console logs]$ less 0200-worker-3-emergency-mode.txt 
      	[...]
      	11:56:41 ls -l /dev/disk/by-label/                                              
      	11:56:42 total 0
      	11:56:42 lrwxrwxrwx 1 root root 12 Apr 26 08:11 boot -> ../../dasda1            
      	11:56:42 lrwxrwxrwx 1 root root 12 Apr 26 08:11 crypt_rootfs -> ../../dasda2    		<<---------- "root -> ../../dm-0" symlink is missing
      
      
      After multiple retries, the node boots successfully.
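The comparison between the two `/dev/disk/by-label` listings can be automated when reviewing collected logs. The following is a minimal sketch, not part of the original report: the helper names `parse_by_label` and `missing_labels` are hypothetical, and it assumes the listings are plain `ls -l` output like the excerpts quoted in this description.

```python
import re

# Matches the "name -> target" tail of an `ls -l` symlink line.
SYMLINK_RE = re.compile(r"(\S+) -> (\S+)")


def parse_by_label(listing: str) -> dict[str, str]:
    """Return {label: target} for every symlink line in an `ls -l` listing."""
    links = {}
    for line in listing.splitlines():
        # Drop trailing reviewer annotations such as "<<----------".
        line = line.split("<<")[0]
        # Only symlink entries are of interest.
        if "lrwxrwxrwx" not in line:
            continue
        match = SYMLINK_RE.search(line)
        if match:
            links[match.group(1)] = match.group(2)
    return links


def missing_labels(good: str, bad: str) -> list[str]:
    """Labels present in the successful-boot listing but absent in the failed one."""
    return sorted(set(parse_by_label(good)) - set(parse_by_label(bad)))


# Excerpts taken from the sosreport and the emergency-mode console log above.
GOOD = """\
lrwxrwxrwx. 1 0 0  12 Apr 24 10:09 boot -> ../../dasda1
lrwxrwxrwx. 1 0 0  12 Apr 24 10:09 crypt_rootfs -> ../../dasda2
lrwxrwxrwx. 1 0 0  10 Apr 24 10:09 root -> ../../dm-0   <<----------
"""
BAD = """\
11:56:42 total 0
11:56:42 lrwxrwxrwx 1 root root 12 Apr 26 08:11 boot -> ../../dasda1
11:56:42 lrwxrwxrwx 1 root root 12 Apr 26 08:11 crypt_rootfs -> ../../dasda2
"""

if __name__ == "__main__":
    print(missing_labels(GOOD, BAD))  # ['root']
```

For the two listings in this report, the only label missing in emergency mode is `root`, which matches the manual observation.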

      Version-Release number of selected component (if applicable):

      4.13.36

      How reproducible:

      NA

      Steps to Reproduce:

          1. Upgrade the cluster to 4.13.
          2. Check that the master and worker nodes boot.
          3. Check whether any node boots into emergency mode, and collect console logs.
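Once console logs are collected, they can be scanned for the `blockdev` error quoted in the description. The following is a rough sketch under stated assumptions: the function name `find_missing_devices` is hypothetical, and the only message format it recognizes is the one line reported in this issue.

```python
import re

# Pattern built from the error line quoted in this report.
BLOCKDEV_RE = re.compile(
    r"blockdev: cannot open (\S+): No such file or directory"
)


def find_missing_devices(console_log: str) -> list[str]:
    """Return device paths named in 'blockdev: cannot open' errors.

    Paths are returned in order of first appearance, without duplicates.
    """
    devices = []
    for match in BLOCKDEV_RE.finditer(console_log):
        device = match.group(1)
        if device not in devices:
            devices.append(device)
    return devices


# Sample input: the error line from this report, repeated to show deduplication.
SAMPLE_LOG = (
    "blockdev: cannot open /dev/dasda2: No such file or directory\n"
    "blockdev: cannot open /dev/dasda2: No such file or directory\n"
)

if __name__ == "__main__":
    print(find_missing_devices(SAMPLE_LOG))  # ['/dev/dasda2']
```

An empty result means the log does not contain this particular error; it does not rule out other causes of emergency mode.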
          

      Actual results:

      Node went into emergency mode

      Expected results:

      Node should boot successfully without any issue.

      Additional info:

      The customer uses z/VM to provision VMs.

              mapillai Madhu Pillai
              openshift-crt-jira-prow OpenShift Prow Bot
              Michael Nguyen Michael Nguyen