Type: Bug
Resolution: Done-Errata
Priority: Major
Fix Version/s: None
Affects Version/s: 4.13.z
Component/s: RHCOS
Labels:
- FastFix
- Node
- OCP-4.13
- RHCOS
- osintegration

Activity Type:
Quality / Stability / Reliability
Blocked:
False
Blocked Reason:

None
Story Points:
None
Severity:
None
Regression:
No

Target Backport Versions:
4.13.z, 4.14.z, 4.15.z, 4.16.z
Target Version:
4.16.z
Release Blocker:
None
Sprint:
255 - Integration & Delivery
sprint_count:
1

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Release Note Status:
In Progress
Release Note Type:
Bug Fix
Release Note Text:

Cause: A bug in growpart caused the device to be locked, which prevented the LUKS device from being opened.

Consequence: The system was unable to boot and entered emergency mode.

Fix: The growpart call was removed because it is not necessary.

Result: The system boots successfully.


Escape Reason:
None
Escape Impact:
None
Corrective Measures:
None
SDLC stage when should've been found:
None

This is a clone of issue OCPBUGS-33124. The following is the description of the original issue:
—
Description of problem:

After upgrading the cluster from 4.12 to 4.13, nodes boot into emergency mode with the following error:
`blockdev: cannot open /dev/dasda2: No such file or directory`

The sosreport collected after a successful boot shows the following symlinks in /dev/disk/by-label:


	[sosreport]$ less sos_commands/block/ls_-lanR_.dev 
	[...]
	/dev/disk/by-label:
	total 0
	drwxr-xr-x. 2 0 0 100 Apr 24 10:09 .
	drwxr-xr-x. 7 0 0 140 Apr 24 10:09 ..
	lrwxrwxrwx. 1 0 0  12 Apr 24 10:09 boot -> ../../dasda1
	lrwxrwxrwx. 1 0 0  12 Apr 24 10:09 crypt_rootfs -> ../../dasda2
	lrwxrwxrwx. 1 0 0  10 Apr 24 10:09 root -> ../../dm-0                   <<----------


The command output collected in emergency mode during the failed boot shows that the "root -> ../../dm-0" link was not set up in the by-label directory, although the /dev/dm-0 device had already been set up by the time the boot process failed:

Command output from emergency mode:


	[Console logs]$ less 0200-worker-3-emergency-mode.txt 
	[...]
	11:56:41 ls -l /dev/disk/by-label/                                              
	11:56:42 total 0                                              
	11:56:42 lrwxrwxrwx 1 root root 12 Apr 26 08:11 boot -> ../../dasda1            
	11:56:42 lrwxrwxrwx 1 root root 12 Apr 26 08:11 crypt_rootfs -> ../../dasda2    		<<---------- "root -> ../../dm-0" symlink is missing


After multiple retries, the node boots successfully.
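The failure signature described above (the device-mapper node dm-0 exists, but no `root` symlink appears under /dev/disk/by-label) can be expressed as a small check. This is a hypothetical diagnostic sketch, not part of RHCOS or of the fix: the helper name `check_root_label`, its `dev_dir` parameter, and the assumption that the root filesystem carries the label `root` on device `dm-0` (as in the listings above) are illustrative only.

```python
import os
from pathlib import Path


def check_root_label(dev_dir: str = "/dev") -> str:
    """Classify the state of the by-label 'root' symlink under dev_dir.

    Hypothetical diagnostic helper (not part of RHCOS). Assumes the root
    filesystem carries the label 'root' and sits on device-mapper node 'dm-0',
    as in the sosreport and emergency-mode listings.
    """
    dev = Path(dev_dir)
    link = dev / "disk" / "by-label" / "root"
    dm0 = dev / "dm-0"

    if link.is_symlink():
        # The link target is relative (e.g. "../../dm-0"); resolve it
        # lexically against the link's own directory.
        target = os.path.normpath(os.path.join(link.parent, os.readlink(link)))
        return f"ok: root -> {target}"
    if dm0.exists():
        # Failed-boot signature: the dm device exists but the label link does not.
        return "missing: dm-0 exists but by-label/root symlink is absent"
    return "missing: no by-label/root symlink and no dm-0 device"


if __name__ == "__main__":
    print(check_root_label())
```

Passing a temporary directory as `dev_dir` lets the logic be exercised without real block devices. On a real node the default `/dev` is used, and the device-mapper node may not be named `dm-0` if other device-mapper targets exist.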

Version-Release number of selected component (if applicable):

4.13.36

How reproducible:

NA

Steps to Reproduce:

    1. Upgrade the cluster to 4.13.
    2. Check the boot of the master and worker nodes.
    3. Observe whether the nodes boot into emergency mode, and collect the console logs.

Actual results:

The node went into emergency mode.

Expected results:

The node should boot successfully without any issues.

Additional info:

The customer uses z/VM for VM provisioning.

blocks

OCPBUGS-35988 After upgrading to 4.13 from 4.12 one of the worker node went into emergency mode.

Closed

OCPBUGS-35985 [4.15] After upgrading to 4.13 from 4.12 one of the worker node went into emergency mode.

Closed

clones

OCPBUGS-33124 After upgrading to 4.13 from 4.12 one of the worker node went into emergency mode.

Closed

is blocked by

OCPBUGS-33124 After upgrading to 4.13 from 4.12 one of the worker node went into emergency mode.

Closed

is cloned by

OCPBUGS-35985 [4.15] After upgrading to 4.13 from 4.12 one of the worker node went into emergency mode.

Closed

links to

openshift/os#1534: [release-4.16] OCPBUGS-35973: coreos-cryptfs: drop growpart call

RHBA-2024:4156 OpenShift Container Platform 4.16.z bug fix update


Assignee: Madhu Pillai (Inactive)

Reporter: OpenShift Prow Bot

QA Contact: Michael Nguyen

Need Info From: None

Votes: 0

Watchers: 5

Created: 2024/06/24 1:24 PM

Updated: 2025/07/22 11:36 AM

Resolved: 2024/07/03 11:31 AM
