Type: Bug
Resolution: Won't Do
Priority: Major
Fix Version/s: None
Affects Version/s: 4.14.z
Component/s: RHCOS
Labels:
- ODF
- RHCOS
- node

Activity Type:
Quality / Stability / Reliability
Blocked:
False
Blocked Reason:

None
Story Points:
None
Severity:
Important
Regression:
No

Target Backport Versions:
None
Target Version:
None
Release Blocker:
None
Sprint:
None

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

PX Impact Score:

Release Note Status:
None
Release Note Type:
None
Release Note Text:
None

Escape Reason:
None
Escape Impact:
None
Corrective Measures:
None
SDLC stage when should've been found:
None

Description of problem:

This issue is similar to the one reported in https://issues.redhat.com/browse/OCPBUGS-29474: disk names swap across reboots, which breaks workloads that depend on them. By design, the Linux kernel used in RHEL and in other distributions does not guarantee that disk names such as /dev/sda and /dev/sdb persist across reboots.

At the Red Hat IBM COC, another of our clients is facing a similar issue. The scenario is not identical: instead of dedicated storage for etcd, this client uses a separate disk for Ceph storage.

Bare-metal installation on vSphere ESXi 7.0
Three master VMs
Three worker VMs with two HDDs each

The primary disk (192 GB) holds the OS and the second disk (1 TB) is for ODF storage.
The installation detected the OS disk and the 1 TB ODF disk as expected, but reboots flipped the names. When this happens, CoreOS itself boots without problems, but the rook-ceph-osd-x-xxxx pod scheduled on that node fails.

# ssh core@worker0.xxxx lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
loop0    7:0    0     1T  0 loop 
sda      8:0    0   192G  0 disk 
├─sda1   8:1    0     1M  0 part 
├─sda2   8:2    0   127M  0 part 
├─sda3   8:3    0   384M  0 part /boot
....
....
sdb      8:16   0     1T  0 disk 
sr0     11:0    1  1024M  0 rom

The rook-ceph-osd-x-xxxx pod works fine on a node whose device paths match the output above.

When /dev/sda and /dev/sdb are swapped, the pod does not work.

The problem occurs randomly whenever a node is rebooted.

$ cat proc/partitions 
major minor  #blocks  name
   8        0  201326592 sda
   8        1       1024 sda1
   8        2     130048 sda2
   8        3     393216 sda3
   8        4  200801263 sda4
   8       16 1073741824 sdb
$ cat sos_commands/block/blkid_-c_.dev.null  | grep sd
/dev/sdb: TYPE="ceph_bluestore"
/dev/sda4: LABEL="root" UUID="d4e71bf1-e1b1-4061-8820-94d415bac740" TYPE="xfs" PARTLABEL="root" PARTUUID="7f8ab0e2-e8a2-4678-a692-8420fa0da9ee"
/dev/sda2: SEC_TYPE="msdos" LABEL_FATBOOT="EFI-SYSTEM" LABEL="EFI-SYSTEM" UUID="E3C4-265C" TYPE="vfat" PARTLABEL="EFI-SYSTEM" PARTUUID="162c85ea-992b-440a-b0bd-a817347eaecc"
/dev/sda3: LABEL="boot" UUID="9fb2c61c-5556-46aa-9836-69af853caa4b" TYPE="ext4" PARTLABEL="boot" PARTUUID="60e64345-8fcd-454a-82c3-e20b79aaa68a"
/dev/sda1: PARTLABEL="BIOS-BOOT" PARTUUID="637709e2-87f5-4e3d-ac30-1d6996e85259"
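
The blkid output above identifies the ODF disk by its content (TYPE="ceph_bluestore") rather than by its kernel name, so the same check can show which name the disk received after a given reboot. The following is a minimal sketch under that assumption; the function name device_with_type is hypothetical, and it only parses blkid-style text supplied on standard input.

```shell
#!/usr/bin/env bash
# Print every device whose blkid-style line carries the requested TYPE value.
# Input lines look like: /dev/sdb: TYPE="ceph_bluestore"
# device_with_type is a hypothetical helper name, not part of any tool.
device_with_type() {
    local want="$1"
    awk -v want="$want" '
        {
            dev = $1
            sub(/:$/, "", dev)      # strip the trailing colon after the device
            for (i = 2; i <= NF; i++)
                # exact field match, so SEC_TYPE="..." is not mistaken for TYPE
                if ($i == "TYPE=\"" want "\"") print dev
        }'
}
```

Run against the saved sosreport file, `device_with_type ceph_bluestore < sos_commands/block/blkid_-c_.dev.null` would print the device that currently holds the Ceph data, whichever name it was given at boot.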

We need a solution or fix that guarantees the secondary disk used for ODF is always detected as sdb in this specific case.

Version-Release number of selected component (if applicable):

4.x releases

How reproducible:

Random: roughly a 50% chance on each reboot of a node that has multiple disks dedicated to such purposes.

Steps to Reproduce:

    1. Install OCP.
    2. Install ODF and dedicate a separate disk (sdb) to Ceph storage.
    3. Reboot the nodes and observe that the disk names change randomly across reboots.

Actual results:

    The disk names aren't persistent.

Expected results:

     The disk names should be persistent.

Additional info:

    Persistent disk naming is not guaranteed by design, so a custom-tailored solution may be needed for this scenario.
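
Because kernel names such as sda and sdb are not guaranteed to persist, a common workaround is to reference disks through the udev-managed symlinks under /dev/disk/ (for example by-id or by-path), which are intended to stay stable across reboots. The following is a minimal sketch, not a confirmed fix for this case: the function name stable_names_for and its optional directory argument are hypothetical, and whether by-id links exist for a given VM disk may depend on the hypervisor configuration.

```shell
#!/usr/bin/env bash
# List the symlinks in a udev directory that currently resolve to a given
# kernel device name. stable_names_for is a hypothetical helper name; the
# second argument defaults to /dev/disk/by-id and can point at by-path.
stable_names_for() {
    local kname="$1"                      # kernel name, e.g. sdb
    local dir="${2:-/dev/disk/by-id}"     # directory of persistent symlinks
    local link target
    for link in "$dir"/*; do
        [ -L "$link" ] || continue        # skip non-symlinks and empty globs
        target="$(readlink -f "$link")"   # resolve to the real device node
        if [ "$(basename "$target")" = "$kname" ]; then
            printf '%s\n' "$link"
        fi
    done
}
```

For example, `stable_names_for sdb /dev/disk/by-path` prints any by-path links that point at sdb at that moment. Such a link, rather than /dev/sdb, could then be given to the storage configuration, assuming that configuration accepts device paths.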

impacts account

OCPBUGS-29474 Persistent disk naming issues persist across reboots in CoreOS, challenging conventional fixes, impacting various environments and requiring robust solutions.

Closed

Assignee: Unassigned

Reporter: Pushpendra Madhukar Chavan

Need Info From: None

Contributors: None

QA Contact: Michael Nguyen

Doc Contact: None

Votes: 0

Watchers: 6

Created: 2024/02/22 8:46 AM

Updated: 2025/10/08 12:51 PM

Resolved: 2024/04/18 3:57 PM
