OpenShift Bugs / OCPBUGS-29817

Non-persistent disk naming across reboots causes ODF pods to go down


    • Type: Bug
    • Resolution: Won't Do
    • Priority: Major
    • None
    • 4.14.z
    • RHCOS
    • Important
    • No
    • False

      Description of problem:

      This issue is similar to the one reported in https://issues.redhat.com/browse/OCPBUGS-29474: disk names are swapped across reboots, which breaks anything that depends on them. By design, the Linux kernel used in RHEL, as in all other distributions, does not guarantee that disk names such as /dev/sda and /dev/sdb persist across reboots.

      At the Red Hat IBM COC, another of our clients is facing a similar issue. The scenario is not identical: instead of dedicated storage for etcd, the affected disk is a separate disk for Ceph storage.

      • Bare-metal installation on vSphere ESXi 7.0
      • Three master VMs
      • Three worker VMs with two HDDs each

      The primary disk (192 GB) holds the OS and the second disk (1 TB) is for ODF storage.
      The installation completed and detected the OS disk and the ODF disk (1 TB) as expected, but later reboots flipped the names. When this happens, CoreOS itself boots without problems, but the rook-ceph-osd-x-xxxx pod scheduled on the affected node fails.

      # ssh core@worker0.xxxx lsblk
      NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
      loop0    7:0    0     1T  0 loop 
      sda      8:0    0   192G  0 disk 
      ├─sda1   8:1    0     1M  0 part 
      ├─sda2   8:2    0   127M  0 part 
      ├─sda3   8:3    0   384M  0 part /boot
      ....
      ....
      sdb      8:16   0     1T  0 disk 
      sr0     11:0    1  1024M  0 rom
      

      The rook-ceph-osd-x-xxxx pod works on a node whose device paths match the output above.

      When the names /dev/sda and /dev/sdb are swapped, the pod stops working.

      The swap happens randomly when a node is rebooted.

      $ cat proc/partitions 
      major minor  #blocks  name
         8        0  201326592 sda
         8        1       1024 sda1
         8        2     130048 sda2
         8        3     393216 sda3
         8        4  200801263 sda4
         8       16 1073741824 sdb
      $ cat sos_commands/block/blkid_-c_.dev.null  | grep sd
      /dev/sdb: TYPE="ceph_bluestore"
      /dev/sda4: LABEL="root" UUID="d4e71bf1-e1b1-4061-8820-94d415bac740" TYPE="xfs" PARTLABEL="root" PARTUUID="7f8ab0e2-e8a2-4678-a692-8420fa0da9ee"
      /dev/sda2: SEC_TYPE="msdos" LABEL_FATBOOT="EFI-SYSTEM" LABEL="EFI-SYSTEM" UUID="E3C4-265C" TYPE="vfat" PARTLABEL="EFI-SYSTEM" PARTUUID="162c85ea-992b-440a-b0bd-a817347eaecc"
      /dev/sda3: LABEL="boot" UUID="9fb2c61c-5556-46aa-9836-69af853caa4b" TYPE="ext4" PARTLABEL="boot" PARTUUID="60e64345-8fcd-454a-82c3-e20b79aaa68a"
      /dev/sda1: PARTLABEL="BIOS-BOOT" PARTUUID="637709e2-87f5-4e3d-ac30-1d6996e85259"
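
The blkid output above identifies the Ceph disk by its on-disk signature rather than by its kernel name, so a check along these lines could locate the ODF disk whichever name it received after a reboot. This is only a sketch: the function name `find_bluestore_devs` is hypothetical, and it parses nothing beyond text in the format shown above.

```shell
# Hypothetical helper: print the devices whose blkid line reports a
# BlueStore signature. Reads blkid-style lines on stdin, for example:
#   /dev/sdb: TYPE="ceph_bluestore"
find_bluestore_devs() {
    awk -F': ' '/ TYPE="ceph_bluestore"/ { print $1 }'
}

# Possible use on a live node (assumes blkid is available and run as root):
#   blkid -c /dev/null | find_bluestore_devs
```

Against the sosreport file quoted above, the same function could be fed with `cat sos_commands/block/blkid_-c_.dev.null | find_bluestore_devs`.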
      

      We need a solution or fix that guarantees the secondary disk used for ODF is always detected as sdb in this specific case.
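
For context, udev maintains symlinks under directories such as /dev/disk/by-path and /dev/disk/by-id that keep pointing at the same physical disk even when the kernel name changes. The sketch below lists the symlinks that currently resolve to a given kernel name. The function name `stable_links_for` and the default directory are assumptions made for illustration; which of these directories is populated depends on the hypervisor and disk configuration, and whether the affected ODF setup can consume such paths is not established in this report.

```shell
# Hypothetical helper: list the persistent symlinks that currently point at
# a kernel device name.
# Arguments: device name (e.g. sdb) and, optionally, the directory to scan
# (assumed default: /dev/disk/by-path).
stable_links_for() {
    dev="$1"
    dir="${2:-/dev/disk/by-path}"
    for link in "$dir"/*; do
        # Skip anything that is not a symlink (including an unmatched glob).
        [ -L "$link" ] || continue
        target=$(readlink -f "$link") || continue
        if [ "$(basename "$target")" = "$dev" ]; then
            printf '%s\n' "$link"
        fi
    done
}
```

Running `stable_links_for sdb` before and after a reboot would show whether the same by-path entry still maps to sdb.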

      Version-Release number of selected component (if applicable):

      4.x releases

      How reproducible:

      Intermittent: roughly a 50% chance on each reboot of a node with multiple disks dedicated to such purposes.

      Steps to Reproduce:

          1. Install OCP
          2. Install ODF and dedicate a separate disk (sdb) to Ceph storage
          3. Reboot the nodes and observe that the disk names change randomly across reboots.
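
To tell after each reboot in step 3 whether the names flipped, one option is to compare the size reported for sdb with the expected 1 TiB, using the /proc/partitions format shown earlier (sizes in 1 KiB blocks). This is a rough sketch under that assumption; the function name `dev_has_blocks` is hypothetical.

```shell
# Hypothetical check: succeed only if the named device has the expected size.
# Arguments: device name, expected size in 1 KiB blocks, and optionally the
# partitions file to read (default: /proc/partitions).
dev_has_blocks() {
    awk -v name="$1" -v want="$2" '
        $4 == name { found = 1; ok = ($3 == want) }
        END { if (found && ok) exit 0; exit 1 }
    ' "${3:-/proc/partitions}"
}
```

For example, `dev_has_blocks sdb 1073741824 || echo "sdb is not the 1 TiB disk on this boot"` would flag a node where the names were swapped.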
          

      Actual results:

          The disk names aren't persistent.

      Expected results:

           The disk names should be persistent.

      Additional info:

          Disk naming is not guaranteed by design, so a custom-tailored solution may be needed for this scenario.

            Unassigned
            rhn-support-pchavan Pushpendra Madhukar Chavan
            Michael Nguyen