Data Foundation Bugs / DFBUGS-600

[2217568] [IBM Z] ODF deployed on IBM Z with DASD/CoreOS


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Critical
    • odf-4.14
    • odf-4.12
    • Documentation

      Description of problem (please be as detailed as possible and provide log
      snippets):

      After a cluster-wide reboot for certificate renewal, the reboot of an ODF node removed the DASD partition and all three OSDs were lost.

      Customer followed this IBM documentation to partition the DASD.

      > https://www.ibm.com/docs/en/linux-on-systems?topic=architecture-storage
      > See Section “4.1.2 Steps specific for DASD devices”
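      The documented steps amount to bringing the DASD online, low-level formatting it, and auto-creating a single partition. A minimal sketch, assuming the usual s390x tooling (`chccwdev`, `dasdfmt`, `fdasd`); the bus ID `0.0.baee` is an example taken from the error below, and since these commands are destructive the sketch does nothing unless the device is actually present:

```shell
# Hypothetical walk-through of "4.1.2 Steps specific for DASD devices".
# 0.0.baee is an example bus ID; dasdfmt/fdasd wipe the device, so only
# run them if the block device really exists.
DASD=/dev/disk/by-path/ccw-0.0.baee
if [ -b "$DASD" ]; then
  chccwdev -e 0.0.baee        # set the DASD online
  dasdfmt -b 4096 -y "$DASD"  # low-level format (CDL layout)
  fdasd -a "$DASD"            # auto-create one partition spanning the disk
else
  echo "DASD $DASD not present; skipping"
fi
```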

      ODF deployed successfully with LSO and the OSDs mapped to dasde1.

      To use host binaries, run `chroot /host`
      NAME      MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
      loop1       7:1    0 811.6G  0 loop
      dasda      94:0    0 103.2G  0 disk
      |-dasda1   94:1    0   384M  0 part /host/boot
      `-dasda2   94:2    0 102.8G  0 part /host/sysroot
      dasde     94:16    0 811.6G  0 disk
      `-dasde1  94:17    0 811.6G  0 part

      After the cluster-wide reboot for certificate renewal, the OSD pods failed to start:

      MapVolume.EvalHostSymlinks failed for volume "local-pv-ef04e88d" : lstat /dev/disk/by-id/ccw-IBM.750000000KHF61.baee.40-part1: no such file or directory

      Events log:

      3m27s Warning FailedMapVolume pod/rook-ceph-osd-0-59c9db848-5rp9f MapVolume.EvalHostSymlinks failed for volume "local-pv-ef04e88d" : lstat /dev/disk/by-id/ccw-IBM.750000000KHF61.baee.40-part1: no such file or directory
      23m Warning FailedMount pod/rook-ceph-osd-0-59c9db848-5rp9f (combined from similar events): Unable to attach or mount volumes: unmounted volumes=[ocs-deviceset-odf-cluster-storage-0-data-1wgxd7], unattached volumes=[ocs-deviceset-odf-cluster-storage-0-data-1wgxd7 ocs-deviceset-odf-cluster-storage-0-data-1wgxd7-bridge kube-api-access-25kpz rook-data rook-config-override rook-ceph-log rook-ceph-crash run-udev]: timed out waiting for the condition
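      To confirm which side is missing, one can check whether the by-id symlink the local PV points at still resolves on the node (a diagnostic sketch; the path is copied from the event above):

```shell
# Diagnostic sketch: does the by-id symlink the local PV references
# still exist after the reboot? (Path copied from the kubelet event.)
DEV=/dev/disk/by-id/ccw-IBM.750000000KHF61.baee.40-part1
if [ -e "$DEV" ]; then
  readlink -f "$DEV"    # on a healthy node this resolves to the partition
else
  echo "missing: $DEV"  # matches the kubelet's lstat 'no such file' error
fi
```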

      After the reboot, the dasde1 partition is gone:

      NAME      MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
      dasda      94:0    0 103.2G  0 disk
      |-dasda1   94:1    0   384M  0 part /boot
      `-dasda2   94:2    0 102.8G  0 part /sysroot
      dasde     94:16    0 811.6G  0 disk

      Version of all relevant components (if applicable):

      OCP/ODF 4.12

      Does this issue impact your ability to continue to work with the product
      (please explain in detail what is the user impact)?

      A node reboot destroys the OSD path to dasde1; all three OSDs are lost.

      Is there any workaround available to the best of your knowledge?

      No

      Rate from 1 - 5 the complexity of the scenario you performed that caused this
      bug (1 - very simple, 5 - very complex)?

      5

      Is this issue reproducible?

      Yes, on reboot of the node.

      Can this issue be reproduced from the UI?

      If this is a regression, please provide more details to justify this:

      Steps to Reproduce:
      1.
      2.
      3.

      Actual results:

      Expected results:

      The DASD partition persists across reboots.

      Additional info:

              asriram@redhat.com Anjana Sriram
              rhn-support-khover Kevan Hover
              Kevan Hover, Santosh Pillai
              Neha Berry