OpenShift Bugs / OCPBUGS-54152

Local-Storage-Operator fails to release Persistent Volumes of type IBM ECKD DASD in Block mode


    • Bug
    • Resolution: Cannot Reproduce
    • 4.18
    • Quality / Stability / Reliability
    • Critical

      Description of problem:

      The Local Storage Operator fails to return a Persistent Volume (PV) to the Available state after use if the PV is backed by an IBM ECKD DASD and used in Block mode.

      The PV remains in the Released state:

      # oc get pv
      NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM                                                            STORAGECLASS                  VOLUMEATTRIBUTESCLASS   REASON   AGE
      [...]
      local-pv-78f4a9a                           70Gi       RWO            Delete           Released    default/vm-pvc-fio                                               local-dasd                    <unset>                          11m
      # oc describe pv local-pv-78f4a9a
      Name:              local-pv-78f4a9a
      [...]
      Events:
        Type    Reason        Age    From                            Message
        ----    ------        ----   ----                            -------
        Normal  VolumeDelete  2m47s  localvolume-symlink-controller  Starting cleanup of Block PV "local-pv-78f4a9a", this may take a while
      

      The operator log reveals the underlying issue: mkfs.xfs crashes with a segmentation fault.

      # oc logs diskmaker-manager-prwxv -n openshift-local-storage
      I0321 14:56:28.260731  311971 deleter.go:324] Cleanup pv "local-pv-78f4a9a": StderrBuf - "Metadata CRC error detected at 0x2aa35d15890, xfs_agf block 0x8/0x1000"
      I0321 14:56:28.262276  311971 deleter.go:324] Cleanup pv "local-pv-78f4a9a": StderrBuf - "Metadata CRC error detected at 0x2aa35d32800, xfs_agi block 0x10/0x1000"
      I0321 14:56:28.262861  311971 deleter.go:324] Cleanup pv "local-pv-78f4a9a": StderrBuf - "Metadata CRC error detected at 0x2aa35d4059a, xfs_sb block 0x0/0x1000"
      I0321 14:56:28.798599  311971 deleter.go:324] Cleanup pv "local-pv-78f4a9a": StderrBuf - "/scripts/quick_reset.sh: line 36: 317053 Segmentation fault      (core dumped) ionice -c 3 mkfs.xfs -f $LOCAL_PV_BLKDEVICE"

      This behavior is caused by a requirement specific to IBM ECKD DASD devices:

      # https://www.ibm.com/docs/en/linux-on-systems?topic=wd-preparing-eckd-1
      Before you can use an ECKD type DASD as a disk for Linux® on IBM® Z, you must format it with a suitable disk layout.

      A suitable disk layout is mandatory before a DASD can be used. The `quick_reset.sh` script does not check whether the device is a DASD and therefore runs mkfs.xfs directly on the DASD without a partition.

      This behavior makes it impossible to use IBM ECKD DASD with the Local Storage Operator in Block mode without manual cleanup.

      Furthermore, setting `forceWipeDevicesAndDestroyAllData: false` has no effect on the `quick_reset.sh` script; the reason is unclear. Is using a PV in Block mode expected to be a destructive action?

      Therefore, the `quick_reset.sh` script must not execute mkfs.xfs on an ECKD DASD device with no partition. The device driver at the OS level reveals the device type and may be used to implement a check, e.g. `/sys/class/block/dasdb/device/subsystem/drivers/dasd-eckd`
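A check along these lines could be sketched as follows. This is a hypothetical helper, not the contents of the shipped `quick_reset.sh`: the function name `is_eckd_dasd` and the overridable sysfs root are assumptions made for illustration. It reads the `device/driver` symlink of the block device rather than the `subsystem/drivers` directory, on the assumption that the symlink identifies the driver actually bound to this particular device.

```shell
#!/usr/bin/env bash
# Hypothetical helper (assumption, not part of the shipped quick_reset.sh).
# Returns 0 if the given whole-disk block device is bound to the dasd-eckd
# driver, 1 otherwise. The second argument overrides the sysfs root and
# exists only to make the function testable; it defaults to /sys.
is_eckd_dasd() {
    local dev="$1"
    local sysfs="${2:-/sys}"
    local name driver_link

    # Resolve symlinks such as /dev/disk/by-path/... to the kernel name (e.g. dasdb).
    name="$(basename "$(readlink -f "$dev" 2>/dev/null || printf '%s' "$dev")")"

    # Whole-disk entries carry a device/driver symlink; partition entries
    # normally do not, so partitions fall through to "not an ECKD DASD".
    driver_link="$sysfs/class/block/$name/device/driver"
    [ -L "$driver_link" ] || return 1

    [ "$(basename "$(readlink "$driver_link")")" = "dasd-eckd" ]
}

# Possible guard around the wipe step. LOCAL_PV_BLKDEVICE is the variable
# that appears in the operator log; the surrounding logic is an assumption.
#
# if is_eckd_dasd "$LOCAL_PV_BLKDEVICE"; then
#     echo "skipping mkfs.xfs on unpartitioned ECKD DASD $LOCAL_PV_BLKDEVICE" >&2
# else
#     ionice -c 3 mkfs.xfs -f "$LOCAL_PV_BLKDEVICE"
# fi
```

Skipping mkfs.xfs only avoids the crash. How the data on the DASD should be wiped instead is a separate question that this sketch does not answer.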

      I have attached all necessary logs on my Red Hat Google Drive, including must-gather and sosreport of the worker with the IBM ECKD DASD: https://drive.google.com/drive/folders/1rfmvebgrLjTzU6oKLI3M2ctpF3kUWZ2w?usp=drive_link

      Thank you very much for looking into this issue. Please let me know if you need further data or input.

      Version-Release number of selected component (if applicable):

          4.18.4

      How reproducible:

      Always

      Steps to Reproduce:

      - Attach an ECKD DASD to a worker node
      - Make it available via a LocalVolume resource
      - Use it in Block mode with a PVC
      - Delete the PVC
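The steps above can be sketched as manifests rendered by two small shell functions. The storage class `local-dasd`, the claim `default/vm-pvc-fio`, the 70Gi size, and the `openshift-local-storage` namespace come from the output in this report. The LocalVolume name, the device path `/dev/dasdb`, and the function names are assumptions. The workload that consumes the PVC (a VM in this case) is omitted, and a node selector may be needed in a real cluster.

```shell
#!/usr/bin/env bash
# Illustrative sketch only: device path and LocalVolume name are assumptions.

# Print a LocalVolume that exposes one DASD in Block mode.
render_localvolume() {
    local device="${1:-/dev/dasdb}"
    printf '%s\n' \
        'apiVersion: local.storage.openshift.io/v1' \
        'kind: LocalVolume' \
        'metadata:' \
        '  name: local-dasd' \
        '  namespace: openshift-local-storage' \
        'spec:' \
        '  storageClassDevices:' \
        '    - storageClassName: local-dasd' \
        '      volumeMode: Block' \
        '      devicePaths:' \
        "        - ${device}"
}

# Print a PVC that claims the volume in Block mode.
render_pvc() {
    printf '%s\n' \
        'apiVersion: v1' \
        'kind: PersistentVolumeClaim' \
        'metadata:' \
        '  name: vm-pvc-fio' \
        '  namespace: default' \
        'spec:' \
        '  storageClassName: local-dasd' \
        '  volumeMode: Block' \
        '  accessModes:' \
        '    - ReadWriteOnce' \
        '  resources:' \
        '    requests:' \
        '      storage: 70Gi'
}

# Usage against a cluster (not executed here):
#   render_localvolume /dev/dasdb | oc apply -f -
#   render_pvc | oc apply -f -
#   ... attach the PVC to a workload, then:
#   oc delete pvc vm-pvc-fio -n default
#   oc get pv        # the PV is expected to stay in Released
```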

      Actual results:

          - PV is stuck in the Released state indefinitely (due to the segmentation fault of mkfs.xfs in the quick_reset.sh hack script)

      Expected results:

          - PV returns to the Available state

      Additional info:

      Setting `forceWipeDevicesAndDestroyAllData: false` does not help.

              Unassigned Unassigned
              rh-ee-mgotin Manuel Gotin
              Manuel Gotin
              Dominik Werle, Klaus Smolin, Muhammad Adeel
              Wei Duan Wei Duan
              None
              IBM Confidential Group, ocp-multi-arch-ibm-partners