Data Foundation Bugs / DFBUGS-1655

[GSS] OSD disks are not being properly cleaned in ODF 4.18

    • Type: Bug
    • Resolution: Won't Do
    • Affects Version: odf-4.18
      Description of problem:

In ODF 4.18 we have identified two different issues related to the cleanup of Ceph metadata on the OSD disks.
      
      First issue:
      
      Annotation "uninstall.ocs.openshift.io/cleanup-policy: delete" is not being honored. After deleting the StorageSystem (and as such the StorageCluster), CEPH metadata is not getting removed. We can connect to the node and run this command to verify metadata is still there:
      
[root@openshift-ctlplane-0 ~]# podman run --rm -ti --privileged --device /dev/vdb --entrypoint ceph-volume quay.io/ceph/ceph:v19 raw list /dev/vdb --format json
{
    "3460d04a-0dbc-40a8-9947-ac8192b65d77": {
        "ceph_fsid": "b392dd9e-be7a-4e6e-bb24-78a962bb70be",
        "device": "/dev/vdb",
        "osd_id": 1,
        "osd_uuid": "3460d04a-0dbc-40a8-9947-ac8192b65d77",
        "type": "bluestore"
    }
}
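
For reference, a minimal sketch of how the cleanup policy is typically set before uninstalling (the StorageCluster name "ocs-storagecluster" and the namespace "openshift-storage" below are assumptions, adjust them to the environment):

# Assumed object name and namespace, for illustration only
oc annotate storagecluster ocs-storagecluster -n openshift-storage uninstall.ocs.openshift.io/cleanup-policy="delete" --overwrite
# Verify the annotation is present before deleting the StorageSystem/StorageCluster
oc get storagecluster ocs-storagecluster -n openshift-storage -o yaml | grep cleanup-policy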
      
      
      Second issue:
      
Manually cleaning the Ceph metadata following the upstream docs [1] does not remove it either:
      
      [root@openshift-ctlplane-0 ~]# wipefs -a -f /dev/vdb
      [root@openshift-ctlplane-0 ~]# echo $?
      0
      [root@openshift-ctlplane-0 ~]# podman run --rm -ti --privileged --device /dev/vdb --entrypoint ceph-volume quay.io/ceph/ceph:v19 raw list /dev/vdb --format json
      {    "3460d04a-0dbc-40a8-9947-ac8192b65d77": {
              "ceph_fsid": "b392dd9e-be7a-4e6e-bb24-78a962bb70be",
              "device": "/dev/vdb",
              "osd_id": 1,        
      "osd_uuid": "3460d04a-0dbc-40a8-9947-ac8192b65d77",
              "type": "bluestore"    }
      }
      
      [root@openshift-ctlplane-0 ~]# DISK="/dev/vdb"
      [root@openshift-ctlplane-0 ~]# sgdisk --zap-all $DISK
Creating new GPT entries in memory.
GPT data structures destroyed! You may now partition the disk using fdisk or
other utilities.
      [root@openshift-ctlplane-0 ~]# dd if=/dev/zero of="$DISK" bs=1M count=100 oflag=direct,dsync
      100+0 records in
      100+0 records out
      104857600 bytes (105 MB, 100 MiB) copied, 0.0795897 s, 1.3 GB/s
      [root@openshift-ctlplane-0 ~]# blkdiscard $DISK
      [root@openshift-ctlplane-0 ~]# podman run --rm -ti --privileged --device /dev/vdb --entrypoint ceph-volume quay.io/ceph/ceph:v19 raw list /dev/vdb --format json
      {    "3460d04a-0dbc-40a8-9947-ac8192b65d77": {
              "ceph_fsid": "b392dd9e-be7a-4e6e-bb24-78a962bb70be",
              "device": "/dev/vdb",
              "osd_id": 1,        
      "osd_uuid": "3460d04a-0dbc-40a8-9947-ac8192b65d77",
              "type": "bluestore"    }
      }
      
In our case it takes around 15 GB of zeroes to clean the metadata:
      
      [root@openshift-ctlplane-0 ~]# dd if=/dev/zero of=/dev/vdb bs=1G count=15 oflag=direct,dsync status=progress
      16106127360 bytes (16 GB, 15 GiB) copied, 30 s, 542 MB/s 
      15+0 records in
      15+0 records out
      16106127360 bytes (16 GB, 15 GiB) copied, 29.7043 s, 542 MB/s
      [root@openshift-ctlplane-0 ~]# podman run --rm -ti --privileged --device /dev/vdb --entrypoint ceph-volume quay.io/ceph/ceph:v19 raw list /dev/vdb --format json
      {}
      
      [1] https://rook.io/docs/rook/v1.14/Getting-Started/ceph-teardown/#zapping-devices
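
Since in our case ~15 GB of zeroes was enough, a brute-force workaround is to zero the entire device before re-checking with ceph-volume. A minimal sketch, assuming the disk can be fully overwritten (this destroys all data on the device and can take a long time on large disks):

DISK="/dev/vdb"
sgdisk --zap-all "$DISK"
# dd exits non-zero when it reaches the end of the device, which is expected here
dd if=/dev/zero of="$DISK" bs=1M oflag=direct,dsync status=progress || true
blkdiscard "$DISK"
podman run --rm -ti --privileged --device "$DISK" --entrypoint ceph-volume quay.io/ceph/ceph:v19 raw list "$DISK" --format json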

We have a document with further information and reproduction steps: https://docs.google.com/document/d/1HBej5PCPpFibynlrnJHt12rgctr90WUk7EEj_Kkmgq4/edit?tab=t.0

      Version-Release number of selected component (if applicable):

      v4.18.0-112.stable    

      How reproducible:

      Always

      Steps to Reproduce:

Described above and in the linked document.
          

      Actual results:

      OSD disk is not cleaned and OSD prepare fails

      Expected results:

      OSD disk is cleaned and OSD prepare succeeds

      Additional info:

      We have an environment where this can be reproduced if required.
      
      More info: https://docs.google.com/document/d/1HBej5PCPpFibynlrnJHt12rgctr90WUk7EEj_Kkmgq4/edit?tab=t.0

Santosh Pillai
Mario Vazquez Cebrian (mavazque@redhat.com)
Federico Ferrando
Wei Duan