Bug
Resolution: Won't Do
Undefined
None
odf-4.18
Description of problem:
In ODF 4.18 we have identified two different issues related to the cleanup of the CEPH metadata on the OSD disks.

First issue: the annotation "uninstall.ocs.openshift.io/cleanup-policy: delete" is not being honored. After deleting the StorageSystem (and, as a result, the StorageCluster), the CEPH metadata is not removed. We can connect to the node and run this command to verify the metadata is still there:

[root@openshift-ctlplane-0 ~]# podman run --rm -ti --privileged --device /dev/vdb --entrypoint ceph-volume quay.io/ceph/ceph:v19 raw list /dev/vdb --format json
{
    "3460d04a-0dbc-40a8-9947-ac8192b65d77": {
        "ceph_fsid": "b392dd9e-be7a-4e6e-bb24-78a962bb70be",
        "device": "/dev/vdb",
        "osd_id": 1,
        "osd_uuid": "3460d04a-0dbc-40a8-9947-ac8192b65d77",
        "type": "bluestore"
    }
}

Second issue: manual cleanup of the CEPH metadata following the upstream docs [1] does not clean the metadata either:

[root@openshift-ctlplane-0 ~]# wipefs -a -f /dev/vdb
[root@openshift-ctlplane-0 ~]# echo $?
0
[root@openshift-ctlplane-0 ~]# podman run --rm -ti --privileged --device /dev/vdb --entrypoint ceph-volume quay.io/ceph/ceph:v19 raw list /dev/vdb --format json
{
    "3460d04a-0dbc-40a8-9947-ac8192b65d77": {
        "ceph_fsid": "b392dd9e-be7a-4e6e-bb24-78a962bb70be",
        "device": "/dev/vdb",
        "osd_id": 1,
        "osd_uuid": "3460d04a-0dbc-40a8-9947-ac8192b65d77",
        "type": "bluestore"
    }
}
[root@openshift-ctlplane-0 ~]# DISK="/dev/vdb"
[root@openshift-ctlplane-0 ~]# sgdisk --zap-all $DISK
Creating new GPT entries in memory.
GPT data structures destroyed! You may now partition the disk using fdisk or other utilities.
[root@openshift-ctlplane-0 ~]# dd if=/dev/zero of="$DISK" bs=1M count=100 oflag=direct,dsync
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.0795897 s, 1.3 GB/s
[root@openshift-ctlplane-0 ~]# blkdiscard $DISK
[root@openshift-ctlplane-0 ~]# podman run --rm -ti --privileged --device /dev/vdb --entrypoint ceph-volume quay.io/ceph/ceph:v19 raw list /dev/vdb --format json
{
    "3460d04a-0dbc-40a8-9947-ac8192b65d77": {
        "ceph_fsid": "b392dd9e-be7a-4e6e-bb24-78a962bb70be",
        "device": "/dev/vdb",
        "osd_id": 1,
        "osd_uuid": "3460d04a-0dbc-40a8-9947-ac8192b65d77",
        "type": "bluestore"
    }
}

In our case it takes around 15 GiB of zeroes to clean the metadata:

[root@openshift-ctlplane-0 ~]# dd if=/dev/zero of=/dev/vdb bs=1G count=15 oflag=direct,dsync status=progress
16106127360 bytes (16 GB, 15 GiB) copied, 30 s, 542 MB/s
15+0 records in
15+0 records out
16106127360 bytes (16 GB, 15 GiB) copied, 29.7043 s, 542 MB/s
[root@openshift-ctlplane-0 ~]# podman run --rm -ti --privileged --device /dev/vdb --entrypoint ceph-volume quay.io/ceph/ceph:v19 raw list /dev/vdb --format json
{}

[1] https://rook.io/docs/rook/v1.14/Getting-Started/ceph-teardown/#zapping-devices
We have this document with further information and reproduction steps: https://docs.google.com/document/d/1HBej5PCPpFibynlrnJHt12rgctr90WUk7EEj_Kkmgq4/edit?tab=t.0
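For convenience, here is a minimal sketch that wraps the manual cleanup steps above into a single script and then re-checks the disk with ceph-volume. The device path, the Ceph image tag, and the 15 GiB zeroing size are taken from this particular environment and are assumptions; the amount of zeroing required may differ elsewhere.

#!/usr/bin/env bash
# Sketch only: combines the manual wipe steps from this report and verifies the result.
set -euo pipefail

DISK="${1:-/dev/vdb}"                 # assumption: OSD device used in this environment
CEPH_IMAGE="quay.io/ceph/ceph:v19"    # same image used for the checks above
ZERO_GIB=15                           # in our tests ~15 GiB of zeroes was needed

# Cleanup steps from the Rook teardown docs [1]
wipefs -a -f "$DISK"
sgdisk --zap-all "$DISK"
dd if=/dev/zero of="$DISK" bs=1G count="$ZERO_GIB" oflag=direct,dsync status=progress
blkdiscard "$DISK" || true            # may fail on devices without discard support

# Verification: 'raw list' should print "{}" once the bluestore metadata is gone
podman run --rm --privileged --device "$DISK" --entrypoint ceph-volume \
  "$CEPH_IMAGE" raw list "$DISK" --format json

In our environment this leaves "raw list" returning an empty JSON object, matching the last output above.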
Version-Release number of selected component (if applicable):
v4.18.0-112.stable
How reproducible:
Always
Steps to Reproduce:
Described above and in the linked document.
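For quick reference, a condensed sketch of the flow. The resource names ocs-storagecluster and ocs-storagecluster-storagesystem in the openshift-storage namespace are the usual defaults and are an assumption here; adjust them to your deployment.

# 1. Request full cleanup on uninstall (the annotation discussed in the description)
oc annotate storagecluster ocs-storagecluster -n openshift-storage \
  uninstall.ocs.openshift.io/cleanup-policy="delete" --overwrite

# 2. Delete the StorageSystem, which also removes the StorageCluster
oc delete storagesystem ocs-storagecluster-storagesystem -n openshift-storage

# 3. On the node, check whether the bluestore metadata is still present on the OSD disk
podman run --rm --privileged --device /dev/vdb --entrypoint ceph-volume \
  quay.io/ceph/ceph:v19 raw list /dev/vdb --format json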
Actual results:
The OSD disk is not cleaned, so the OSD prepare step fails.
Expected results:
The OSD disk is cleaned and the OSD prepare step succeeds.
Additional info:
We have an environment where this can be reproduced if required. More info: https://docs.google.com/document/d/1HBej5PCPpFibynlrnJHt12rgctr90WUk7EEj_Kkmgq4/edit?tab=t.0
- relates to: ODFRFE-19 "Support for the ODF Operator to cleanup ceph bluestore metadata from OSD disks before deploying the cluster" (Backlog)