- Bug
- Resolution: Unresolved
- Undefined
- None
- odf-4.21
Description of problem - Provide a detailed description of the issue encountered, including logs/command-output snippets and screenshots if the issue is observed in the UI:
In ODF 4.21 we have identified that, in the specific case of redeploying an existing MNO cluster with ODF installed and with the StorageCluster property "storagecluster.spec.managedResources.cephCluster.cleanupPolicy.wipeDevicesFromOtherClusters" configured, the Ceph metadata is not properly cleaned from the cluster disks and the "rook-ceph-osd-prepare-ocs-deviceset-0-data-xxx" pods fail on execution.
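For reference, a minimal sketch of the StorageCluster fragment that enables this cleanup behavior; the resource name and namespace shown are the usual defaults and are assumptions here, not taken from this cluster:

```yaml
apiVersion: ocs.openshift.io/v1
kind: StorageCluster
metadata:
  name: ocs-storagecluster        # assumed default name
  namespace: openshift-storage
spec:
  managedResources:
    cephCluster:
      cleanupPolicy:
        wipeDevicesFromOtherClusters: true
```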
$ oc -n openshift-storage get pods rook-ceph-osd-prepare-ocs-deviceset-0-data-0kwtmc-zb9hs -o yaml
  - containerID: cri-o://cbd502693728bd37c11d9c70878f8aad4f9bb0464ed07f16ecc1b418a0dd0f56
    image: registry.redhat.io/rhceph/rhceph-9-rhel9@sha256:13c53e64042cd902b84492c12109e94d6082415b26ede321d7333819fe551477
    imageID: registry.redhat.io/rhceph/rhceph-9-rhel9@sha256:13c53e64042cd902b84492c12109e94d6082415b26ede321d7333819fe551477
    lastState:
      terminated:
        containerID: cri-o://cbd502693728bd37c11d9c70878f8aad4f9bb0464ed07f16ecc1b418a0dd0f56
        exitCode: 1
        finishedAt: "2026-02-06T10:31:37Z"
        message: 'failed to configure devices: failed to get device already provisioned by ceph-volume raw: osd.2: "24d32d6c-fac1-4e92-b79a-404231885b84" belonging to a different ceph cluster "439e2810-4589-4603-adbd-127fbd3d1b3e"'
        reason: Error
        startedAt: "2026-02-06T10:31:34Z"
$ oc -n openshift-storage logs rook-ceph-operator-65b78bd45c-95p8s
2026-02-06 10:30:11.653641 E | op-osd: failed to provision OSD(s) on PVC ocs-deviceset-2-data-0sp9zf. &{OSDs:[] Status:failed PvcBackedOSD:true Message:failed to configure devices: failed to get device already provisioned by ceph-volume raw: osd.1: "fba9b9a1-74a1-4980-9faa-b0425447141f" belonging to a different ceph cluster "439e2810-4589-4603-adbd-127fbd3d1b3e"}
2026-02-06 10:30:11.876154 I | cephclient: reconciling replicated pool ocs-storagecluster-cephobjectstore.rgw.buckets.index succeeded
2026-02-06 10:30:12.116430 I | op-osd: [openshift-storage] OSD orchestration status for PVC ocs-deviceset-0-data-0kwtmc is "failed"
2026-02-06 10:30:12.116455 E | op-osd: failed to provision OSD(s) on PVC ocs-deviceset-0-data-0kwtmc. &{OSDs:[] Status:failed PvcBackedOSD:true Message:failed to configure devices: failed to get device already provisioned by ceph-volume raw: osd.2: "24d32d6c-fac1-4e92-b79a-404231885b84" belonging to a different ceph cluster "439e2810-4589-4603-adbd-127fbd3d1b3e"}
2026-02-06 10:30:12.515001 I | cephclient: creating a new crush rule for changed deviceClass ("default"-->"") on crush rule "ocs-storagecluster-cephobjectstore.rgw.buckets.index"
2026-02-06 10:30:12.515017 I | cephclient: updating pool "ocs-storagecluster-cephobjectstore.rgw.buckets.index" failure domain from "rack" to "rack" with new crush rule "ocs-storagecluster-cephobjectstore.rgw.buckets.index_rack"
2026-02-06 10:30:12.515021 I | cephclient: crush rule "ocs-storagecluster-cephobjectstore.rgw.buckets.index" will no longer be used by pool "ocs-storagecluster-cephobjectstore.rgw.buckets.index"
2026-02-06 10:30:13.283696 I | op-osd: [openshift-storage] OSD orchestration status for PVC ocs-deviceset-1-data-097445 is "orchestrating"
2026-02-06 10:30:13.881916 I | cephclient: Successfully updated pool "ocs-storagecluster-cephobjectstore.rgw.buckets.index" failure domain to "rack"
2026-02-06 10:30:13.881932 I | cephclient: setting pool property "pg_num_min" to "8" on pool "ocs-storagecluster-cephobjectstore.rgw.buckets.index"
2026-02-06 10:30:13.913021 I | op-osd: [openshift-storage] OSD orchestration status for PVC ocs-deviceset-1-data-097445 is "failed"
2026-02-06 10:30:13.913038 E | op-osd: failed to provision OSD(s) on PVC ocs-deviceset-1-data-097445. &{OSDs:[] Status:failed PvcBackedOSD:true Message:failed to configure devices: failed to get device already provisioned by ceph-volume raw: osd.0: "65f203c1-7761-4888-9fdf-0050c0d84670" belonging to a different ceph cluster "439e2810-4589-4603-adbd-127fbd3d1b3e"}
2026-02-06 10:30:13.930826 E | cluster-controller: failed to reconcile CephCluster "openshift-storage/ocs-storagecluster-cephcluster". failed to reconcile cluster "ocs-storagecluster-cephcluster": failed to configure local ceph cluster: failed to create cluster: failed to start ceph osds: 3 failures encountered while running osds on nodes in namespace "openshift-storage". failed to provision OSD(s) on PVC ocs-deviceset-2-data-0sp9zf. &{OSDs:[] Status:failed PvcBackedOSD:true Message:failed to configure devices: failed to get device already provisioned by ceph-volume raw: osd.1: "fba9b9a1-74a1-4980-9faa-b0425447141f" belonging to a different ceph cluster "439e2810-4589-4603-adbd-127fbd3d1b3e"} failed to provision OSD(s) on PVC ocs-deviceset-0-data-0kwtmc. &{OSDs:[] Status:failed PvcBackedOSD:true Message:failed to configure devices: failed to get device already provisioned by ceph-volume raw: osd.2: "24d32d6c-fac1-4e92-b79a-404231885b84" belonging to a different ceph cluster "439e2810-4589-4603-adbd-127fbd3d1b3e"} failed to provision OSD(s) on PVC ocs-deviceset-1-data-097445. &{OSDs:[] Status:failed PvcBackedOSD:true Message:failed to configure devices: failed to get device already provisioned by ceph-volume raw: osd.0: "65f203c1-7761-4888-9fdf-0050c0d84670" belonging to a different ceph cluster "439e2810-4589-4603-adbd-127fbd3d1b3e"}
The OCP platform infrastructure and deployment type (AWS, Bare Metal, VMware, etc. Please clarify if it is platform agnostic deployment), (IPI/UPI):
ACM/ZTP/GITOPS deployment on Bare Metal
The ODF deployment type (Internal, External, Internal-Attached (LSO), Multicluster, DR, Provider, etc):
Internal ODF
The version of all relevant components (OCP, ODF, RHCS, ACM whichever is applicable):
OCP 4.22.0-EC.1, ODF 4.21.0-77
Does this issue impact your ability to continue to work with the product?
Deployment is not blocked; ODF can be deployed using the workaround.
Is there any workaround available to the best of your knowledge?
Yes. Using the extra-manifests YAML, the Ceph metadata is cleaned and ODF is properly deployed.
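The linked workaround document contains the exact manifests; as a rough illustration only, cleaning leftover Ceph metadata from a disk typically comes down to commands like the following (hypothetical sketch: /dev/sdX is a placeholder for each affected device, and these commands are destructive):

```shell
DISK="/dev/sdX"            # placeholder: a disk previously used by the old Ceph cluster
sgdisk --zap-all "$DISK"   # remove GPT/MBR partition metadata
# Zero the start of the disk, where ceph-volume stores its labels
dd if=/dev/zero of="$DISK" bs=1M count=100 oflag=direct,dsync
```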
Can this issue be reproduced? If so, please provide the hit rate
Yes, 100%
Can this issue be reproduced from the UI?
If this is a regression, please provide more details to justify this:
I think this is a regression; in ODF 4.20 the feature worked as expected.
Steps to Reproduce:
1. Deploy an MNO cluster.
2. Redeploy the MNO cluster with ODF 4.21 and the StorageCluster property "storagecluster.spec.managedResources.cephCluster.cleanupPolicy.wipeDevicesFromOtherClusters" configured.
3. After the cluster is deployed, check the rook-ceph-operator pod logs.
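For the final check, commands along these lines can be used (assuming the default openshift-storage namespace):

```shell
# List the OSD prepare pods and search the operator log for the stale-cluster error
oc -n openshift-storage get pods | grep rook-ceph-osd-prepare
oc -n openshift-storage logs deploy/rook-ceph-operator | grep "belonging to a different ceph cluster"
# The StorageCluster should report "Ready"; in this bug it never does
oc -n openshift-storage get storagecluster -o jsonpath='{.items[0].status.phase}'
```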
The exact date and time when the issue was observed, including timezone details:
Actual results:
The Ceph metadata is not properly deleted, and the ODF StorageCluster does not achieve READY status.
Expected results:
The Ceph metadata is properly deleted, and the ODF StorageCluster achieves READY status.
Logs collected and log location:
ODF must-gather collected and available at the links below:
MG-Feb-6: https://drive.google.com/file/d/13LC6ShJIBhRcv_Cipqo6R6X89Iu1jrz9/view?usp=sharing
MG-Feb-12: https://drive.google.com/file/d/1mObuVdB1FVPCtlIL-vXCv_Yr389L22Ea/view?usp=sharing
Logs-Feb-12: https://drive.google.com/file/d/1ofZibMD1T4SOTU9pQWTkhUByW7p6Rx73/view?usp=sharing
Additional info:
- This issue was observed during PreGA 4.18 testing, and bug DFBUGS-1655 was opened.
- The approach was to create an RFE, ODFRFE-19, which was introduced with ODF 4.20.0.
- mavazque@redhat.com created a detailed document with the reproducer and the workaround when opening bug DFBUGS-1655: https://docs.google.com/document/d/1HBej5PCPpFibynlrnJHt12rgctr90WUk7EEj_Kkmgq4/edit?tab=t.0