Data Foundation Bugs / DFBUGS-5549

[GSS] OSD disks are not being properly cleaned in ODF 4.21

    • Type: Bug
    • Resolution: Unresolved
    • Affects Version/s: odf-4.21
    • Component/s: rook
    • Severity: Important
    • Status: Proposed

      Description of problem - Provide a detailed description of the issue encountered, including logs/command-output snippets and screenshots if the issue is observed in the UI:

      In ODF 4.21 we have identified that, in the specific case of redeploying an existing MNO cluster that has ODF installed and the StorageCluster property "storagecluster.spec.managedResources.cephCluster.cleanupPolicy.wipeDevicesFromOtherClusters" configured, the Ceph metadata is not properly cleaned from the cluster disks and the "rook-ceph-osd-prepare-ocs-deviceset-0-data-xxx" pods fail on execution.
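For reference, a minimal sketch of how the property in question can be set on the StorageCluster CR (the resource name "ocs-storagecluster" is assumed here; adjust to the actual deployment):

```shell
# Hypothetical sketch: enable wiping of devices that carry metadata from a
# previous/other Ceph cluster, via the StorageCluster cleanupPolicy.
oc -n openshift-storage patch storagecluster ocs-storagecluster \
  --type merge \
  -p '{"spec":{"managedResources":{"cephCluster":{"cleanupPolicy":{"wipeDevicesFromOtherClusters":true}}}}}'
```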
      $ oc -n openshift-storage get pods rook-ceph-osd-prepare-ocs-deviceset-0-data-0kwtmc-zb9hs -o yaml
      
        - containerID: cri-o://cbd502693728bd37c11d9c70878f8aad4f9bb0464ed07f16ecc1b418a0dd0f56
          image: registry.redhat.io/rhceph/rhceph-9-rhel9@sha256:13c53e64042cd902b84492c12109e94d6082415b26ede321d7333819fe551477
          imageID: registry.redhat.io/rhceph/rhceph-9-rhel9@sha256:13c53e64042cd902b84492c12109e94d6082415b26ede321d7333819fe551477
          lastState:
            terminated:
              containerID: cri-o://cbd502693728bd37c11d9c70878f8aad4f9bb0464ed07f16ecc1b418a0dd0f56
              exitCode: 1
              finishedAt: "2026-02-06T10:31:37Z"
              message: 'failed to configure devices: failed to get device already provisioned
                by ceph-volume raw: osd.2: "24d32d6c-fac1-4e92-b79a-404231885b84" belonging
                to a different ceph cluster "439e2810-4589-4603-adbd-127fbd3d1b3e"'
              reason: Error
              startedAt: "2026-02-06T10:31:34Z" 
      $ oc -n openshift-storage logs rook-ceph-operator-65b78bd45c-95p8s
      
      
      2026-02-06 10:30:11.653641 E | op-osd: failed to provision OSD(s) on PVC ocs-deviceset-2-data-0sp9zf. &{OSDs:[] Status:failed PvcBackedOSD:true Message:failed to configure devices: failed to get device already provisioned by ceph-volume raw: osd.1: "fba9b9a1-74a1-4980-9faa-b0425447141f" belonging to a different ceph cluster "439e2810-4589-4603-adbd-127fbd3d1b3e"}
      2026-02-06 10:30:11.876154 I | cephclient: reconciling replicated pool ocs-storagecluster-cephobjectstore.rgw.buckets.index succeeded
      2026-02-06 10:30:12.116430 I | op-osd: [openshift-storage] OSD orchestration status for PVC ocs-deviceset-0-data-0kwtmc is "failed"
      2026-02-06 10:30:12.116455 E | op-osd: failed to provision OSD(s) on PVC ocs-deviceset-0-data-0kwtmc. &{OSDs:[] Status:failed PvcBackedOSD:true Message:failed to configure devices: failed to get device already provisioned by ceph-volume raw: osd.2: "24d32d6c-fac1-4e92-b79a-404231885b84" belonging to a different ceph cluster "439e2810-4589-4603-adbd-127fbd3d1b3e"}
      2026-02-06 10:30:12.515001 I | cephclient: creating a new crush rule for changed deviceClass ("default"-->"") on crush rule "ocs-storagecluster-cephobjectstore.rgw.buckets.index"
      2026-02-06 10:30:12.515017 I | cephclient: updating pool "ocs-storagecluster-cephobjectstore.rgw.buckets.index" failure domain from "rack" to "rack" with new crush rule "ocs-storagecluster-cephobjectstore.rgw.buckets.index_rack"
      2026-02-06 10:30:12.515021 I | cephclient: crush rule "ocs-storagecluster-cephobjectstore.rgw.buckets.index" will no longer be used by pool "ocs-storagecluster-cephobjectstore.rgw.buckets.index"
      2026-02-06 10:30:13.283696 I | op-osd: [openshift-storage] OSD orchestration status for PVC ocs-deviceset-1-data-097445 is "orchestrating"
      2026-02-06 10:30:13.881916 I | cephclient: Successfully updated pool "ocs-storagecluster-cephobjectstore.rgw.buckets.index" failure domain to "rack"
      2026-02-06 10:30:13.881932 I | cephclient: setting pool property "pg_num_min" to "8" on pool "ocs-storagecluster-cephobjectstore.rgw.buckets.index"
      2026-02-06 10:30:13.913021 I | op-osd: [openshift-storage] OSD orchestration status for PVC ocs-deviceset-1-data-097445 is "failed"
      2026-02-06 10:30:13.913038 E | op-osd: failed to provision OSD(s) on PVC ocs-deviceset-1-data-097445. &{OSDs:[] Status:failed PvcBackedOSD:true Message:failed to configure devices: failed to get device already provisioned by ceph-volume raw: osd.0: "65f203c1-7761-4888-9fdf-0050c0d84670" belonging to a different ceph cluster "439e2810-4589-4603-adbd-127fbd3d1b3e"}
      2026-02-06 10:30:13.930826 E | cluster-controller: failed to reconcile CephCluster "openshift-storage/ocs-storagecluster-cephcluster". failed to reconcile cluster "ocs-storagecluster-cephcluster": failed to configure local ceph cluster: failed to create cluster: failed to start ceph osds: 3 failures encountered while running osds on nodes in namespace "openshift-storage". failed to provision OSD(s) on PVC ocs-deviceset-2-data-0sp9zf. &{OSDs:[] Status:failed PvcBackedOSD:true Message:failed to configure devices: failed to get device already provisioned by ceph-volume raw: osd.1: "fba9b9a1-74a1-4980-9faa-b0425447141f" belonging to a different ceph cluster "439e2810-4589-4603-adbd-127fbd3d1b3e"} failed to provision OSD(s) on PVC ocs-deviceset-0-data-0kwtmc. &{OSDs:[] Status:failed PvcBackedOSD:true Message:failed to configure devices: failed to get device already provisioned by ceph-volume raw: osd.2: "24d32d6c-fac1-4e92-b79a-404231885b84" belonging to a different ceph cluster "439e2810-4589-4603-adbd-127fbd3d1b3e"} failed to provision OSD(s) on PVC ocs-deviceset-1-data-097445. &{OSDs:[] Status:failed PvcBackedOSD:true Message:failed to configure devices: failed to get device already provisioned by ceph-volume raw: osd.0: "65f203c1-7761-4888-9fdf-0050c0d84670" belonging to a different ceph cluster "439e2810-4589-4603-adbd-127fbd3d1b3e"}
      

       

      The OCP platform infrastructure and deployment type (AWS, Bare Metal, VMware, etc. Please clarify if it is platform agnostic deployment), (IPI/UPI):

      ACM/ZTP/GITOPS deployment on Bare Metal 

       

      The ODF deployment type (Internal, External, Internal-Attached (LSO), Multicluster, DR, Provider, etc):

      Internal ODF
      

       

      The version of all relevant components (OCP, ODF, RHCS, ACM whichever is applicable):

      OCP 4.22.0-EC.1
      ODF 4.21.0-77

       

      Does this issue impact your ability to continue to work with the product?

      Deployment is not blocked; ODF can be deployed using the workaround.
      

       

      Is there any workaround available to the best of your knowledge?

      Yes. Using an extra-manifests YAML, the Ceph metadata is cleaned and ODF is properly deployed.
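The exact extra-manifests content is not attached to this report, but a hedged sketch of the kind of cleanup it performs (wiping stale Ceph metadata from each OSD disk on the storage nodes before redeploying) would be:

```shell
# Hypothetical sketch only: /dev/sdb is an example device path; identify the
# actual ODF/OSD disks on each node before wiping anything.
DISK=/dev/sdb

sgdisk --zap-all "$DISK"   # destroy GPT/MBR partition structures
wipefs --all "$DISK"       # clear filesystem/LVM/Ceph signatures
# Zero the first 100 MiB, where ceph-volume raw metadata typically lives.
dd if=/dev/zero of="$DISK" bs=1M count=100 oflag=direct,dsync
```

In the ZTP/GitOps flow these commands would typically be delivered as a MachineConfig or similar extra manifest rather than run by hand.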

       

      Can this issue be reproduced? If so, please provide the hit rate

      Yes, 100%
      

       

      Can this issue be reproduced from the UI?

       

      If this is a regression, please provide more details to justify this:

      I believe this is a regression; in ODF 4.20 the feature worked as expected.
      

       

      Steps to Reproduce:

      1. Deploy an MNO cluster
      2. Redeploy the MNO cluster with ODF 4.21 and the Storagecluster property "storagecluster.spec.managedResources.cephCluster.cleanupPolicy.wipeDevicesFromOtherClusters" configured
      3. After the cluster is deployed, check the rook-ceph-operator pod logs
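A sketch of the checks for step 3 (resource and deployment names assumed from the logs above):

```shell
# Check whether the StorageCluster ever reaches READY.
oc -n openshift-storage get storagecluster ocs-storagecluster

# The OSD prepare pods should show the failure.
oc -n openshift-storage get pods -l app=rook-ceph-osd-prepare

# Look for the signature error in the operator log.
oc -n openshift-storage logs deploy/rook-ceph-operator | grep -i "belonging to a different ceph cluster"
```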
      

       

      The exact date and time when the issue was observed, including timezone details:

       

      Actual results:

      The Ceph metadata is not properly deleted and the ODF StorageCluster does not reach READY status.
      

       

      Expected results:

      The Ceph metadata is properly deleted and the ODF StorageCluster reaches READY status.

       

      Logs collected and log location:

      ODF must-gather collected and available at the following links:
      

      MG-Feb-6: https://drive.google.com/file/d/13LC6ShJIBhRcv_Cipqo6R6X89Iu1jrz9/view?usp=sharing

      MG-Feb-12: https://drive.google.com/file/d/1mObuVdB1FVPCtlIL-vXCv_Yr389L22Ea/view?usp=sharing

      Logs-Feb-12: https://drive.google.com/file/d/1ofZibMD1T4SOTU9pQWTkhUByW7p6Rx73/view?usp=sharing

       

      Additional info:

              sapillai (Santosh Pillai)
              rh-ee-feferran (Federico Ferrando)
              Elad Ben Aharon