Data Foundation Bugs / DFBUGS-5549

[GSS] OSD disks are not being properly cleaned in ODF 4.21

    • Type: Bug
    • Resolution: Unresolved
    • Affects Version/s: odf-4.21
    • Component/s: rook
    • Severity: Important
    • Status: Proposed

      Description of problem - Provide a detailed description of the issue encountered, including logs/command-output snippets and screenshots if the issue is observed in the UI:

      In ODF 4.21 we have identified that, in the specific case of redeploying an existing MNO cluster that has ODF installed and the StorageCluster property "storagecluster.spec.managedResources.cephCluster.cleanupPolicy.wipeDevicesFromOtherClusters" configured, the Ceph metadata is not properly cleaned from the cluster disks and the "rook-ceph-osd-prepare-ocs-deviceset-0-data-xxx" pods fail on execution.
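For reference, a minimal sketch of how the property in question can be set on the StorageCluster CR (the resource name "ocs-storagecluster" is assumed here; adjust to the actual deployment):

```shell
# Hypothetical sketch: enable wiping of devices that carry metadata from a
# previous/other Ceph cluster, via the StorageCluster cleanupPolicy.
oc -n openshift-storage patch storagecluster ocs-storagecluster \
  --type merge \
  -p '{"spec":{"managedResources":{"cephCluster":{"cleanupPolicy":{"wipeDevicesFromOtherClusters":true}}}}}'
```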
      $ oc -n openshift-storage get pods rook-ceph-osd-prepare-ocs-deviceset-0-data-0kwtmc-zb9hs -o yaml
      
        - containerID: cri-o://cbd502693728bd37c11d9c70878f8aad4f9bb0464ed07f16ecc1b418a0dd0f56
          image: registry.redhat.io/rhceph/rhceph-9-rhel9@sha256:13c53e64042cd902b84492c12109e94d6082415b26ede321d7333819fe551477
          imageID: registry.redhat.io/rhceph/rhceph-9-rhel9@sha256:13c53e64042cd902b84492c12109e94d6082415b26ede321d7333819fe551477
          lastState:
            terminated:
              containerID: cri-o://cbd502693728bd37c11d9c70878f8aad4f9bb0464ed07f16ecc1b418a0dd0f56
              exitCode: 1
              finishedAt: "2026-02-06T10:31:37Z"
              message: 'failed to configure devices: failed to get device already provisioned
                by ceph-volume raw: osd.2: "24d32d6c-fac1-4e92-b79a-404231885b84" belonging
                to a different ceph cluster "439e2810-4589-4603-adbd-127fbd3d1b3e"'
              reason: Error
              startedAt: "2026-02-06T10:31:34Z" 
      $ oc -n openshift-storage logs rook-ceph-operator-65b78bd45c-95p8s
      
      
      2026-02-06 10:30:11.653641 E | op-osd: failed to provision OSD(s) on PVC ocs-deviceset-2-data-0sp9zf. &{OSDs:[] Status:failed PvcBackedOSD:true Message:failed to configure devices: failed to get device already provisioned by ceph-volume raw: osd.1: "fba9b9a1-74a1-4980-9faa-b0425447141f" belonging to a different ceph cluster "439e2810-4589-4603-adbd-127fbd3d1b3e"}
      2026-02-06 10:30:11.876154 I | cephclient: reconciling replicated pool ocs-storagecluster-cephobjectstore.rgw.buckets.index succeeded
      2026-02-06 10:30:12.116430 I | op-osd: [openshift-storage] OSD orchestration status for PVC ocs-deviceset-0-data-0kwtmc is "failed"
      2026-02-06 10:30:12.116455 E | op-osd: failed to provision OSD(s) on PVC ocs-deviceset-0-data-0kwtmc. &{OSDs:[] Status:failed PvcBackedOSD:true Message:failed to configure devices: failed to get device already provisioned by ceph-volume raw: osd.2: "24d32d6c-fac1-4e92-b79a-404231885b84" belonging to a different ceph cluster "439e2810-4589-4603-adbd-127fbd3d1b3e"}
      2026-02-06 10:30:12.515001 I | cephclient: creating a new crush rule for changed deviceClass ("default"-->"") on crush rule "ocs-storagecluster-cephobjectstore.rgw.buckets.index"
      2026-02-06 10:30:12.515017 I | cephclient: updating pool "ocs-storagecluster-cephobjectstore.rgw.buckets.index" failure domain from "rack" to "rack" with new crush rule "ocs-storagecluster-cephobjectstore.rgw.buckets.index_rack"
      2026-02-06 10:30:12.515021 I | cephclient: crush rule "ocs-storagecluster-cephobjectstore.rgw.buckets.index" will no longer be used by pool "ocs-storagecluster-cephobjectstore.rgw.buckets.index"
      2026-02-06 10:30:13.283696 I | op-osd: [openshift-storage] OSD orchestration status for PVC ocs-deviceset-1-data-097445 is "orchestrating"
      2026-02-06 10:30:13.881916 I | cephclient: Successfully updated pool "ocs-storagecluster-cephobjectstore.rgw.buckets.index" failure domain to "rack"
      2026-02-06 10:30:13.881932 I | cephclient: setting pool property "pg_num_min" to "8" on pool "ocs-storagecluster-cephobjectstore.rgw.buckets.index"
      2026-02-06 10:30:13.913021 I | op-osd: [openshift-storage] OSD orchestration status for PVC ocs-deviceset-1-data-097445 is "failed"
      2026-02-06 10:30:13.913038 E | op-osd: failed to provision OSD(s) on PVC ocs-deviceset-1-data-097445. &{OSDs:[] Status:failed PvcBackedOSD:true Message:failed to configure devices: failed to get device already provisioned by ceph-volume raw: osd.0: "65f203c1-7761-4888-9fdf-0050c0d84670" belonging to a different ceph cluster "439e2810-4589-4603-adbd-127fbd3d1b3e"}
      2026-02-06 10:30:13.930826 E | cluster-controller: failed to reconcile CephCluster "openshift-storage/ocs-storagecluster-cephcluster". failed to reconcile cluster "ocs-storagecluster-cephcluster": failed to configure local ceph cluster: failed to create cluster: failed to start ceph osds: 3 failures encountered while running osds on nodes in namespace "openshift-storage". failed to provision OSD(s) on PVC ocs-deviceset-2-data-0sp9zf. &{OSDs:[] Status:failed PvcBackedOSD:true Message:failed to configure devices: failed to get device already provisioned by ceph-volume raw: osd.1: "fba9b9a1-74a1-4980-9faa-b0425447141f" belonging to a different ceph cluster "439e2810-4589-4603-adbd-127fbd3d1b3e"} failed to provision OSD(s) on PVC ocs-deviceset-0-data-0kwtmc. &{OSDs:[] Status:failed PvcBackedOSD:true Message:failed to configure devices: failed to get device already provisioned by ceph-volume raw: osd.2: "24d32d6c-fac1-4e92-b79a-404231885b84" belonging to a different ceph cluster "439e2810-4589-4603-adbd-127fbd3d1b3e"} failed to provision OSD(s) on PVC ocs-deviceset-1-data-097445. &{OSDs:[] Status:failed PvcBackedOSD:true Message:failed to configure devices: failed to get device already provisioned by ceph-volume raw: osd.0: "65f203c1-7761-4888-9fdf-0050c0d84670" belonging to a different ceph cluster "439e2810-4589-4603-adbd-127fbd3d1b3e"}
      

       

      The OCP platform infrastructure and deployment type (AWS, Bare Metal, VMware, etc. Please clarify if it is platform agnostic deployment), (IPI/UPI):

      ACM/ZTP/GITOPS deployment on Bare Metal 

       

      The ODF deployment type (Internal, External, Internal-Attached (LSO), Multicluster, DR, Provider, etc):

      Internal ODF
      

       

      The version of all relevant components (OCP, ODF, RHCS, ACM whichever is applicable):

      OCP 4.22.0-EC.1
      ODF 4.21.0-77

       

      Does this issue impact your ability to continue to work with the product?

      Deployment is not blocked; ODF can be deployed using the workaround.
      

       

      Is there any workaround available to the best of your knowledge?

      Yes. Using an extra-manifests YAML, the Ceph metadata is cleaned and ODF is properly deployed.
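The exact extra-manifests content is not attached to this report, but a hedged sketch of the kind of cleanup it performs (wiping stale Ceph metadata from each OSD disk on the storage nodes before redeploying) would be:

```shell
# Hypothetical sketch only: /dev/sdb is an example device path; identify the
# actual ODF/OSD disks on each node before wiping anything.
DISK=/dev/sdb

sgdisk --zap-all "$DISK"   # destroy GPT/MBR partition structures
wipefs --all "$DISK"       # clear filesystem/LVM/Ceph signatures
# Zero the first 100 MiB, where ceph-volume raw metadata typically lives.
dd if=/dev/zero of="$DISK" bs=1M count=100 oflag=direct,dsync
```

In the ZTP/GitOps flow these commands would typically be delivered as a MachineConfig or similar extra manifest rather than run by hand.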

       

      Can this issue be reproduced? If so, please provide the hit rate

      Yes, 100%
      

       

      Can this issue be reproduced from the UI?

       

      If this is a regression, please provide more details to justify this:

      I believe this is a regression; in ODF 4.20 the feature worked as expected.
      

       

      Steps to Reproduce:

      1. Deploy an MNO cluster
      2. Redeploy the MNO cluster with ODF 4.21 and the Storagecluster property "storagecluster.spec.managedResources.cephCluster.cleanupPolicy.wipeDevicesFromOtherClusters" configured
      3. After the cluster is deployed, check the rook-ceph-operator pod logs
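A sketch of the checks for step 3 (resource and deployment names assumed from the logs above):

```shell
# Check whether the StorageCluster ever reaches READY.
oc -n openshift-storage get storagecluster ocs-storagecluster

# The OSD prepare pods should show the failure.
oc -n openshift-storage get pods -l app=rook-ceph-osd-prepare

# Look for the signature error in the operator log.
oc -n openshift-storage logs deploy/rook-ceph-operator | grep -i "belonging to a different ceph cluster"
```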
      

       

      The exact date and time when the issue was observed, including timezone details:

       

      Actual results:

      The Ceph metadata is not properly deleted and the ODF StorageCluster does not reach READY status.
      

       

      Expected results:

      The Ceph metadata is properly deleted and the ODF StorageCluster reaches READY status.

       

      Logs collected and log location:

      ODF must-gather collected and available at the following links:
      

      MG-Feb-6: https://drive.google.com/file/d/13LC6ShJIBhRcv_Cipqo6R6X89Iu1jrz9/view?usp=sharing

      MG-Feb-12: https://drive.google.com/file/d/1mObuVdB1FVPCtlIL-vXCv_Yr389L22Ea/view?usp=sharing

      Logs-Feb-12: https://drive.google.com/file/d/1ofZibMD1T4SOTU9pQWTkhUByW7p6Rx73/view?usp=sharing

       

      Additional info:

              sapillai (Santosh Pillai)
              rh-ee-feferran (Federico Ferrando)
              Elad Ben Aharon