Data Foundation Bugs / DFBUGS-564

[2290320] Restore storagecluster and recover ODF from the Deleting phase


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Critical
    • Affects Version: odf-4.12
    • Component: rook

      Description of problem (please be as detailed as possible and provide log
      snippets):

      The OCS storagecluster/storagesystem was deleted and ODF entered the Deleting phase, but all of the PVCs still exist, which prevented the deletion from completing.
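      As a quick check (a sketch only; the storage class names below are the ODF defaults and are an assumption for this cluster), the PVCs still consuming ODF storage can be listed with:

      # List PVCs in all namespaces bound to the default ODF storage classes
      oc get pvc --all-namespaces | grep -E 'ocs-storagecluster-ceph-rbd|ocs-storagecluster-cephfs|openshift-storage.noobaa.io'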

      The cephcluster CR YAML shows that a deletion was initiated but did not complete. The deletion could not go through because there were dependent objects that could not be removed. See the condition 'CephCluster "openshift-storage/ocs-storagecluster-cephcluster" will not be deleted until all dependents are removed', which failed with reason ObjectHasDependents.

      oc get cephcluster -o yaml

      apiVersion: v1
      items:
      - apiVersion: ceph.rook.io/v1
        kind: CephCluster
        ...
        conditions:
        - lastHeartbeatTime: "2024-05-17T02:11:20Z"
          lastTransitionTime: "2023-11-05T09:59:50Z"
          message: Cluster created successfully
          reason: ClusterCreated
          status: "True"
          type: Ready
        - lastHeartbeatTime: "2024-06-03T06:55:16Z"
          lastTransitionTime: "2024-05-17T02:12:01Z"
          message: 'CephCluster "openshift-storage/ocs-storagecluster-cephcluster" will
            not be deleted until all dependents are removed: CephBlockPool: [ocs-storagecluster-cephblockpool],
            CephFilesystem: [ocs-storagecluster-cephfilesystem], CephObjectStore: [ocs-storagecluster-cephobjectstore],
            CephObjectStoreUser: [noobaa-ceph-objectstore-user ocs-storagecluster-cephobjectstoreuser
            prometheus-user]'
          reason: ObjectHasDependents
          status: "True"
          type: DeletionIsBlocked
        - lastHeartbeatTime: "2024-06-03T06:55:15Z"
          lastTransitionTime: "2024-05-17T02:12:00Z"
          message: Deleting the CephCluster
          reason: ClusterDeleting
          status: "True"
          type: Deleting

      2024-05-31 02:35:36.204878 E | ceph-cluster-controller: failed to reconcile CephCluster "openshift-storage/ocs-storagecluster-cephcluster". CephCluster "openshift-storage/ocs-storagecluster-cephcluster" will not be deleted until all dependents are removed: CephBlockPool: [ocs-storagecluster-cephblockpool], CephFilesystem: [ocs-storagecluster-cephfilesystem], CephObjectStore: [ocs-storagecluster-cephobjectstore], CephObjectStoreUser: [noobaa-ceph-objectstore-user ocs-storagecluster-cephobjectstoreuser prometheus-user]
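      To confirm which of the dependents named in the condition still exist, and what is holding them (resource kinds taken from the message above; a sketch, not a prescribed fix):

      # Check the dependent Ceph CRs named in the DeletionIsBlocked condition
      oc get cephblockpool,cephfilesystem,cephobjectstore,cephobjectstoreuser -n openshift-storage

      # Inspect the finalizers that keep a dependent from being removed
      oc get cephblockpool ocs-storagecluster-cephblockpool -n openshift-storage -o jsonpath='{.metadata.finalizers}'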

      Version of all relevant components (if applicable):

      ODF 4.12

      ocs-operator.v4.12.11-rhodf OpenShift Container Storage 4.12.11-rhodf ocs-operator.v4.11.13 Succeeded
      odf-csi-addons-operator.v4.12.11-rhodf CSI Addons 4.12.11-rhodf odf-csi-addons-operator.v4.11.13 Succeeded
      odf-multicluster-orchestrator.v4.12.12-rhodf ODF Multicluster Orchestrator 4.12.12-rhodf odf-multicluster-orchestrator.v4.12.11-rhodf Succeeded
      odf-operator.v4.12.11-rhodf OpenShift Data Foundation 4.12.11-rhodf odf-operator.v4.11.13 Succeeded
      odr-hub-operator.v4.12.12-rhodf Openshift DR Hub Operator 4.12.12-rhodf odr-hub-operator.v4.12.11-rhodf Succeeded
      openshift-gitops-operator.v1.11.2 Red Hat OpenShift GitOps 1.11.2 openshift-gitops-operator.v1.11.1 Succeeded

      Does this issue impact your ability to continue to work with the product
      (please explain in detail what is the user impact)?

      Yes

      Is there any workaround available to the best of your knowledge?

      There are two options, which we discussed with the customer:

      a] Take a backup of their data and reinstall ODF.
      b] Restore the cluster using the upstream procedure (sketched below):
      https://www.rook.io/docs/rook/v1.14/Troubleshooting/disaster-recovery/#restoring-crds-after-deletion

      The cluster is used extensively by Quay, and several applications use ODF-based PVCs and OBCs, so the customer is not okay with the first option.

      Ask: Can we recover the cluster using the upstream procedure and attempt to restore the cephcluster?
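      For reference, a condensed sketch of what the linked procedure amounts to. This is hedged: the Rook v1.14 disaster-recovery doc is authoritative on the exact steps, ordering, and safeguards, and the deployment names below assume a default ODF install in openshift-storage.

      # 1. Scale down the operators so nothing reconciles mid-restore
      oc scale deployment rook-ceph-operator -n openshift-storage --replicas=0
      oc scale deployment ocs-operator -n openshift-storage --replicas=0

      # 2. Back up the CR that is stuck deleting
      oc get cephcluster ocs-storagecluster-cephcluster -n openshift-storage -o yaml > cephcluster-backup.yaml

      # 3. Remove the finalizers so the stuck deletion can complete
      oc patch cephcluster ocs-storagecluster-cephcluster -n openshift-storage \
        --type merge -p '{"metadata":{"finalizers":null}}'

      # 4. Recreate the CR from the backup after stripping deletionTimestamp,
      #    resourceVersion, uid, and status, then scale the operators back up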

      Rate from 1 - 5 the complexity of the scenario you performed that caused this
      bug (1 - very simple, 5 - very complex)?
      3

      Is this issue reproducible?
      Yes

      Can this issue be reproduced from the UI?
      N/A

      If this is a regression, please provide more details to justify this:
      N/A

      Steps to Reproduce:
      N/A

      Actual results:
      N/A

      Expected results:

      N/A

      Additional info:
