Data Foundation Bugs / DFBUGS-82

[2296264] [MDR] Not able to disable Disaster Recovery for ACM discovered applications after primary is down and failing over to secondary

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Critical
    • Version: odf-4.16
    • Component: odf-dr/ramen
    • Sprint: RamenDR sprint 2024 #18, RamenDR sprint 2024 #19

      Description of problem (please be as detailed as possible and provide log
      snippets):

      Hi,

      I am trying to recover to a replacement cluster with MDR for discovered
      apps by following the steps in [1].

      [1]
      https://docs.redhat.com/en/documentation/red_hat_openshift_data_foundation/4.15/html/configuring_openshift_data_foundation_disaster_recovery_for_openshift_workloads/metro-dr-solution#recovering-to-a-replacement-cluster-with-mdr_manage-mdr

      I then followed the steps below to disable DR for the discovered apps:

      [2]
      https://docs.google.com/document/d/1BoqbEqDBLCQZXp2qvd7Hw5mvg59njv1dqlrH6Hy7L58/edit#heading=h.1yx58g1ouy2

      When I try to delete the DRPC, it gets stuck in the Deleting state. Below
      is the DRPC YAML output:

      ➜ hub oc get drpc imperative-1 -n openshift-dr-ops -oyaml
      apiVersion: ramendr.openshift.io/v1alpha1
      kind: DRPlacementControl
      metadata:
        annotations:
          drplacementcontrol.ramendr.openshift.io/app-namespace: openshift-dr-ops
          drplacementcontrol.ramendr.openshift.io/last-app-deployment-cluster: asagare-sec
        creationTimestamp: "2024-07-04T07:39:18Z"
        deletionGracePeriodSeconds: 0
        deletionTimestamp: "2024-07-08T05:25:48Z"
        finalizers:
        - drpc.ramendr.openshift.io/finalizer
        generation: 6
        labels:
          cluster.open-cluster-management.io/backup: ramen
        name: imperative-1
        namespace: openshift-dr-ops
        ownerReferences:
        - apiVersion: cluster.open-cluster-management.io/v1beta1
          blockOwnerDeletion: true
          controller: true
          kind: Placement
          name: imperative-1-placement-1
          uid: 0e05a1af-f89a-48b1-aa0f-28cb89b14344
        resourceVersion: "12464211"
        uid: 2f372e03-fb46-4ec5-96d2-e517f9f88d09
      spec:
        action: Failover
        drPolicyRef:
          apiVersion: ramendr.openshift.io/v1alpha1
          kind: DRPolicy
          name: odr-policy-mdr
        failoverCluster: asagare-sec
        kubeObjectProtection:
          captureInterval: 2m0s
          kubeObjectSelector:
            matchExpressions:
            - key: appname
              operator: In
              values:
              - busybox
        placementRef:
          apiVersion: cluster.open-cluster-management.io/v1beta1
          kind: Placement
          name: imperative-1-placement-1
          namespace: openshift-dr-ops
        preferredCluster: asagare-pri
        protectedNamespaces:
        - busybox-discovered
        pvcSelector:
          matchExpressions:
          - key: appname
            operator: In
            values:
            - busybox
      status:
        actionStartTime: "2024-07-05T11:28:26Z"
        conditions:
        - lastTransitionTime: "2024-07-05T11:28:27Z"
          message: Completed
          observedGeneration: 5
          reason: FailedOver
          status: "True"
          type: Available
        - lastTransitionTime: "2024-07-05T11:28:26Z"
          message: cleaning secondaries
          observedGeneration: 5
          reason: Cleaning
          status: "False"
          type: PeerReady
        - lastTransitionTime: "2024-07-05T11:29:57Z"
          message: VolumeReplicationGroup (openshift-dr-ops/imperative-1) on cluster asagare-sec
            is reporting errors (Cluster data of one or more PVs are unprotectedVRG Kube
            object protect errorunable to ListKeys in DeleteObjects from endpoint https://s3-openshift-storage.apps.asagare-pri.qe.rh-ocs.com
            bucket odrbucket-84427fcbc7ce keyPrefix openshift-dr-ops/imperative-1/kube-objects/1/velero/backups/)
            protecting workload resources, retrying till ClusterDataProtected condition
            is met
          observedGeneration: 5
          reason: Error
          status: "False"
          type: Protected
        lastKubeObjectProtectionTime: "2024-07-05T11:20:51Z"
        lastUpdateTime: "2024-07-07T19:49:50Z"
        observedGeneration: 6
        phase: Deleting
        preferredDecision:
          clusterName: asagare-pri
          clusterNamespace: asagare-pri
        progression: Deleting
        resourceConditions:
          conditions:
          - lastTransitionTime: "2024-07-05T11:29:52Z"
            message: PVCs in the VolumeReplicationGroup are ready for use
            observedGeneration: 1
            reason: Ready
            status: "True"
            type: DataReady
          - lastTransitionTime: "2024-07-05T11:29:52Z"
            message: VolumeReplicationGroup is replicating
            observedGeneration: 1
            reason: Replicating
            status: "False"
            type: DataProtected
          - lastTransitionTime: "2024-07-05T11:29:16Z"
            message: Restored PVs and PVCs
            observedGeneration: 1
            reason: Restored
            status: "True"
            type: ClusterDataReady
          - lastTransitionTime: "2024-07-05T11:29:52Z"
            message: Cluster data of one or more PVs are unprotectedVRG Kube object protect
              errorunable to ListKeys in DeleteObjects from endpoint https://s3-openshift-storage.apps.asagare-pri.qe.rh-ocs.com
              bucket odrbucket-84427fcbc7ce keyPrefix openshift-dr-ops/imperative-1/kube-objects/1/velero/backups/
            observedGeneration: 1
            reason: UploadError
            status: "False"
            type: ClusterDataProtected
          resourceMeta:
            generation: 1
            kind: VolumeReplicationGroup
            name: imperative-1
            namespace: openshift-dr-ops
            protectedpvcs:
            - busybox-pvc
            resourceVersion: "7885562"
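
      Reading the output above: the DRPC is held by the drpc.ramendr.openshift.io/finalizer,
      while the Protected condition keeps failing against the S3 endpoint on the
      powered-off primary (s3-openshift-storage.apps.asagare-pri.qe.rh-ocs.com), which
      appears to be why the finalizer is never removed. A minimal sketch for confirming
      what is blocking the deletion (standard oc/jsonpath usage; the name and namespace
      are from this report):

      ➜ hub oc get drpc imperative-1 -n openshift-dr-ops -o jsonpath='{.metadata.finalizers}'
      # Expect ["drpc.ramendr.openshift.io/finalizer"], the finalizer holding the object in Deleting
      ➜ hub oc get drpc imperative-1 -n openshift-dr-ops \
          -o jsonpath='{.status.conditions[?(@.type=="Protected")].message}'
      # Prints the ListKeys/DeleteObjects error against the unreachable primary's S3 endpoint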

      Version of all relevant components (if applicable):

      OCP: 4.16.0-0.nightly-2024-06-27-091410
      ODF: 4.16.0-134
      ACM: 2.11.0-140
      CEPH: 18.2.1-194.el9cp (04a992766839cd3207877e518a1238cdbac3787e) reef (stable)
      OADP: 1.4.0
      GitOps: 1.12.4

      Does this issue impact your ability to continue to work with the product
      (please explain in detail what is the user impact)?

      Is there any workaround available to the best of your knowledge?

      Rate from 1 - 5 the complexity of the scenario you performed that caused this
      bug (1 - very simple, 5 - very complex)?

      Is this issue reproducible?
      Yes.

      Can this issue be reproduced from the UI?

      If this is a regression, please provide more details to justify this:

      Steps to Reproduce:
      1. Upgraded the MDR setup from 4.15.4 to 4.16.
      2. Deployed discovered apps on the primary cluster and applied the DRPolicy to them.
      3. Powered off the primary cluster.
      4. Followed the replace-cluster steps in doc [1].
      5. Disabled DR for the protected apps using doc [2] (see the sketch after this list).
      6. DRPC deletion got stuck in the Deleting state.
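
      For reference, a minimal sketch of the deletion attempt in steps 5-6 (the DRPC
      name and namespace are taken from the YAML above; the commands are standard
      oc usage):

      ➜ hub oc delete drpc imperative-1 -n openshift-dr-ops --wait=false
      ➜ hub oc get drpc imperative-1 -n openshift-dr-ops -o jsonpath='{.status.phase}'
      # Keeps returning "Deleting": deletionTimestamp is set, but the finalizer never clears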

      Actual results:
      DRPC deletion is stuck in the Deleting state.

      Expected results:
      The DRPC should be deleted.

      Additional info:
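
      A possible manual unblock, offered here as an assumption and not validated in
      this report: force-remove the finalizer so the delete can complete. Note that
      this skips ramen's cleanup, so kube-object backups may be left orphaned in the
      S3 bucket.

      ➜ hub oc patch drpc imperative-1 -n openshift-dr-ops --type=json \
          -p='[{"op": "remove", "path": "/metadata/finalizers"}]'
      # Drops drpc.ramendr.openshift.io/finalizer; only sensible when cleanup against
      # the dead cluster's S3 endpoint can never succeed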

              Assignee: Raghavendra Talur (rtalur@redhat.com)
              Reporter: Avdhoot Sagare (rh-ee-asagare)
              Votes: 0
              Watchers: 9