DFBUGS-1662 (Data Foundation Bugs)

Failover is stuck in WaitForStorageMaintenanceActivation

    • False
    • False
    • Committed
    • ?
    • ppc64le
    • ?
    • 4.18.0-134
    • Committed
    • Release Note Not Required
    • None

       

      Description of problem - Provide a detailed description of the issue encountered, including logs/command-output snippets and screenshots if the issue is observed in the UI:

      While performing a failover after upgrading from 4.17.4 to 4.18.0-133, the DRPC is stuck in WaitForStorageMaintenanceActivation:

      [root@rdr-hub-418-bastion-0 ~]# oc get drpc -A -o wide
      NAMESPACE          NAME                              AGE   PREFERREDCLUSTER   FAILOVERCLUSTER     DESIREDSTATE   CURRENTSTATE   PROGRESSION                           START TIME             DURATION   PEER READY
      openshift-gitops   app-set-busy-box-placement-drpc   28h   rdr-primary-418    rdr-secondary-418   Failover       FailingOver    WaitForStorageMaintenanceActivation   2025-02-21T08:31:29Z              False
      [root@rdr-hub-418-bastion-0 ~]# oc get drpc -A -o yaml
      apiVersion: v1
      items:
      - apiVersion: ramendr.openshift.io/v1alpha1
        kind: DRPlacementControl
        metadata:
          annotations:
            drplacementcontrol.ramendr.openshift.io/app-namespace: busybox-workload
            drplacementcontrol.ramendr.openshift.io/last-app-deployment-cluster: rdr-primary-418
          creationTimestamp: "2025-02-20T11:15:42Z"
          finalizers:
          - drpc.ramendr.openshift.io/finalizer
          generation: 2
          labels:
            cluster.open-cluster-management.io/backup: ramen
          name: app-set-busy-box-placement-drpc
          namespace: openshift-gitops
          ownerReferences:
          - apiVersion: cluster.open-cluster-management.io/v1beta1
            blockOwnerDeletion: true
            controller: true
            kind: Placement
            name: app-set-busy-box-placement
            uid: a1658dc8-9657-4990-ae0b-d3d25d5265ce
          resourceVersion: "1352502"
          uid: 3736e8d1-7dd1-45ac-847f-160b2d7180b9
        spec:
          action: Failover
          drPolicyRef:
            apiVersion: ramendr.openshift.io/v1alpha1
            kind: DRPolicy
            name: drpolicy-5m
          failoverCluster: rdr-secondary-418
          placementRef:
            apiVersion: cluster.open-cluster-management.io/v1beta1
            kind: Placement
            name: app-set-busy-box-placement
            namespace: openshift-gitops
          preferredCluster: rdr-primary-418
          pvcSelector:
            matchExpressions:
            - key: appname
              operator: In
              values:
              - busybox_app1
        status:
          actionStartTime: "2025-02-21T08:31:29Z"
          conditions:
          - lastTransitionTime: "2025-02-21T08:31:29Z"
            message: Waiting for spec.failoverCluster to meet failover prerequsites
            observedGeneration: 2
            reason: FailingOver
            status: "False"
            type: Available
          - lastTransitionTime: "2025-02-21T08:31:29Z"
            message: Started failover to cluster "rdr-secondary-418"
            observedGeneration: 2
            reason: NotStarted
            status: "False"
            type: PeerReady
          - lastTransitionTime: "2025-02-21T15:31:28Z"
            message: VolumeReplicationGroup (busybox-workload/app-set-busy-box-placement-drpc)
              on cluster rdr-primary-418 is protecting required resources and data
            observedGeneration: 2
            reason: Protected
            status: "True"
            type: Protected
          lastGroupSyncBytes: 19288064
          lastGroupSyncDuration: 1s
          lastGroupSyncTime: "2025-02-21T15:35:00Z"
          lastUpdateTime: "2025-02-21T15:35:58Z"
          observedGeneration: 2
          phase: FailingOver
          preferredDecision:
            clusterName: rdr-primary-418
            clusterNamespace: rdr-primary-418
          progression: WaitForStorageMaintenanceActivation
          resourceConditions:
            conditions:
            - lastTransitionTime: "2025-02-20T11:16:04Z"
              message: PVCs in the VolumeReplicationGroup are ready for use
              observedGeneration: 1
              reason: Ready
              status: "True"
              type: DataReady
            - lastTransitionTime: "2025-02-20T11:15:56Z"
              message: VolumeReplicationGroup is replicating
              observedGeneration: 1
              reason: Replicating
              status: "False"
              type: DataProtected
            - lastTransitionTime: "2025-02-20T11:15:43Z"
              message: Nothing to restore
              observedGeneration: 1
              reason: Restored
              status: "True"
              type: ClusterDataReady
            - lastTransitionTime: "2025-02-21T15:31:15Z"
              message: Cluster data of all PVs are protected. VRG object protected
              observedGeneration: 1
              reason: Uploaded
              status: "True"
              type: ClusterDataProtected
            - lastTransitionTime: "2025-02-21T04:58:05Z"
              message: Kube objects restored
              observedGeneration: 1
              reason: KubeObjectsRestored
              status: "True"
              type: KubeObjectsReady
            resourceMeta:
              generation: 1
              kind: VolumeReplicationGroup
              name: app-set-busy-box-placement-drpc
              namespace: busybox-workload
              protectedpvcs:
              - busybox-pvc-7
              - busybox-pvc-1
              - busybox-pvc-4
              - busybox-pvc-8
              - busybox-pvc-3
              - busybox-pvc-9
              - busybox-pvc-5
              - busybox-pvc-10
              - busybox-pvc-2
              - busybox-pvc-6
              resourceVersion: "1359444"
      kind: List
      metadata:
        resourceVersion: ""
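
The two timestamps in the status above show how long the DRPC had been in this state when the output was captured. A minimal sketch of that arithmetic, assuming GNU `date` is available (the `-d` option is GNU-specific); the variable names are illustrative only:

```shell
# Timestamps copied from the DRPC status above.
start="2025-02-21T08:31:29Z"   # status.actionStartTime
last="2025-02-21T15:35:58Z"    # status.lastUpdateTime

# Convert both timestamps to epoch seconds (GNU date) and take the difference.
elapsed=$(( $(date -u -d "$last" +%s) - $(date -u -d "$start" +%s) ))

# Print the difference as hours, minutes and seconds.
printf '%dh%02dm%02ds\n' $((elapsed / 3600)) $((elapsed % 3600 / 60)) $((elapsed % 60))
```

The difference is a little over seven hours, during which the progression did not move past WaitForStorageMaintenanceActivation.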

       

      [root@rdr-secondary-418-bastion-0 ~]# oc get maintenancemodes.ramendr.openshift.io -A
      NAME                               AGE
      084d0f46538fd05587d3acd168ada3d8   7h6m
      [root@rdr-secondary-418-bastion-0 ~]# oc describe maintenancemodes.ramendr.openshift.io -A
      Name:         084d0f46538fd05587d3acd168ada3d8
      Namespace:
      Labels:       <none>
      Annotations:  <none>
      API Version:  ramendr.openshift.io/v1alpha1
      Kind:         MaintenanceMode
      Metadata:
        Creation Timestamp:  2025-02-21T08:31:29Z
        Generation:          1
        Owner References:
          API Version:     work.open-cluster-management.io/v1
          Kind:            AppliedManifestWork
          Name:            b67375b7bc8ec56f8678e8a198ad538fa4d0c1f9a28e65611853fcb1500d3aed-084d0f46538fd05587d3acd168ada3d8-mmode-mw
          UID:             471d2818-d7e8-4207-9ea4-12138a8bc304
        Resource Version:  1049052
        UID:               c794add9-4171-4d8b-bf6e-825304763d78
      Spec:
        Modes:
          Failover
        Storage Provisioner:  openshift-storage.rbd.csi.ceph.com
        Target ID:            084d0f46538fd05587d3acd168ada3d8
      Events:                 <none>
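
The describe output above lists Spec and Events but no Status section. Whether that means the MaintenanceMode has not yet been reconciled on rdr-secondary-418 is an assumption, not something this report confirms. A small sketch for repeating the check; the helper name `has_status_section` is hypothetical:

```shell
# Succeeds (exit 0) if the `oc describe` text on stdin contains a
# top-level "Status:" section, and fails (exit 1) otherwise.
has_status_section() {
  grep -q '^Status:'
}

# Usage on the failover cluster (resource name taken from the output above):
#   oc describe maintenancemodes.ramendr.openshift.io 084d0f46538fd05587d3acd168ada3d8 \
#     | has_status_section || echo "MaintenanceMode reports no status"
```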

       

      The OCP platform infrastructure and deployment type (AWS, Bare Metal, VMware, etc. Please clarify if it is platform agnostic deployment), (IPI/UPI):

      IBM Power

      The ODF deployment type (Internal, External, Internal-Attached (LSO), Multicluster, DR, Provider, etc):

      RDR

       

      The version of all relevant components (OCP, ODF, RHCS, ACM whichever is applicable):

      Before upgrade: ODF 4.17.4

      After upgrade: ODF 4.18.0-133

      ACM v2.13.0-52

      MCE v2.8.0-49

      Submariner 0.19

      Volsync  v0.11.1

      OADP v1.4.2

      Gitops 1.15.0

       

       

      Does this issue impact your ability to continue to work with the product?

      Yes

       

      Is there any workaround available to the best of your knowledge?

      No

       

      Can this issue be reproduced? If so, please provide the hit rate

      Yes, 100%

       

      Can this issue be reproduced from the UI?

      No

      If this is a regression, please provide more details to justify this:

      Yes

      Steps to Reproduce:

      1. Create an RDR setup with ODF 4.17.4 on an IBM Power environment

      2. Create a sample application (ApplicationSet, pull-based) from rdr/busybox/rbd/workloads/app-busybox-1 in the red-hat-storage/ocs-workloads repository (master branch)

      3. Attach a DR policy to it

      4. Upgrade ODF and MCO from 4.17.4 to 4.18.0-133

      5. After the upgrade completes successfully, perform a failover

      6. The failover gets stuck in WaitForStorageMaintenanceActivation
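
To confirm step 6 on the hub, the `oc get drpc -A -o wide` output can be filtered by progression. A minimal sketch, assuming the column layout shown in this report (PROGRESSION is the 8th whitespace-separated field and every earlier column is populated); the function name `drpcs_in_progression` is hypothetical:

```shell
# Reads `oc get drpc -A -o wide` output on stdin and prints namespace/name
# for every DRPC whose PROGRESSION column (8th field) equals the first argument.
# The header row is skipped.
drpcs_in_progression() {
  awk -v want="$1" 'NR > 1 && $8 == want { print $1 "/" $2 }'
}

# Usage on the hub cluster:
#   oc get drpc -A -o wide | drpcs_in_progression WaitForStorageMaintenanceActivation
```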

       

      The exact date and time when the issue was observed, including timezone details:
      9:22 pm
      Friday, 21 February 2025
      Indian Standard Time (IST)

      Actual results:

      Failover is stuck in WaitForStorageMaintenanceActivation

       

      Expected results:

      Failover should complete successfully

      Logs collected and log location:

      Must gather Link:

      https://drive.google.com/file/d/1noC7cFV_sYjlBxiKHX-AV6HuqlYxnagj/view?usp=drive_link

      Additional info:

       
       

              rhn-support-uchapaga Umanga Chapagain
              rh-ee-shdas Shilpi Das
              Shilpi Das Shilpi Das