Description of problem - Provide a detailed description of the issue encountered, including logs/command-output snippets and screenshots if the issue is observed in the UI:
While performing a failover after upgrading from ODF 4.17.4 to 4.18.0-133, the DRPC is stuck in the WaitForStorageMaintenanceActivation progression:
[root@rdr-hub-418-bastion-0 ~]# oc get drpc -A -o wide
NAMESPACE          NAME                              AGE   PREFERREDCLUSTER   FAILOVERCLUSTER     DESIREDSTATE   CURRENTSTATE   PROGRESSION                           START TIME             DURATION   PEER READY
openshift-gitops   app-set-busy-box-placement-drpc   28h   rdr-primary-418    rdr-secondary-418   Failover       FailingOver    WaitForStorageMaintenanceActivation   2025-02-21T08:31:29Z              False

[root@rdr-hub-418-bastion-0 ~]# oc get drpc -A -o yaml
apiVersion: v1
items:
- apiVersion: ramendr.openshift.io/v1alpha1
  kind: DRPlacementControl
  metadata:
    annotations:
      drplacementcontrol.ramendr.openshift.io/app-namespace: busybox-workload
      drplacementcontrol.ramendr.openshift.io/last-app-deployment-cluster: rdr-primary-418
    creationTimestamp: "2025-02-20T11:15:42Z"
    finalizers:
    - drpc.ramendr.openshift.io/finalizer
    generation: 2
    labels:
      cluster.open-cluster-management.io/backup: ramen
    name: app-set-busy-box-placement-drpc
    namespace: openshift-gitops
    ownerReferences:
    - apiVersion: cluster.open-cluster-management.io/v1beta1
      blockOwnerDeletion: true
      controller: true
      kind: Placement
      name: app-set-busy-box-placement
      uid: a1658dc8-9657-4990-ae0b-d3d25d5265ce
    resourceVersion: "1352502"
    uid: 3736e8d1-7dd1-45ac-847f-160b2d7180b9
  spec:
    action: Failover
    drPolicyRef:
      apiVersion: ramendr.openshift.io/v1alpha1
      kind: DRPolicy
      name: drpolicy-5m
    failoverCluster: rdr-secondary-418
    placementRef:
      apiVersion: cluster.open-cluster-management.io/v1beta1
      kind: Placement
      name: app-set-busy-box-placement
      namespace: openshift-gitops
    preferredCluster: rdr-primary-418
    pvcSelector:
      matchExpressions:
      - key: appname
        operator: In
        values:
        - busybox_app1
  status:
    actionStartTime: "2025-02-21T08:31:29Z"
    conditions:
    - lastTransitionTime: "2025-02-21T08:31:29Z"
      message: Waiting for spec.failoverCluster to meet failover prerequsites
      observedGeneration: 2
      reason: FailingOver
      status: "False"
      type: Available
    - lastTransitionTime: "2025-02-21T08:31:29Z"
      message: Started failover to cluster "rdr-secondary-418"
      observedGeneration: 2
      reason: NotStarted
      status: "False"
      type: PeerReady
    - lastTransitionTime: "2025-02-21T15:31:28Z"
      message: VolumeReplicationGroup (busybox-workload/app-set-busy-box-placement-drpc) on cluster rdr-primary-418 is protecting required resources and data
      observedGeneration: 2
      reason: Protected
      status: "True"
      type: Protected
    lastGroupSyncBytes: 19288064
    lastGroupSyncDuration: 1s
    lastGroupSyncTime: "2025-02-21T15:35:00Z"
    lastUpdateTime: "2025-02-21T15:35:58Z"
    observedGeneration: 2
    phase: FailingOver
    preferredDecision:
      clusterName: rdr-primary-418
      clusterNamespace: rdr-primary-418
    progression: WaitForStorageMaintenanceActivation
    resourceConditions:
      conditions:
      - lastTransitionTime: "2025-02-20T11:16:04Z"
        message: PVCs in the VolumeReplicationGroup are ready for use
        observedGeneration: 1
        reason: Ready
        status: "True"
        type: DataReady
      - lastTransitionTime: "2025-02-20T11:15:56Z"
        message: VolumeReplicationGroup is replicating
        observedGeneration: 1
        reason: Replicating
        status: "False"
        type: DataProtected
      - lastTransitionTime: "2025-02-20T11:15:43Z"
        message: Nothing to restore
        observedGeneration: 1
        reason: Restored
        status: "True"
        type: ClusterDataReady
      - lastTransitionTime: "2025-02-21T15:31:15Z"
        message: Cluster data of all PVs are protected. VRG object protected
        observedGeneration: 1
        reason: Uploaded
        status: "True"
        type: ClusterDataProtected
      - lastTransitionTime: "2025-02-21T04:58:05Z"
        message: Kube objects restored
        observedGeneration: 1
        reason: KubeObjectsRestored
        status: "True"
        type: KubeObjectsReady
      resourceMeta:
        generation: 1
        kind: VolumeReplicationGroup
        name: app-set-busy-box-placement-drpc
        namespace: busybox-workload
        protectedpvcs:
        - busybox-pvc-7
        - busybox-pvc-1
        - busybox-pvc-4
        - busybox-pvc-8
        - busybox-pvc-3
        - busybox-pvc-9
        - busybox-pvc-5
        - busybox-pvc-10
        - busybox-pvc-2
        - busybox-pvc-6
        resourceVersion: "1359444"
kind: List
metadata:
  resourceVersion: ""
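The relevant signals in the output above are phase: FailingOver, progression: WaitForStorageMaintenanceActivation, and the Available condition message "Waiting for spec.failoverCluster to meet failover prerequsites". A quick way to pull just those two fields from the hub (a sketch using standard jsonpath; the DRPC name and namespace are taken from this report):

oc get drpc app-set-busy-box-placement-drpc -n openshift-gitops \
  -o jsonpath='{.status.phase}{"  "}{.status.progression}{"\n"}'
# observed here: FailingOver  WaitForStorageMaintenanceActivation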
[root@rdr-secondary-418-bastion-0 ~]# oc get maintenancemodes.ramendr.openshift.io -A
NAME                               AGE
084d0f46538fd05587d3acd168ada3d8   7h6m

[root@rdr-secondary-418-bastion-0 ~]# oc describe maintenancemodes.ramendr.openshift.io -A
Name:         084d0f46538fd05587d3acd168ada3d8
Namespace:
Labels:       <none>
Annotations:  <none>
API Version:  ramendr.openshift.io/v1alpha1
Kind:         MaintenanceMode
Metadata:
  Creation Timestamp:  2025-02-21T08:31:29Z
  Generation:          1
  Owner References:
    API Version:  work.open-cluster-management.io/v1
    Kind:         AppliedManifestWork
    Name:         b67375b7bc8ec56f8678e8a198ad538fa4d0c1f9a28e65611853fcb1500d3aed-084d0f46538fd05587d3acd168ada3d8-mmode-mw
    UID:          471d2818-d7e8-4207-9ea4-12138a8bc304
  Resource Version:  1049052
  UID:               c794add9-4171-4d8b-bf6e-825304763d78
Spec:
  Modes:
    Failover
  Storage Provisioner:  openshift-storage.rbd.csi.ceph.com
  Target ID:            084d0f46538fd05587d3acd168ada3d8
Events:                 <none>
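Note that the describe output above ends at Spec with no Status stanza: the MaintenanceMode CR on the failover cluster never reports the requested Failover mode as activated, which is consistent with the DRPC progression being held at WaitForStorageMaintenanceActivation. A minimal way to double-check this on the failover cluster (a sketch; it assumes the CR normally publishes .status once the mode is granted, and uses the ramen-dr-cluster-operator namespace/deployment names of a typical ODF RDR install):

# Inspect the MaintenanceMode status; empty output means the mode was never activated
oc get maintenancemodes.ramendr.openshift.io 084d0f46538fd05587d3acd168ada3d8 -o jsonpath='{.status}'

# Look for maintenance-mode reconcile messages in the DR cluster operator
oc logs -n openshift-dr-system deploy/ramen-dr-cluster-operator | grep -i maintenance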
The OCP platform infrastructure and deployment type (AWS, Bare Metal, VMware, etc. Please clarify if it is platform agnostic deployment), (IPI/UPI):
IBM Power
The ODF deployment type (Internal, External, Internal-Attached (LSO), Multicluster, DR, Provider, etc):
RDR
The version of all relevant components (OCP, ODF, RHCS, ACM whichever is applicable):
Before upgrade: ODF 4.17.4
After upgrade: ODF 4.18.0-133
ACM v2.13.0-52
MCE v2.8.0-49
Submariner 0.19
VolSync v0.11.1
OADP v1.4.2
GitOps 1.15.0
Does this issue impact your ability to continue to work with the product?
Yes
Is there any workaround available to the best of your knowledge?
No
Can this issue be reproduced? If so, please provide the hit rate
Yes, 100% hit rate
Can this issue be reproduced from the UI?
No
If this is a regression, please provide more details to justify this:
Yes
Steps to Reproduce:
1. Create an RDR setup with ODF 4.17.4 on an IBM Power environment.
2. Create a sample application (ApplicationSet, pull based) from the red-hat-storage/ocs-workloads repository, path rdr/busybox/rbd/workloads/app-busybox-1 (master branch).
3. Attach a DR policy to it.
4. Perform the ODF and MCO upgrade from 4.17.4 to 4.18.0-133.
5. After the upgrade completes successfully, trigger a failover (see the CLI sketch after this list).
6. The failover gets stuck at WaitForStorageMaintenanceActivation.
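For step 5, a CLI sketch of triggering the failover from the hub and watching the DRPC (equivalent to the ACM console Failover action; the DRPC name, namespace, and cluster names are taken from this report):

# Set the DRPC action to Failover towards the secondary cluster
oc patch drpc app-set-busy-box-placement-drpc -n openshift-gitops --type merge \
  -p '{"spec":{"action":"Failover","failoverCluster":"rdr-secondary-418"}}'

# Watch the progression; with this bug it stays at WaitForStorageMaintenanceActivation
oc get drpc -A -o wide -w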
The exact date and time when the issue was observed, including timezone details:
9:22 pm, Friday, 21 February 2025, Indian Standard Time (IST)
Actual results:
Failover is stuck at the WaitForStorageMaintenanceActivation progression; the DRPC remains in the FailingOver state.
Expected results:
Failover should complete successfully
Logs collected and log location:
Must-gather link:
https://drive.google.com/file/d/1noC7cFV_sYjlBxiKHX-AV6HuqlYxnagj/view?usp=drive_link
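For reference, the ODF must-gather is typically collected with a command like the one below; the exact image path and tag are an assumption matching the 4.18 build under test:

oc adm must-gather --image=registry.redhat.io/odf4/odf-must-gather-rhel9:v4.18 --dest-dir=odf-must-gather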
Additional info:
Links to:
- RHBA-2024:138027 (Red Hat OpenShift Data Foundation 4.18 security, enhancement & bug fix update)