Description of problem - Provide a detailed description of the issue encountered, including logs/command-output snippets and screenshots if the issue is observed in the UI:
While performing a failover after upgrading from ODF 4.17.4 to 4.18.0-133, the DRPC is stuck in the WaitForStorageMaintenanceActivation progression:
[root@rdr-hub-418-bastion-0 ~]# oc get drpc -A -o wide
NAMESPACE          NAME                              AGE   PREFERREDCLUSTER   FAILOVERCLUSTER     DESIREDSTATE   CURRENTSTATE   PROGRESSION                           START TIME             DURATION   PEER READY
openshift-gitops   app-set-busy-box-placement-drpc   28h   rdr-primary-418    rdr-secondary-418   Failover       FailingOver    WaitForStorageMaintenanceActivation   2025-02-21T08:31:29Z              False

[root@rdr-hub-418-bastion-0 ~]# oc get drpc -A -o yaml
apiVersion: v1
items:
- apiVersion: ramendr.openshift.io/v1alpha1
  kind: DRPlacementControl
  metadata:
    annotations:
      drplacementcontrol.ramendr.openshift.io/app-namespace: busybox-workload
      drplacementcontrol.ramendr.openshift.io/last-app-deployment-cluster: rdr-primary-418
    creationTimestamp: "2025-02-20T11:15:42Z"
    finalizers:
    - drpc.ramendr.openshift.io/finalizer
    generation: 2
    labels:
      cluster.open-cluster-management.io/backup: ramen
    name: app-set-busy-box-placement-drpc
    namespace: openshift-gitops
    ownerReferences:
    - apiVersion: cluster.open-cluster-management.io/v1beta1
      blockOwnerDeletion: true
      controller: true
      kind: Placement
      name: app-set-busy-box-placement
      uid: a1658dc8-9657-4990-ae0b-d3d25d5265ce
    resourceVersion: "1352502"
    uid: 3736e8d1-7dd1-45ac-847f-160b2d7180b9
  spec:
    action: Failover
    drPolicyRef:
      apiVersion: ramendr.openshift.io/v1alpha1
      kind: DRPolicy
      name: drpolicy-5m
    failoverCluster: rdr-secondary-418
    placementRef:
      apiVersion: cluster.open-cluster-management.io/v1beta1
      kind: Placement
      name: app-set-busy-box-placement
      namespace: openshift-gitops
    preferredCluster: rdr-primary-418
    pvcSelector:
      matchExpressions:
      - key: appname
        operator: In
        values:
        - busybox_app1
  status:
    actionStartTime: "2025-02-21T08:31:29Z"
    conditions:
    - lastTransitionTime: "2025-02-21T08:31:29Z"
      message: Waiting for spec.failoverCluster to meet failover prerequsites
      observedGeneration: 2
      reason: FailingOver
      status: "False"
      type: Available
    - lastTransitionTime: "2025-02-21T08:31:29Z"
      message: Started failover to cluster "rdr-secondary-418"
      observedGeneration: 2
      reason: NotStarted
      status: "False"
      type: PeerReady
    - lastTransitionTime: "2025-02-21T15:31:28Z"
      message: VolumeReplicationGroup (busybox-workload/app-set-busy-box-placement-drpc) on cluster rdr-primary-418 is protecting required resources and data
      observedGeneration: 2
      reason: Protected
      status: "True"
      type: Protected
    lastGroupSyncBytes: 19288064
    lastGroupSyncDuration: 1s
    lastGroupSyncTime: "2025-02-21T15:35:00Z"
    lastUpdateTime: "2025-02-21T15:35:58Z"
    observedGeneration: 2
    phase: FailingOver
    preferredDecision:
      clusterName: rdr-primary-418
      clusterNamespace: rdr-primary-418
    progression: WaitForStorageMaintenanceActivation
    resourceConditions:
      conditions:
      - lastTransitionTime: "2025-02-20T11:16:04Z"
        message: PVCs in the VolumeReplicationGroup are ready for use
        observedGeneration: 1
        reason: Ready
        status: "True"
        type: DataReady
      - lastTransitionTime: "2025-02-20T11:15:56Z"
        message: VolumeReplicationGroup is replicating
        observedGeneration: 1
        reason: Replicating
        status: "False"
        type: DataProtected
      - lastTransitionTime: "2025-02-20T11:15:43Z"
        message: Nothing to restore
        observedGeneration: 1
        reason: Restored
        status: "True"
        type: ClusterDataReady
      - lastTransitionTime: "2025-02-21T15:31:15Z"
        message: Cluster data of all PVs are protected. VRG object protected
        observedGeneration: 1
        reason: Uploaded
        status: "True"
        type: ClusterDataProtected
      - lastTransitionTime: "2025-02-21T04:58:05Z"
        message: Kube objects restored
        observedGeneration: 1
        reason: KubeObjectsRestored
        status: "True"
        type: KubeObjectsReady
      resourceMeta:
        generation: 1
        kind: VolumeReplicationGroup
        name: app-set-busy-box-placement-drpc
        namespace: busybox-workload
        protectedpvcs:
        - busybox-pvc-7
        - busybox-pvc-1
        - busybox-pvc-4
        - busybox-pvc-8
        - busybox-pvc-3
        - busybox-pvc-9
        - busybox-pvc-5
        - busybox-pvc-10
        - busybox-pvc-2
        - busybox-pvc-6
        resourceVersion: "1359444"
kind: List
metadata:
  resourceVersion: ""
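The relevant signals in the output above are phase: FailingOver, progression: WaitForStorageMaintenanceActivation, and the Available condition message "Waiting for spec.failoverCluster to meet failover prerequsites". A quick way to pull just those two fields from the hub (a sketch using standard jsonpath; the DRPC name and namespace are taken from this report):

oc get drpc app-set-busy-box-placement-drpc -n openshift-gitops \
  -o jsonpath='{.status.phase}{"  "}{.status.progression}{"\n"}'
# observed here: FailingOver  WaitForStorageMaintenanceActivation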
[root@rdr-secondary-418-bastion-0 ~]# oc get maintenancemodes.ramendr.openshift.io -A
NAME                               AGE
084d0f46538fd05587d3acd168ada3d8   7h6m

[root@rdr-secondary-418-bastion-0 ~]# oc describe maintenancemodes.ramendr.openshift.io -A
Name:         084d0f46538fd05587d3acd168ada3d8
Namespace:
Labels:       <none>
Annotations:  <none>
API Version:  ramendr.openshift.io/v1alpha1
Kind:         MaintenanceMode
Metadata:
  Creation Timestamp:  2025-02-21T08:31:29Z
  Generation:          1
  Owner References:
    API Version:  work.open-cluster-management.io/v1
    Kind:         AppliedManifestWork
    Name:         b67375b7bc8ec56f8678e8a198ad538fa4d0c1f9a28e65611853fcb1500d3aed-084d0f46538fd05587d3acd168ada3d8-mmode-mw
    UID:          471d2818-d7e8-4207-9ea4-12138a8bc304
  Resource Version:  1049052
  UID:               c794add9-4171-4d8b-bf6e-825304763d78
Spec:
  Modes:
    Failover
  Storage Provisioner:  openshift-storage.rbd.csi.ceph.com
  Target ID:            084d0f46538fd05587d3acd168ada3d8
Events:                 <none>
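Note that the describe output above ends at Spec with no Status stanza: the MaintenanceMode CR on the failover cluster never reports the requested Failover mode as activated, which is consistent with the DRPC progression being held at WaitForStorageMaintenanceActivation. A minimal way to double-check this on the failover cluster (a sketch; it assumes the CR normally publishes .status once the mode is granted, and uses the ramen-dr-cluster-operator namespace/deployment names of a typical ODF RDR install):

# Inspect the MaintenanceMode status; empty output means the mode was never activated
oc get maintenancemodes.ramendr.openshift.io 084d0f46538fd05587d3acd168ada3d8 -o jsonpath='{.status}'

# Look for maintenance-mode reconcile messages in the DR cluster operator
oc logs -n openshift-dr-system deploy/ramen-dr-cluster-operator | grep -i maintenance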
The OCP platform infrastructure and deployment type (AWS, Bare Metal, VMware, etc. Please clarify if it is platform agnostic deployment), (IPI/UPI):
IBM Power
The ODF deployment type (Internal, External, Internal-Attached (LSO), Multicluster, DR, Provider, etc):
RDR
The version of all relevant components (OCP, ODF, RHCS, ACM whichever is applicable):
Before upgrade: ODF 4.17.4
After upgrade: ODF 4.18.0-133
ACM v2.13.0-52
MCE v2.8.0-49
Submariner 0.19
VolSync v0.11.1
OADP v1.4.2
GitOps 1.15.0
Does this issue impact your ability to continue to work with the product?
Yes
Is there any workaround available to the best of your knowledge?
No
Can this issue be reproduced? If so, please provide the hit rate
Yes, 100% hit rate
Can this issue be reproduced from the UI?
No
If this is a regression, please provide more details to justify this:
Yes
Steps to Reproduce:
1. Create an RDR setup with ODF 4.17.4 on an IBM Power environment.
2. Create a sample application (ApplicationSet, pull based) from the red-hat-storage/ocs-workloads repository, path rdr/busybox/rbd/workloads/app-busybox-1 (master branch).
3. Attach a DR policy to it.
4. Perform the ODF and MCO upgrade from 4.17.4 to 4.18.0-133.
5. After the upgrade completes successfully, trigger a failover (see the CLI sketch after this list).
6. The failover gets stuck at WaitForStorageMaintenanceActivation.
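For step 5, a CLI sketch of triggering the failover from the hub and watching the DRPC (equivalent to the ACM console Failover action; the DRPC name, namespace, and cluster names are taken from this report):

# Set the DRPC action to Failover towards the secondary cluster
oc patch drpc app-set-busy-box-placement-drpc -n openshift-gitops --type merge \
  -p '{"spec":{"action":"Failover","failoverCluster":"rdr-secondary-418"}}'

# Watch the progression; with this bug it stays at WaitForStorageMaintenanceActivation
oc get drpc -A -o wide -w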
The exact date and time when the issue was observed, including timezone details:
9:22 pm, Friday, 21 February 2025, Indian Standard Time (IST)
Actual results:
Failover is stuck at the WaitForStorageMaintenanceActivation progression; the DRPC remains in the FailingOver state.
Expected results:
Failover should complete successfully
Logs collected and log location:
Must-gather link:
https://drive.google.com/file/d/1noC7cFV_sYjlBxiKHX-AV6HuqlYxnagj/view?usp=drive_link
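For reference, the ODF must-gather is typically collected with a command like the one below; the exact image path and tag are an assumption matching the 4.18 build under test:

oc adm must-gather --image=registry.redhat.io/odf4/odf-must-gather-rhel9:v4.18 --dest-dir=odf-must-gather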
Additional info:
Links to:
- RHBA-2024:138027 (Red Hat OpenShift Data Foundation 4.18 security, enhancement & bug fix update)