Data Foundation Bugs / DFBUGS-575

[2266154] [RDR] Data replication stopped for most of the workloads

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Critical
    • odf-4.18
    • odf-4.15
    • ceph/RADOS/x86
    • None
    • False
    • False
    • ?
    • ?
    • If docs needed, set a value
    • None

      Description of problem (please be as detailed as possible and provide log
      snippets):

      Version of all relevant components (if applicable):
      OCP 4.15.0-0.nightly-2024-02-16-235514
      ODF v4.15.0-149.stable
      ACM 2.10.0-DOWNSTREAM-2024-02-15-05-34-13
      Submariner 0.17.0
      ceph version 17.2.6-196.el9cp (cbbf2cfb549196ca18c0c9caff9124d83ed681a4) quincy (stable)

      Does this issue impact your ability to continue to work with the product
      (please explain in detail what is the user impact)?

      Is there any workaround available to the best of your knowledge?

      Rate from 1 - 5 the complexity of the scenario you performed that caused this
      bug (1 - very simple, 5 - very complex)?

      Is this issue reproducible?

      Can this issue be reproduced from the UI?

      If this is a regression, please provide more details to justify this:

      Steps to Reproduce:
      1. Configured a Regional DR (RDR) setup with DR-protected workloads of both subscription and appset types, backed by both RBD and CephFS in all combinations. Ran IOs for 3-4 days and found that replication had stopped for most of the workloads.

      No failover/relocate action was performed on them.

      The subctl verify connectivity check passed for both managed clusters.

      Actual results: Data replication stopped for most of the workloads

      amagrawa:~$ drpc
      NAMESPACE              NAME                                     AGE     PREFERREDCLUSTER   FAILOVERCLUSTER   DESIREDSTATE   CURRENTSTATE   PROGRESSION   START TIME             DURATION          PEER READY
      busybox-workloads-13   cephfs-sub-busybox13-placement-1-drpc    4d23h   amagrawa-m1                                         Deployed       Completed     2024-02-21T20:07:19Z   48.129206831s     True
      busybox-workloads-14   cephfs-sub-busybox14-placement-1-drpc    4d23h   amagrawa-m1                                         Deployed       Completed     2024-02-21T20:08:50Z   32.138146869s     True
      busybox-workloads-15   cephfs-sub-busybox15-placement-1-drpc    4d23h   amagrawa-m1                                         Deployed       Completed     2024-02-21T20:09:59Z   49.128696954s     True
      busybox-workloads-16   cephfs-sub-busybox16-placement-1-drpc    4d23h   amagrawa-m2                                         Deployed       Completed     2024-02-21T20:11:02Z   45.122431672s     True
      busybox-workloads-5    rbd-sub-busybox5-placement-1-drpc        5d      amagrawa-m1                                         Deployed       Completed     2024-02-21T19:55:17Z   15.08242303s      True
      busybox-workloads-6    rbd-sub-busybox6-placement-1-drpc        5d      amagrawa-m1                                         Deployed       Completed     2024-02-21T19:56:46Z   2.073870577s      True
      busybox-workloads-7    rbd-sub-busybox7-placement-1-drpc        5d      amagrawa-m1                                         Deployed       Completed     2024-02-21T19:57:44Z   15.036975914s     True
      busybox-workloads-8    rbd-sub-busybox8-placement-1-drpc        4d23h   amagrawa-m2                                         Deployed       Completed     2024-02-21T19:58:37Z   23.038867468s     True
      openshift-gitops       cephfs-appset-busybox10-placement-drpc   4d23h   amagrawa-m1                                         Deployed       Completed     2024-02-21T20:03:32Z   46.149574197s     True
      openshift-gitops       cephfs-appset-busybox11-placement-drpc   4d23h   amagrawa-m1                                         Deployed       Completed     2024-02-21T20:04:58Z   32.153456796s     True
      openshift-gitops       cephfs-appset-busybox12-placement-drpc   4d23h   amagrawa-m2                                         Deployed       Completed     2024-02-21T20:05:59Z   35.134274567s     True
      openshift-gitops       cephfs-appset-busybox9-placement-drpc    4d23h   amagrawa-m1                                         Deployed       Completed     2024-02-21T20:02:33Z   31.288294384s     True
      openshift-gitops       rbd-appset-busybox1-placement-drpc       5d      amagrawa-m1                                         Deployed       Completed     2024-02-21T19:38:58Z   8m55.725779771s   True
      openshift-gitops       rbd-appset-busybox2-placement-drpc       5d      amagrawa-m1                                         Deployed       Completed     2024-02-21T19:43:44Z   4m22.363511628s   True
      openshift-gitops       rbd-appset-busybox3-placement-drpc       4d23h   amagrawa-m1                                         Deployed       Completed     2024-02-21T19:59:44Z   21.04601057s      True
      openshift-gitops       rbd-appset-busybox4-placement-drpc       5d      amagrawa-m2                                         Deployed       Completed     2024-02-21T19:53:53Z   16.039184856s     True

      amagrawa:~$ group
      drplacementcontrol.ramendr.openshift.io/app-namespace: busybox-workloads-13
      namespace: busybox-workloads-13
      namespace: busybox-workloads-13
      lastGroupSyncTime: "2024-02-24T14:32:45Z"
      namespace: busybox-workloads-13
      drplacementcontrol.ramendr.openshift.io/app-namespace: busybox-workloads-14
      namespace: busybox-workloads-14
      namespace: busybox-workloads-14
      lastGroupSyncTime: "2024-02-24T14:32:07Z"
      namespace: busybox-workloads-14
      drplacementcontrol.ramendr.openshift.io/app-namespace: busybox-workloads-15
      namespace: busybox-workloads-15
      namespace: busybox-workloads-15
      lastGroupSyncTime: "2024-02-24T14:32:20Z"
      namespace: busybox-workloads-15
      drplacementcontrol.ramendr.openshift.io/app-namespace: busybox-workloads-16
      namespace: busybox-workloads-16
      namespace: busybox-workloads-16
      lastGroupSyncTime: "2024-02-26T14:02:07Z"
      namespace: busybox-workloads-16
      drplacementcontrol.ramendr.openshift.io/app-namespace: busybox-workloads-5
      namespace: busybox-workloads-5
      namespace: busybox-workloads-5
      lastGroupSyncTime: "2024-02-25T16:15:05Z"
      namespace: busybox-workloads-5
      drplacementcontrol.ramendr.openshift.io/app-namespace: busybox-workloads-6
      namespace: busybox-workloads-6
      namespace: busybox-workloads-6
      lastGroupSyncTime: "2024-02-25T16:15:01Z"
      namespace: busybox-workloads-6
      drplacementcontrol.ramendr.openshift.io/app-namespace: busybox-workloads-7
      namespace: busybox-workloads-7
      namespace: busybox-workloads-7
      lastGroupSyncTime: "2024-02-25T16:15:01Z"
      namespace: busybox-workloads-7
      drplacementcontrol.ramendr.openshift.io/app-namespace: busybox-workloads-8
      namespace: busybox-workloads-8
      namespace: busybox-workloads-8
      lastGroupSyncTime: "2024-02-26T19:55:00Z"
      namespace: busybox-workloads-8
      drplacementcontrol.ramendr.openshift.io/app-namespace: busybox-workloads-10
      namespace: openshift-gitops
      namespace: openshift-gitops
      lastGroupSyncTime: "2024-02-24T14:32:57Z"
      namespace: busybox-workloads-10
      drplacementcontrol.ramendr.openshift.io/app-namespace: busybox-workloads-11
      namespace: openshift-gitops
      namespace: openshift-gitops
      lastGroupSyncTime: "2024-02-24T14:32:37Z"
      namespace: busybox-workloads-11
      drplacementcontrol.ramendr.openshift.io/app-namespace: busybox-workloads-12
      namespace: openshift-gitops
      namespace: openshift-gitops
      lastGroupSyncTime: "2024-02-24T14:25:53Z"
      namespace: busybox-workloads-12
      drplacementcontrol.ramendr.openshift.io/app-namespace: busybox-workloads-9
      namespace: openshift-gitops
      namespace: openshift-gitops
      lastGroupSyncTime: "2024-02-24T14:32:43Z"
      namespace: busybox-workloads-9
      drplacementcontrol.ramendr.openshift.io/app-namespace: busybox-workloads-1
      namespace: openshift-gitops
      namespace: openshift-gitops
      lastGroupSyncTime: "2024-02-25T16:15:03Z"
      namespace: busybox-workloads-1
      drplacementcontrol.ramendr.openshift.io/app-namespace: busybox-workloads-2
      namespace: openshift-gitops
      namespace: openshift-gitops
      lastGroupSyncTime: "2024-02-25T16:15:01Z"
      namespace: busybox-workloads-2
      drplacementcontrol.ramendr.openshift.io/app-namespace: busybox-workloads-3
      namespace: openshift-gitops
      namespace: openshift-gitops
      lastGroupSyncTime: "2024-02-25T16:15:03Z"
      namespace: busybox-workloads-3
      drplacementcontrol.ramendr.openshift.io/app-namespace: busybox-workloads-4
      namespace: openshift-gitops
      namespace: openshift-gitops
      lastGroupSyncTime: "2024-02-26T19:55:00Z"
      namespace: busybox-workloads-4

      amagrawa:~$ date -u
      Monday 26 February 2024 07:58:45 PM UTC

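      Compared with the current time above, lastGroupSyncTime lags by more than 1 day for 13 of the 16 workloads (last synced on 2024-02-24 or 2024-02-25). busybox-workloads-16 lags by about 6 hours, and only busybox-workloads-4 and busybox-workloads-8 are up to date.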
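
For anyone triaging, the lag per workload can be computed from the values pasted above. The following is a minimal sketch, not part of the product or the original report: the timestamps and the reference time are copied from this report, while the function names and the 1-day staleness threshold are hypothetical choices made for illustration (the report does not state the configured sync interval).

```python
from datetime import datetime, timedelta, timezone

# Reference time copied from the `date -u` output in this report.
NOW = datetime(2024, 2, 26, 19, 58, 45, tzinfo=timezone.utc)

# lastGroupSyncTime per app namespace, copied from the output above.
LAST_GROUP_SYNC = {
    "busybox-workloads-13": "2024-02-24T14:32:45Z",
    "busybox-workloads-14": "2024-02-24T14:32:07Z",
    "busybox-workloads-15": "2024-02-24T14:32:20Z",
    "busybox-workloads-16": "2024-02-26T14:02:07Z",
    "busybox-workloads-5": "2024-02-25T16:15:05Z",
    "busybox-workloads-6": "2024-02-25T16:15:01Z",
    "busybox-workloads-7": "2024-02-25T16:15:01Z",
    "busybox-workloads-8": "2024-02-26T19:55:00Z",
    "busybox-workloads-10": "2024-02-24T14:32:57Z",
    "busybox-workloads-11": "2024-02-24T14:32:37Z",
    "busybox-workloads-12": "2024-02-24T14:25:53Z",
    "busybox-workloads-9": "2024-02-24T14:32:43Z",
    "busybox-workloads-1": "2024-02-25T16:15:03Z",
    "busybox-workloads-2": "2024-02-25T16:15:01Z",
    "busybox-workloads-3": "2024-02-25T16:15:03Z",
    "busybox-workloads-4": "2024-02-26T19:55:00Z",
}


def sync_lag(last_sync: str, now: datetime = NOW) -> timedelta:
    """Return how far a lastGroupSyncTime (UTC, 'Z' suffix) is behind `now`."""
    synced = datetime.strptime(last_sync, "%Y-%m-%dT%H:%M:%SZ")
    return now - synced.replace(tzinfo=timezone.utc)


def stale_workloads(sync_times, threshold, now=NOW):
    """Return the namespaces whose lag exceeds `threshold`, sorted by name."""
    return sorted(
        ns for ns, ts in sync_times.items() if sync_lag(ts, now) > threshold
    )


# Hypothetical threshold: treat anything older than 1 day as stale.
STALE_AFTER = timedelta(days=1)

# Print every workload, most stale first.
for ns, ts in sorted(LAST_GROUP_SYNC.items(), key=lambda kv: kv[1]):
    flag = "STALE" if sync_lag(ts) > STALE_AFTER else "ok"
    print(f"{ns:<22} {ts}  lag={sync_lag(ts)}  {flag}")
```

A different threshold can be passed to `stale_workloads` once the actual sync interval of the DR policy is known.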

      Logs: http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/bz-aman/26feb24/

      Expected results: Data replication should keep working while IOs are run continuously on an RDR setup.

      Additional info:

              rhn-support-bhubbard Brad Hubbard
              amagrawa@redhat.com Aman Agrawal
              Elad Ben Aharon Elad Ben Aharon