  1. Red Hat Advanced Cluster Management
  2. ACM-10545

[RDR] [Hub recovery] [Co-situated] Data sync for all cephfs workloads gets impacted while running IOs post successful failover and cleanup


    • True
    • Data sync is impacted; there is no known workaround at this time.
    • False
    • Important
    • No

      Description of problem:

      Version-Release number of selected component (if applicable):

      OCP 4.15.0-0.nightly-2024-03-05-113700
      ACM 2.10.0-DOWNSTREAM-2024-02-28-06-06-55
      ODF 4.15.0-157
      ceph version 17.2.6-196.el9cp (cbbf2cfb549196ca18c0c9caff9124d83ed681a4) quincy (stable)
      Submariner brew.registry.redhat.io/rh-osbs/iib:680159

      How reproducible:

      Steps to Reproduce:

      ****Active hub co-situated with primary managed cluster****

      1. On a Regional DR setup, perform a site failure (the active hub and the primary managed cluster go down) and move to the passive hub. After hub recovery, all the CephFS workloads of both subscription and appset types, in the states Deployed, FailedOver, and Relocated, which were running on the primary managed cluster were failed over to the failover cluster (secondary), and the failover operation was successful.

      Workloads are running successfully on the failover cluster (secondary), and both VRG states are marked as Primary for all these workloads.

      2. Now recover the older primary managed cluster and ensure it's successfully imported on the RHACM console (if not, create auto-import-secret for this cluster on the passive hub).
      3. Monitor drpc cleanup status and lastGroupSyncTime for all the failedover workloads.
      4. After successful cleanup, let IOs continue for a few days and monitor the sync progress, lastGroupSyncTime etc.
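      The monitoring in steps 3 and 4 could be partly scripted. The following is a minimal sketch, not part of the original report: it assumes the DRPC list has been exported as JSON (for example with something like `oc get drpc -A -o json` on the hub) and that each item carries `status.lastGroupSyncTime` as a UTC timestamp with second precision. The function name `stale_drpcs` and the lag threshold are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta, timezone


def parse_k8s_time(ts: str) -> datetime:
    # Assumes the usual Kubernetes form, e.g. 2024-03-19T18:47:04Z
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def stale_drpcs(drpc_list: dict, now: datetime, max_lag: timedelta) -> list:
    """Return (namespace, name, lag) for each DRPC whose lastGroupSyncTime
    is missing (lag is None) or older than max_lag."""
    stale = []
    for item in drpc_list.get("items", []):
        meta = item.get("metadata", {})
        ts = item.get("status", {}).get("lastGroupSyncTime")
        lag = None if ts is None else now - parse_k8s_time(ts)
        if lag is None or lag > max_lag:
            stale.append((meta.get("namespace"), meta.get("name"), lag))
    return stale
```

      Calling `stale_drpcs(data, datetime.now(timezone.utc), timedelta(minutes=30))` on the parsed JSON would list the workloads whose sync time has stopped advancing; the 30-minute threshold is an assumption and should be set relative to the configured sync interval.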

      Actual results: [RDR] [Hub recovery] [Co-situated] Data sync for all cephfs workloads gets impacted while running IOs post successful failover and cleanup

      Expected results: Data sync should progress as expected, and no Submariner connectivity issues should be seen.

      Additional info:

      Slack thread: https://redhat-internal.slack.com/archives/C0134E73VH6/p1710874024678819

        1. subctl diagnose.rtf
          21 kB
          Aman Agrawal
        2. subctl service discovery.txt
          122 kB
          Aman Agrawal
        3. subctl service discovery re-try with context.txt
          7 kB
          Aman Agrawal
        4. subctl verify-post hub recovery.txt
          111 kB
          Aman Agrawal

              tpanteli Thomas Pantelis
              amagrawa@redhat.com Aman Agrawal
              Maxim Babushkin Maxim Babushkin
