  1. Red Hat Advanced Cluster Management
  2. ACM-5779

Submariner connectivity is lost when master nodes which were down are brought up on one of the managed clusters of RDR setup after failover


    • Submariner Sprint 2023-9, Submariner Sprint 2023-10, Submariner Sprint 2023-11, Submariner Sprint 2023-12, Submariner Sprint 2023-13
    • No

      Description of problem: On a Regional DR (RDR) setup, the master nodes of managed cluster C1 were brought down, and the DR-protected workloads running on C1 were failed over to managed cluster C2 from the ACM UI of the hub cluster. The C1 master nodes were brought back up after almost 12 hours, and Submariner connectivity was then found to be lost: the connection status is degraded on both managed clusters.

      Please note that this is a fresh cluster where Submariner has been installed only once: the catalog source was first created from the CLI, and Submariner was then installed via the ACM UI so that it points to the correct catalog source (the way we have been doing it for quite some time now).

      Version-Release number of selected component (if applicable):

      ACM 2.8.0-DOWNSTREAM-2023-05-30-03-43-50
      OCP 4.13.0-0.nightly-2023-05-25-001936
      ODF 4.13.0-207.stable
      Submariner 0.15

      How reproducible:

      Steps to Reproduce:

      1. On an RDR setup, run the app-set based DR-protected workload appset-busybox-3 for ~24 hrs on C1.
      2. Do all pre-checks and ensure Submariner is healthy.
      3. Bring down all master nodes of the primary cluster (C1 in this case).
      4. Wait for 5-10 mins.
      5. Initiate failover of workload appset-busybox-3 to C2 from the ACM UI of the hub.
      6. Let the failover complete.
      7. Bring the C1 master nodes up after 10-12 hrs and wait for cleanup to complete. Cleanup remains stuck because Submariner connectivity is lost.
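
      The Submariner health check in step 2, and the same check after step 7, could be scripted roughly as below. This is only a sketch and not the procedure used for this report: the context names c1 and c2 are hypothetical, and the --context flag is assumed to be accepted by the installed subctl (older releases used --kubecontext). It runs subctl show connections and subctl diagnose all against each managed cluster and prints the raw output without trying to parse it.

```python
import shutil
import subprocess

# Hypothetical kubeconfig context names for the two managed clusters.
CONTEXTS = ["c1", "c2"]


def health_check_commands(context):
    """Return the subctl commands used to inspect Submariner on one cluster.

    The --context flag is an assumption about the installed subctl version.
    """
    return [
        ["subctl", "show", "connections", "--context", context],
        ["subctl", "diagnose", "all", "--context", context],
    ]


def run_health_checks(contexts=CONTEXTS):
    """Run every check and return (command, exit code, output) tuples.

    Returns an empty list when subctl is not on PATH.
    """
    if shutil.which("subctl") is None:
        return []
    results = []
    for context in contexts:
        for cmd in health_check_commands(context):
            proc = subprocess.run(cmd, capture_output=True, text=True)
            results.append((" ".join(cmd), proc.returncode, proc.stdout + proc.stderr))
    return results


if __name__ == "__main__":
    for cmd, code, output in run_health_checks():
        print(f"$ {cmd} (exit {code})")
        print(output)
```

      Running it once before the master nodes are brought down and again after they come back gives two outputs that can be compared side by side for both clusters.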

      Actual results: Submariner connectivity is lost when master nodes which were down are brought up on one of the managed clusters of RDR setup after failover.

      Expected results: Submariner connectivity shouldn't be lost when master nodes which were down are brought up after failover operation on DR protected workloads.

      Additional info:

      Relevant thread: https://redhat-internal.slack.com/archives/C0134E73VH6/p1685713251676249

      subctl gather logs are available here: http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/bz-aman/02jun23/

            skitt@redhat.com Stephen Kitt
            amagrawa@redhat.com Aman Agrawal
            Aman Agrawal Aman Agrawal