  1. Red Hat Advanced Cluster Management
  2. ACM-6012

Submariner connection doesn't recover after all cluster nodes are powered off and powered on

    • Submariner Sprint 2023-8
    • No

      Description of problem:

      On a Regional DR setup (Submariner with Globalnet enabled), while running a Regional DR failover test in which all nodes of the first managed cluster are powered off and later powered on, the Submariner connection went into the Degraded state and never recovered, even though all nodes were back online and the pods were in the Running state.

      Version-Release number of selected component (if applicable):

      ACM: 2.8.0-DOWNSTREAM-2023-05-19-14-13-56
      Submariner: 0.15.0 (image: brew.registry.redhat.io/rh-osbs/iib:500162)
      ODF: 4.13.0-203
      OCP: 4.13.0-0.nightly-2023-05-22-040653

      How reproducible:

      2/2 on Globalnet-enabled clusters

      Steps to Reproduce:

      1. Configure an RDR setup consisting of 3 OCP clusters (Hub, C1, C2)
      2. Deploy multiple workloads/applications (in my case, 5 applications with a total of 85 PVCs/Pods on C1)
      3. Run I/O for ~10-20 minutes
      4. Power off all C1 cluster nodes
      5. Fail over the workloads/applications to C2
      6. Power on the C1 cluster nodes
      7. Check the Submariner status in the ACM console

      Actual results:

      The Submariner connection status remains Degraded with the following error message:
      The connection between clusters "sagrawal-nc1" and "sagrawal-nc2" is not established (status=error)

      Expected results:

      The connection status should recover once all cluster nodes are back online.
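
Step 7 and the expected result can also be checked from the command line instead of the ACM console. The following is only a sketch and was not part of the original test run. It assumes `subctl` is on the PATH with the kubeconfig pointing at C1. The helper names (`parse_connections`, `wait_until_connected`, `run_subctl`) and the timeout and interval values are made up for illustration. It polls `subctl show connections` until every row of the connections table reports `connected`.

```python
import re
import subprocess
import time
from typing import Callable, Dict, List


def parse_connections(show_output: str) -> List[Dict[str, str]]:
    """Extract the rows of the connections table from `subctl show` output."""
    rows: List[Dict[str, str]] = []
    header = None
    for line in show_output.splitlines():
        # Columns are separated by two or more spaces; some header names
        # ("REMOTE IP", "CABLE DRIVER") contain a single space.
        cols = re.split(r"\s{2,}", line.strip())
        if header is None:
            if "GATEWAY" in cols and "STATUS" in cols:
                header = cols
        elif len(cols) == len(header):
            rows.append(dict(zip(header, cols)))
        else:
            break  # first line that is not a table row ends the table
    return rows


def run_subctl() -> str:
    """Assumption: subctl is installed and the current kubeconfig targets C1."""
    result = subprocess.run(
        ["subctl", "show", "connections"],
        capture_output=True, text=True, check=False,
    )
    return result.stdout


def wait_until_connected(
    fetch: Callable[[], str],
    timeout_s: float = 600.0,   # hypothetical value, tune for the environment
    interval_s: float = 15.0,   # hypothetical value
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until all connections are 'connected'; False if the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        rows = parse_connections(fetch())
        if rows and all(row["STATUS"] == "connected" for row in rows):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)


# Usage (requires a live cluster):
#   recovered = wait_until_connected(run_subctl)
```

With the behavior reported here, the call would be expected to return False on C1, because the connection to sagrawal-nc2 stays in the `error` state.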

      Additional info:

      Submariner was installed via the ACM UI, with Globalnet enabled during installation.

      A custom Submariner subscription (with image: brew.registry.redhat.io/rh-osbs/iib:500162) was used for installation.

      A screenshot of the ACM console with the error message is attached in the Attachments section.

       

      Output of "subctl show all" from both managed clusters C1 and C2

      >> From Cluster C1

      $ subctl show all
      Cluster "sagrawal-nc1"
       ✓ Detecting broker(s)
       ✓ No brokers found
       ✓ Showing Connections
      GATEWAY     CLUSTER        REMOTE IP      NAT   CABLE DRIVER   SUBNETS        STATUS   RTT avg.
      compute-1   sagrawal-nc2   10.1.114.251   no    libreswan      242.0.0.0/16   error    0s
       ✓ Showing Endpoints
      CLUSTER        ENDPOINT IP    PUBLIC IP        CABLE DRIVER   TYPE
      sagrawal-nc1   10.1.114.212   66.187.232.130   libreswan      local
      sagrawal-nc2   10.1.114.251   66.187.232.130   libreswan      remote
      sagrawal-nc1   10.1.114.39    66.187.232.130   libreswan      local
      sagrawal-nc1   10.1.114.158   66.187.232.130   libreswan      local
       ✓ Showing Gateways
      NODE        HA STATUS   SUMMARY
      compute-0   active      0 connections out of 1 are established
      compute-1   passive     There are no connections
      compute-2   passive     There are no connections
       ✓ Showing Network details
          Discovered network details via Submariner:
              Network plugin:  OVNKubernetes
              Service CIDRs:   [172.30.0.0/16]
              Cluster CIDRs:   [10.128.0.0/14]
              Global CIDR:     242.1.0.0/16
       ✓ Showing versions
      COMPONENT                       REPOSITORY                  VERSION
      submariner-gateway              registry.redhat.io/rhacm2   077f9a0ea8d1d06ef7953a573729f4ecbe9b7602745af8d0fa0510b84ac44653
      submariner-routeagent           registry.redhat.io/rhacm2   3812063fbfd0e17f4d50ab428259d63b4b14bef23b4239d04dbc475efcccf350
      submariner-globalnet            registry.redhat.io/rhacm2   a3be3804eb897a54eed08c5e9560647fd8d46507e39a6d8f4bc1b8c7319b4839
      submariner-operator             registry.redhat.io/rhacm2   06b7339e46213720583e15e4aa8c84621d7b3d496ac865252a728e24bd6b5fde
      submariner-lighthouse-agent     registry.redhat.io/rhacm2   b6a5d7ebd58f92a6c526c7dc43046e137dca8c68079ad010a2739fffde1f5aca
      submariner-lighthouse-coredns   registry.redhat.io/rhacm2   10b3f94adcb2a6f2a354c897b4afbf6a828ae3793364bc1b5b9c6a4cdf2e44e5
      

       

      >> From Cluster C2

       

      [~] $ subctl show all
      Cluster "sagrawal-nc2"
       ✓ Detecting broker(s)
       ✓ No brokers found
       ✓ Showing Connections
      GATEWAY     CLUSTER        REMOTE IP      NAT   CABLE DRIVER   SUBNETS        STATUS      RTT avg.
      compute-0   sagrawal-nc1   10.1.114.212   no    libreswan      242.1.0.0/16   connected   570.137µs
       ✓ Showing Endpoints
      CLUSTER        ENDPOINT IP    PUBLIC IP        CABLE DRIVER   TYPE
      sagrawal-nc2   10.1.115.5     66.187.232.130   libreswan      local
      sagrawal-nc2   10.1.114.251   66.187.232.130   libreswan      local
      sagrawal-nc1   10.1.114.212   66.187.232.130   libreswan      remote
      sagrawal-nc2   10.1.115.3     66.187.232.130   libreswan      local
       ✓ Showing Gateways
      NODE        HA STATUS   SUMMARY
      compute-0   passive     There are no connections
      compute-1   active      All connections (1) are established
      compute-2   passive     There are no connections
       ✓ Showing Network details
          Discovered network details via Submariner:
              Network plugin:  OVNKubernetes
              Service CIDRs:   [172.30.0.0/16]
              Cluster CIDRs:   [10.128.0.0/14]
              Global CIDR:     242.0.0.0/16
       ✓ Showing versions
      COMPONENT                       REPOSITORY                  VERSION
      submariner-gateway              registry.redhat.io/rhacm2   077f9a0ea8d1d06ef7953a573729f4ecbe9b7602745af8d0fa0510b84ac44653
      submariner-routeagent           registry.redhat.io/rhacm2   3812063fbfd0e17f4d50ab428259d63b4b14bef23b4239d04dbc475efcccf350
      submariner-globalnet            registry.redhat.io/rhacm2   a3be3804eb897a54eed08c5e9560647fd8d46507e39a6d8f4bc1b8c7319b4839
      submariner-operator             registry.redhat.io/rhacm2   06b7339e46213720583e15e4aa8c84621d7b3d496ac865252a728e24bd6b5fde
      submariner-lighthouse-agent     registry.redhat.io/rhacm2   b6a5d7ebd58f92a6c526c7dc43046e137dca8c68079ad010a2739fffde1f5aca
      submariner-lighthouse-coredns   registry.redhat.io/rhacm2   10b3f94adcb2a6f2a354c897b4afbf6a828ae3793364bc1b5b9c6a4cdf2e44e5
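
The two outputs above disagree: C1 reports the connection to sagrawal-nc2 as `error`, while C2 reports the connection to sagrawal-nc1 as `connected`. A rough way to cross-check two saved `subctl show all` outputs is sketched below. It is an illustration only: the function names (`connection_rows`, `global_cidr`, `cross_check`) are hypothetical, and the checks cover just the connection status, the advertised remote subnet versus the peer's Global CIDR, and overlap between the two Global CIDRs.

```python
import ipaddress
import re
from typing import Dict, List, Optional


def connection_rows(show_output: str) -> List[Dict[str, str]]:
    """Rows of the connections table, keyed by column header."""
    rows: List[Dict[str, str]] = []
    header = None
    for line in show_output.splitlines():
        cols = re.split(r"\s{2,}", line.strip())  # columns use 2+ spaces
        if header is None:
            if "GATEWAY" in cols and "STATUS" in cols:
                header = cols
        elif len(cols) == len(header):
            rows.append(dict(zip(header, cols)))
        else:
            break  # end of the connections table
    return rows


def global_cidr(show_output: str) -> Optional[str]:
    """Value of the 'Global CIDR:' line, or None if it is absent."""
    match = re.search(r"Global CIDR:\s+(\S+)", show_output)
    return match.group(1) if match else None


def cross_check(local_out: str, remote_out: str) -> List[str]:
    """Return human-readable findings; an empty list means no mismatch seen."""
    findings: List[str] = []
    local_rows = connection_rows(local_out)
    remote_rows = connection_rows(remote_out)

    # 1. Both sides should agree on the connection status.
    local_states = sorted({r["STATUS"] for r in local_rows})
    remote_states = sorted({r["STATUS"] for r in remote_rows})
    if local_states != remote_states:
        findings.append(f"asymmetric status: {local_states} vs {remote_states}")

    # 2. The subnet advertised for the peer should be the peer's Global CIDR.
    local_global, remote_global = global_cidr(local_out), global_cidr(remote_out)
    for row in local_rows:
        subnets = [s.strip() for s in row["SUBNETS"].split(",")]
        if remote_global and remote_global not in subnets:
            findings.append(f"subnets {subnets} do not include {remote_global}")

    # 3. The two Global CIDRs must not overlap.
    if local_global and remote_global:
        a = ipaddress.ip_network(local_global)
        b = ipaddress.ip_network(remote_global)
        if a.overlaps(b):
            findings.append(f"global CIDRs overlap: {a} and {b}")
    return findings
```

Passing the C1 output as `local_out` and the C2 output as `remote_out` should yield only the status asymmetry, since the advertised subnet 242.0.0.0/16 matches C2's Global CIDR and the two Global CIDRs do not overlap.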
      

       

              asuryana Aswin Suryanarayanan
              sagrawal@redhat.com Sidhant Agrawal
              Maxim Babushkin Maxim Babushkin