-
Bug
-
Resolution: Done-Errata
-
Major
-
ACM 2.8.0, Submariner 0.15.0
-
False
-
None
-
False
-
-
-
Submariner Sprint 2023-7, Submariner Sprint 2023-8, Submariner Sprint 2023-9, Submariner Sprint 2023-10
-
No
Description of problem:
On a Regional DR setup (Submariner with Globalnet enabled), one of the Regional DR failover tests powers off all nodes of the first managed cluster and later powers them back on. After this test, the Submariner connection goes into the Degraded state and never recovers, even though all nodes are back online and the pods are in the Running state.
Version-Release number of selected component (if applicable):
ACM: 2.8.0-DOWNSTREAM-2023-05-19-14-13-56
Submariner: 0.15.0 (image: brew.registry.redhat.io/rh-osbs/iib:500162)
ODF: 4.13.0-203
OCP: 4.13.0-0.nightly-2023-05-22-040653
How reproducible:
2/2 on a Globalnet-enabled cluster
Steps to Reproduce:
1. Configure RDR setup consisting of 3 OCP clusters (Hub, C1, C2)
2. Deploy multiple workloads/applications (in my case, 5 applications with a total of 85 PVCs/Pods on C1)
3. Run IOs for ~10-20 minutes
4. Power off all C1 cluster nodes
5. Fail over the workloads/applications to C2
6. Power on the C1 cluster nodes
7. Check submariner status in ACM console
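The status check in step 7 can also be done from the command line. The following is a minimal sketch, not part of the original test run, that polls `subctl show connections` and reports whether every connection reaches the `connected` status. The helper names (`connection_statuses`, `subctl_connections`, `wait_until_connected`) are hypothetical, and the parser assumes the column layout seen in the `subctl show all` output in this report (remote IP in the third column, STATUS in the second-to-last column); other subctl versions may format the table differently.

```python
import ipaddress
import subprocess
import time


def _is_ip(text):
    """Return True if text parses as an IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(text)
        return True
    except ValueError:
        return False


def connection_statuses(table_text):
    """Map (gateway, cluster) to the STATUS value of each connection row.

    Assumption: a data row has at least 8 whitespace-separated fields, the
    third field is the remote IP, and STATUS is the second-to-last field.
    Header and progress lines do not match and are skipped.
    """
    statuses = {}
    for line in table_text.splitlines():
        fields = line.split()
        if len(fields) >= 8 and _is_ip(fields[2]):
            statuses[(fields[0], fields[1])] = fields[-2]
    return statuses


def subctl_connections():
    """Run `subctl show connections` against the current kubeconfig context."""
    result = subprocess.run(
        ["subctl", "show", "connections"],
        capture_output=True, text=True, check=False,
    )
    return result.stdout


def wait_until_connected(fetch=subctl_connections, timeout_s=600,
                         interval_s=30, sleep=time.sleep):
    """Poll until every connection reports "connected" or the timeout expires.

    Returns the last parsed statuses; an empty dict means no rows were found.
    """
    waited = 0
    while True:
        statuses = connection_statuses(fetch())
        if statuses and all(s == "connected" for s in statuses.values()):
            return statuses
        if waited >= timeout_s:
            return statuses
        sleep(interval_s)
        waited += interval_s
```

Calling `wait_until_connected()` on a managed cluster after powering its nodes back on returns the last observed statuses; any value other than `connected` once the timeout expires corresponds to the behavior described in this report. The timeout and interval defaults are arbitrary illustration values.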
Actual results:
The Submariner connection status remains Degraded with the following error message:
The connection between clusters "sagrawal-nc1" and "sagrawal-nc2" is not established (status=error)
Expected results:
The connection status should recover once all cluster nodes are back online.
Additional info:
Submariner was installed via the ACM UI. Globalnet was also enabled via the ACM UI while installing Submariner.
A custom Submariner subscription (with image: brew.registry.redhat.io/rh-osbs/iib:500162) was used for the installation.
A screenshot of the ACM console with the error message is attached in the Attachments section.
Output of "subctl show all" from both managed clusters, C1 and C2:
>> From Cluster C1
$ subctl show all
Cluster "sagrawal-nc1"
 ✓ Detecting broker(s)
 ✓ No brokers found

 ✓ Showing Connections
GATEWAY    CLUSTER       REMOTE IP     NAT  CABLE DRIVER  SUBNETS       STATUS  RTT avg.
compute-1  sagrawal-nc2  10.1.114.251  no   libreswan     242.0.0.0/16  error   0s

 ✓ Showing Endpoints
CLUSTER       ENDPOINT IP   PUBLIC IP       CABLE DRIVER  TYPE
sagrawal-nc1  10.1.114.212  66.187.232.130  libreswan     local
sagrawal-nc2  10.1.114.251  66.187.232.130  libreswan     remote
sagrawal-nc1  10.1.114.39   66.187.232.130  libreswan     local
sagrawal-nc1  10.1.114.158  66.187.232.130  libreswan     local

 ✓ Showing Gateways
NODE       HA STATUS  SUMMARY
compute-0  active     0 connections out of 1 are established
compute-1  passive    There are no connections
compute-2  passive    There are no connections

 ✓ Showing Network details
    Discovered network details via Submariner:
        Network plugin:  OVNKubernetes
        Service CIDRs:   [172.30.0.0/16]
        Cluster CIDRs:   [10.128.0.0/14]
        Global CIDR:     242.1.0.0/16

 ✓ Showing versions
COMPONENT                      REPOSITORY                 VERSION
submariner-gateway             registry.redhat.io/rhacm2  077f9a0ea8d1d06ef7953a573729f4ecbe9b7602745af8d0fa0510b84ac44653
submariner-routeagent          registry.redhat.io/rhacm2  3812063fbfd0e17f4d50ab428259d63b4b14bef23b4239d04dbc475efcccf350
submariner-globalnet           registry.redhat.io/rhacm2  a3be3804eb897a54eed08c5e9560647fd8d46507e39a6d8f4bc1b8c7319b4839
submariner-operator            registry.redhat.io/rhacm2  06b7339e46213720583e15e4aa8c84621d7b3d496ac865252a728e24bd6b5fde
submariner-lighthouse-agent    registry.redhat.io/rhacm2  b6a5d7ebd58f92a6c526c7dc43046e137dca8c68079ad010a2739fffde1f5aca
submariner-lighthouse-coredns  registry.redhat.io/rhacm2  10b3f94adcb2a6f2a354c897b4afbf6a828ae3793364bc1b5b9c6a4cdf2e44e5
>> From Cluster C2
[~] $ subctl show all
Cluster "sagrawal-nc2"
 ✓ Detecting broker(s)
 ✓ No brokers found

 ✓ Showing Connections
GATEWAY    CLUSTER       REMOTE IP     NAT  CABLE DRIVER  SUBNETS       STATUS     RTT avg.
compute-0  sagrawal-nc1  10.1.114.212  no   libreswan     242.1.0.0/16  connected  570.137µs

 ✓ Showing Endpoints
CLUSTER       ENDPOINT IP   PUBLIC IP       CABLE DRIVER  TYPE
sagrawal-nc2  10.1.115.5    66.187.232.130  libreswan     local
sagrawal-nc2  10.1.114.251  66.187.232.130  libreswan     local
sagrawal-nc1  10.1.114.212  66.187.232.130  libreswan     remote
sagrawal-nc2  10.1.115.3    66.187.232.130  libreswan     local

 ✓ Showing Gateways
NODE       HA STATUS  SUMMARY
compute-0  passive    There are no connections
compute-1  active     All connections (1) are established
compute-2  passive    There are no connections

 ✓ Showing Network details
    Discovered network details via Submariner:
        Network plugin:  OVNKubernetes
        Service CIDRs:   [172.30.0.0/16]
        Cluster CIDRs:   [10.128.0.0/14]
        Global CIDR:     242.0.0.0/16

 ✓ Showing versions
COMPONENT                      REPOSITORY                 VERSION
submariner-gateway             registry.redhat.io/rhacm2  077f9a0ea8d1d06ef7953a573729f4ecbe9b7602745af8d0fa0510b84ac44653
submariner-routeagent          registry.redhat.io/rhacm2  3812063fbfd0e17f4d50ab428259d63b4b14bef23b4239d04dbc475efcccf350
submariner-globalnet           registry.redhat.io/rhacm2  a3be3804eb897a54eed08c5e9560647fd8d46507e39a6d8f4bc1b8c7319b4839
submariner-operator            registry.redhat.io/rhacm2  06b7339e46213720583e15e4aa8c84621d7b3d496ac865252a728e24bd6b5fde
submariner-lighthouse-agent    registry.redhat.io/rhacm2  b6a5d7ebd58f92a6c526c7dc43046e137dca8c68079ad010a2739fffde1f5aca
submariner-lighthouse-coredns  registry.redhat.io/rhacm2  10b3f94adcb2a6f2a354c897b4afbf6a828ae3793364bc1b5b9c6a4cdf2e44e5
Is cloned by: ACM-6012 Submariner connection doesn't recover after all cluster nodes are powered off and powered on (Closed)