Bug
Resolution: Done-Errata
Major
ACM 2.9.0
False
None
False
Submariner Sprint 2023-11, Submariner Sprint 2023-12, Submariner Sprint 2023-13
Moderate
No
Description of problem:
On the RDR longevity setup, the Globalnet IPs on one of the managed clusters (C2) were reallocated because of a race condition in a Globalnet- and HA-enabled environment. As a result, the Lighthouse component returned the wrong IP for a Rook nslookup query, which in turn stopped mirroring from C2 to C1.
Version-Release number of selected component (if applicable):
ODF - 4.14.0-128
OCP - 4.14.0-0.nightly-2023-09-12-024050
Submariner - 0.16 (brew.registry.redhat.io/rh-osbs/iib:569163)
ACM - v2.9.0-109 (2.9.0-DOWNSTREAM-2023-08-24-09-30-12)
ceph version 17.2.6-120.el9cp (6fb9bb1d83813766a53a421c7bc80f7835bcaf6c) quincy (stable)
How reproducible:
Steps to Reproduce:
- On a Regional DR longevity setup that has been running for more than 2 weeks, perform a failover (from C1 to C2) and then a relocate (from C2 back to C1) of an app. Both operations succeed.
- Keep the clusters running for a day. Mirroring from C2 to C1 is lost.
Actual results:
On one of the managed clusters (C2), the Globalnet IPs were reallocated because of a race condition in a Globalnet- and HA-enabled environment. The race condition is in the Globalnet controller code and is triggered only during gateway (GW) migration. As a result, the Lighthouse component returned the wrong IP for a Rook nslookup query, which in turn stopped mirroring from C2 to C1.
Expected results:
Globalnet IPs should not be updated/reallocated.
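To check the expected result after a failover and relocate, one option is to record the resource-to-global-IP mapping before and after the operation and compare the two snapshots. The Go sketch below is a hypothetical helper (`Reallocated` and the snapshot format are assumptions, not part of subctl or Submariner). It reports the keys whose global IP changed and leaves out how the snapshots are collected.

```go
package main

import (
	"fmt"
	"sort"
)

// Reallocated compares two hypothetical snapshots of resource key -> global IP
// (for example, taken before and after a gateway failover) and returns, sorted,
// the keys present in both snapshots whose IP changed.
func Reallocated(before, after map[string]string) []string {
	var changed []string
	for key, oldIP := range before {
		if newIP, ok := after[key]; ok && newIP != oldIP {
			changed = append(changed, key)
		}
	}
	sort.Strings(changed) // map iteration order is random; sort for stable output
	return changed
}

func main() {
	before := map[string]string{"example-ns/svc-a": "242.0.0.5", "example-ns/svc-b": "242.0.0.6"}
	after := map[string]string{"example-ns/svc-a": "242.0.0.9", "example-ns/svc-b": "242.0.0.6"}

	fmt.Println(Reallocated(before, after)) // prints "[example-ns/svc-a]"
}
```

An empty result corresponds to the expected behavior in this ticket; any reported key indicates a reallocation like the one observed on C2.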
Subctl gather logs
Additional info:
Slack discussion of RCA
https://redhat-internal.slack.com/archives/C0134E73VH6/p1696321927050439
https://redhat-internal.slack.com/archives/C0134E73VH6/p1696348571144659
https://redhat-internal.slack.com/archives/C0134E73VH6/p1696495985131449
Links to: RHEA-2023:123669 RHEA: Submariner 0.16.2 - bug fix and enhancement update