Bug
Resolution: Cannot Reproduce
ACM 2.13.2
Quality / Stability / Reliability
Important
Description of problem:
On a Regional DR setup, 4 clusters named amagrawa-419-2/3/4/5 are imported to ACM. Clusters 2 and 3 belong to clusterset myclusterset-1, clusters 4 and 5 belong to clusterset myclusterset-2, and the clusters are connected via Submariner. After DR configuration, 2 RBD workloads (1 appset and 1 subscription) are deployed on cluster 2 and 2 CephFS workloads (1 appset and 1 subscription) are deployed on cluster 4, which makes those clusters the primary clusters for their workloads.
Using DR, data is then replicated to the peer cluster, i.e. data from cluster 2 syncs to cluster 3 and data from cluster 4 syncs to cluster 5.
After running IOs for 2 days without any DR operation, data sync for the workload in namespace busybox-workloads-4, running on cluster amagrawa-419-4, is impacted for 1 of the 4 PVCs in that namespace.
Please note that these clusters aren't heavily loaded with IOs and it's not an infrastructure issue.
bmekhiss helped us bring this issue to the Submariner team, and it was discussed with rh-ee-vthapar and tpanteli in this thread: https://redhat-internal.slack.com/archives/C0134E73VH6/p1745591703720299?thread_ts=1743492550.465549&cid=C0134E73VH6
The issue did not recover on its own, so after discussion and on their recommendation, `lighthouse-agent` was restarted, which fixed the issue. See the thread for more details.
Logs from the import cluster: https://drive.google.com/drive/folders/1foqGvKHSTN7fbiZYL2dicsINPvAOebvP?usp=sharing
Logs from the export cluster: https://drive.google.com/drive/folders/1qcX6w2UHVO5RMKmzdUxdUFVRa5-oFWTM?usp=sharing
Version-Release number of selected component (if applicable):
ODF 4.19.0-46.konflux
OCP 4.19.0-0.nightly-2025-04-17-154552
GitOps 1.16.0
Submariner 0.20.0 GA'ed
ACM 2.13.2 GA'ed
How reproducible: We have hit this issue multiple times, and a few of these occurrences were discussed in the same Slack thread.
Steps to Reproduce:
- On an RDR setup, import 4 clusters to ACM and configure 2 peer DR relationships between them using 2 clustersets and Submariner. Run IOs for a few days and monitor data sync.
- ...
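The "monitor data sync" step amounts to spotting a PVC whose last successful sync falls behind while its siblings keep syncing, which is the symptom seen here for 1 of the 4 PVCs. The following is a minimal Python sketch of that check. It assumes that the per-PVC last-sync timestamps have already been collected as RFC 3339 strings, for example from the VolSync ReplicationSource `status.lastSyncTime` field used for CephFS replication. The function names, the 5-minute sync interval, the 3x tolerance, and the PVC names are hypothetical and were not taken from the affected setup.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


def parse_rfc3339(ts: str) -> datetime:
    # Kubernetes serializes timestamps like "2025-04-25T11:55:00Z".
    # Older Python versions reject a trailing "Z" in fromisoformat(),
    # so normalize it to an explicit UTC offset first.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def find_stale_pvcs(
    last_sync: Dict[str, Optional[str]],
    interval: timedelta,
    now: datetime,
    tolerance: int = 3,
) -> List[str]:
    """Return PVC names whose last sync is older than tolerance * interval.

    A PVC with no recorded sync (None) is also reported as stale.
    Hypothetical helper for illustration only.
    """
    limit = interval * tolerance
    stale = []
    for pvc, ts in sorted(last_sync.items()):
        if ts is None or now - parse_rfc3339(ts) > limit:
            stale.append(pvc)
    return stale
```

For example, with `now` at 2025-04-25T12:00:00Z and a 5-minute interval, a PVC last synced at 2025-04-24T09:10:00Z is reported while PVCs synced a few minutes earlier are not. Comparing against a multiple of the sync interval, rather than a fixed timestamp, keeps the check meaningful if the scheduling interval is changed.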