Bug
Resolution: Unresolved
Critical
ACM 2.10.2
Important
Description of problem:
During failover of a subscription-based workload (e.g. from cluster c1), ramen restores the workload PVC on the failover cluster (e.g. c2). When the PVC is ready, ramen changes the PlacementDecision status to the failover cluster (e.g. c2). At this point ACM should deploy the subscription on the managed cluster (e.g. c2).
When this rare bug is reproduced, this never happens: a manifestwork for the subscription is not created, and no progress is made for days.
If we scale the multicluster-operators-hub-subscription deployment down and up again, the manifestwork is created after a few minutes and the application is deployed on the managed cluster.
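To confirm that ramen has already switched the decision to the failover cluster, the cluster names recorded in the PlacementDecision status can be printed. This is a minimal sketch, not part of ramen or ACM: the function name is hypothetical, and the namespace argument is an assumption (use the namespace where the application's placement is defined).

```shell
# Print the cluster names recorded in the status of all PlacementDecisions
# in the given namespace.
# $1: namespace holding the PlacementDecision (assumption: supplied by the caller)
placement_decision_clusters() {
    oc get placementdecision -n "$1" \
        -o jsonpath='{.items[*].status.decisions[*].clusterName}'
}
```

If this prints the failover cluster (e.g. c2) while no subscription manifestwork exists in that cluster's namespace on the hub, the hub matches the state described here.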
Version-Release number of selected component (if applicable):
% oc get csv -n open-cluster-management
NAME                                               DISPLAY                                      VERSION             REPLACES                                          PHASE
advanced-cluster-management.v2.10.3                Advanced Cluster Management for Kubernetes   2.10.3              advanced-cluster-management.v2.10.2               Succeeded
odf-multicluster-orchestrator.v4.16.0-100.stable   ODF Multicluster Orchestrator                4.16.0-100.stable   odf-multicluster-orchestrator.v4.16.0-86.stable   Succeeded
odr-hub-operator.v4.16.0-100.stable                Openshift DR Hub Operator                    4.16.0-100.stable   odr-hub-operator.v4.16.0-86.stable                Succeeded
openshift-gitops-operator.v1.12.1                  Red Hat OpenShift GitOps                     1.12.1              openshift-gitops-operator.v1.12.0                 Succeeded
How reproducible:
Random; it happened about 2 times in the last year.
Steps to Reproduce:
We don't know how to reproduce this.
Actual results:
The subscription manifestwork is not created and the workload is not deployed on the failover cluster:
% oc get manifestwork -n c01-mdr-c2 | grep vm16-datavol-sub-02
vm16-datavol-sub-02-placement-1-drpc-vm16-datavol-sub-02-ns-mw    8d
vm16-datavol-sub-02-placement-1-drpc-vm16-datavol-sub-02-vrg-mw   8d
Expected results:
The subscription manifestwork is created and the workload is deployed on the failover cluster:
% oc get manifestwork -n c01-mdr-c2 | grep vm16-datavol-sub-02
vm16-datavol-sub-02-placement-1-drpc-vm16-datavol-sub-02-ns-mw    8d
vm16-datavol-sub-02-placement-1-drpc-vm16-datavol-sub-02-vrg-mw   8d
vm16-datavol-sub-02-vm16-datavol-sub-02-subscription-1            77m
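The difference between the actual and expected results is the presence of the subscription manifestwork, so the check can be scripted. A minimal sketch, assuming the manifestwork name contains the application name followed by `-subscription-` (a pattern inferred only from the output in this report); the function name is hypothetical:

```shell
# Succeed (exit status 0) if a manifestwork whose name contains the
# application name followed by "-subscription-" exists in the managed
# cluster namespace on the hub. The name pattern is an assumption.
# $1: managed cluster namespace on the hub, $2: application name
has_subscription_manifestwork() {
    oc get manifestwork -n "$1" -o name | grep -q -- "$2.*-subscription-"
}
```

For example, `has_subscription_manifestwork c01-mdr-c2 vm16-datavol-sub-02 || echo "subscription manifestwork missing"` would report the broken state shown under "Actual results".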
Additional info:
To fix the issue we scaled down and up the hub-subscription operator:
% oc scale deployment -n open-cluster-management multicluster-operators-hub-subscription --replicas=0
% oc scale deployment -n open-cluster-management multicluster-operators-hub-subscription --replicas=1
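The same workaround can be wrapped in a small function. This is a sketch only: the function name is hypothetical, and the final wait with `oc rollout status` and its 120s timeout are additions that were not part of the commands we ran.

```shell
# Restart the hub subscription operator by scaling its deployment to zero
# replicas and back to one, then wait for the new pod to become ready.
# The rollout wait and the 120s timeout are arbitrary additions.
restart_hub_subscription() {
    oc scale deployment -n open-cluster-management \
        multicluster-operators-hub-subscription --replicas=0 &&
    oc scale deployment -n open-cluster-management \
        multicluster-operators-hub-subscription --replicas=1 &&
    oc rollout status deployment/multicluster-operators-hub-subscription \
        -n open-cluster-management --timeout=120s
}
```

This only works around the symptom; it does not explain why the operator stopped creating the manifestwork.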
Attached files:
- gather.acm.tar.xz - kubectl gather of all acm namespaces (open-cluster-management* and managed cluster namespaces) in all clusters, taken when the system was broken
- gather.acm.fixed.tar.gz - same after scaling down/up the hub-subscription operator
Links:
- ODF bug: https://bugzilla.redhat.com/2291343