Bug
Resolution: Done
Blocker
MCE 2.0.7
Description of problem:
When one zone went down (3 ceph nodes, c1, and the active hub down), restoring the data on the passive hub didn't go well, as mentioned in bz2172202#c0.
Because of that issue, it was not possible to fail over the application from c1 to c2.
After following the WA given in bz2172202#c11, the application could be failed over from c1 to c2, and the applications were verified to be running on c2.
Later c1 was brought up, unfenced, and gracefully rebooted.
In the ACM console c1 is in Ready state, all drpc are updated with the right status, and the s3 profile secrets of c1 are added to the Ramen config.
But the cleanup is not happening on the c1 cluster, that is, applications are still running on the c1 cluster.
For example, the `helloworld-c1` application is running on both managed clusters and is accessible, i.e. the helloworld-app route from each cluster opens the `hello world` page.
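(Illustrative check, not part of the original report: the s3 profile addition can be confirmed on the new hub. The ConfigMap name and namespace below assume a default ODF MDR install and may differ.)
new-hub] $ oc get configmap ramen-hub-operator-config -n openshift-operators -o yaml | grep -A 10 s3StoreProfiles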
new-hub] $ date; date --utc; oc get drpc -n helloworld-c1 -owide
Wed Feb 22 17:00:46 IST 2023
Wed Feb 22 11:30:46 UTC 2023
NAME AGE PREFERREDCLUSTER FAILOVERCLUSTER DESIREDSTATE CURRENTSTATE PROGRESSION START TIME DURATION PEER READY
helloworld-c1-placement-1-drpc 20h akrai-f20-c1 akrai-f20-c2 Failover FailedOver Cleaning Up 2023-02-21T18:28:27Z False
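(Illustrative, not run as part of the original triage: the DRPC status conditions usually explain why Progression stays in Cleaning Up.)
new-hub] $ oc describe drpc helloworld-c1-placement-1-drpc -n helloworld-c1
new-hub] $ oc get drpc helloworld-c1-placement-1-drpc -n helloworld-c1 -o jsonpath='{.status.conditions}'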
c1] $ date; date --utc; oc get all -n helloworld-c1
Wed Feb 22 16:59:09 IST 2023
Wed Feb 22 11:29:09 UTC 2023
NAME READY STATUS RESTARTS AGE
pod/helloworld-app-deploy-69c5d9dbbc-2jr9b 1/1 Running 0 3h21m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/helloworld-app-svc NodePort 172.30.180.245 <none> 3002:30261/TCP 25h
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/helloworld-app-deploy 1/1 1 1 25h
NAME DESIRED CURRENT READY AGE
replicaset.apps/helloworld-app-deploy-69c5d9dbbc 1 1 1 25h
NAME HOST/PORT PATH SERVICES PORT TERMINATION WILDCARD
route.route.openshift.io/helloworld-app-route helloworld-app-route-helloworld-c1.apps.akrai-f20-c1.qe.rh-ocs.com helloworld-app-svc 3002 None
c2] $ date; date --utc; oc get all -n helloworld-c1
Wed Feb 22 16:59:11 IST 2023
Wed Feb 22 11:29:11 UTC 2023
NAME READY STATUS RESTARTS AGE
pod/helloworld-app-deploy-69c5d9dbbc-98h66 1/1 Running 0 16h
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/helloworld-app-svc NodePort 172.30.245.230 <none> 3002:30647/TCP 16h
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/helloworld-app-deploy 1/1 1 1 16h
NAME DESIRED CURRENT READY AGE
replicaset.apps/helloworld-app-deploy-69c5d9dbbc 1 1 1 16h
NAME HOST/PORT PATH SERVICES PORT TERMINATION WILDCARD
route.route.openshift.io/helloworld-app-route helloworld-app-route-helloworld-c1.apps.akrai-f20-c2.qe.rh-ocs.com helloworld-app-svc 3002 None
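(Illustrative way to confirm the app answers from both clusters; plain HTTP is assumed since the routes show no TLS termination.)
$ curl -s http://helloworld-app-route-helloworld-c1.apps.akrai-f20-c1.qe.rh-ocs.com | grep -i 'hello world'
$ curl -s http://helloworld-app-route-helloworld-c1.apps.akrai-f20-c2.qe.rh-ocs.com | grep -i 'hello world'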
Version-Release number of selected component (if applicable):
Version of all relevant components (if applicable):
OCP: 4.12.0-0.nightly-2023-02-20-054425
ODF: 4.12.1
CEPH: 16.2.10-137.el8cp (c753293698537775a9c24abea01a9826659e6b17) pacific (stable)
ACM: 2.7 (GA'ed)
How reproducible:
Always
Steps to Reproduce:
1. Create 4 OCP clusters: 2 hubs and 2 managed clusters, plus one stretched RHCS cluster.
Deploy the clusters such that:
zone a: arbiter ceph node
zone b: c1, active hub, 3 ceph nodes
zone c: c2, passive hub, 3 ceph nodes
2. Configure MDR and deploy an application on each managed cluster
3. Initiate a backup process, so that the active and passive hubs are in sync
4. Bring zone b down
5. Initiate the restore process on the passive hub
6. Apply the WA suggested in bz2172202#c11 and initiate failover of applications from c1 to c2
7. Verify all applications are running on c2
8. Bring the c1 managed cluster up
9. Create a new auto-import-secret for c1 on the new hub (a sketch follows this list).
10. Unfence c1 and gracefully reboot it.
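Minimal sketch of the auto-import-secret from step 9, assuming a token/server based import (token and API URL are placeholders; a kubeconfig-based secret also works). The secret goes in the managed cluster's namespace on the new hub:
new-hub] $ cat <<EOF | oc apply -f -
apiVersion: v1
kind: Secret
metadata:
  name: auto-import-secret
  namespace: akrai-f20-c1   # namespace named after the managed cluster
type: Opaque
stringData:
  autoImportRetry: "5"
  token: <c1-service-account-token>   # placeholder
  server: <c1-api-server-URL>         # placeholder, e.g. https://api.<cluster-domain>:6443
EOF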
- ...