Bug
Resolution: Done
Blocker
MCE 2.0.7
Description of problem:
When one zone went down (3 ceph nodes, c1, and the active hub down), restoring the data on the passive hub didn't go well, as mentioned in bz2172202#c0.
Because of that issue, it was not possible to fail over the application from c1 to c2.
After following the WA given in bz2172202#c11, the application could be failed over from c1 to c2, and the applications were verified to be running on c2.
Later c1 was brought up, unfenced, and gracefully rebooted.
In the ACM console c1 is in Ready state, all drpc are updated with the right status, and the s3 profile secrets of c1 are added to the Ramen config.
But the cleanup is not happening on the c1 cluster, that is, applications are still running on the c1 cluster.
For example, the `helloworld-c1` application is running on both managed clusters and is accessible, i.e. the helloworld-app route from each cluster opens the `hello world` page.
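(Illustrative check, not part of the original report: the s3 profile addition can be confirmed on the new hub. The ConfigMap name and namespace below assume a default ODF MDR install and may differ.)
new-hub] $ oc get configmap ramen-hub-operator-config -n openshift-operators -o yaml | grep -A 10 s3StoreProfiles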
new-hub] $ date; date --utc; oc get drpc -n helloworld-c1 -owide
Wed Feb 22 17:00:46 IST 2023
Wed Feb 22 11:30:46 UTC 2023
NAME AGE PREFERREDCLUSTER FAILOVERCLUSTER DESIREDSTATE CURRENTSTATE PROGRESSION START TIME DURATION PEER READY
helloworld-c1-placement-1-drpc 20h akrai-f20-c1 akrai-f20-c2 Failover FailedOver Cleaning Up 2023-02-21T18:28:27Z False
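(Illustrative, not run as part of the original triage: the DRPC status conditions usually explain why Progression stays in Cleaning Up.)
new-hub] $ oc describe drpc helloworld-c1-placement-1-drpc -n helloworld-c1
new-hub] $ oc get drpc helloworld-c1-placement-1-drpc -n helloworld-c1 -o jsonpath='{.status.conditions}'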
c1] $ date; date --utc; oc get all -n helloworld-c1
Wed Feb 22 16:59:09 IST 2023
Wed Feb 22 11:29:09 UTC 2023
NAME READY STATUS RESTARTS AGE
pod/helloworld-app-deploy-69c5d9dbbc-2jr9b 1/1 Running 0 3h21m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/helloworld-app-svc NodePort 172.30.180.245 <none> 3002:30261/TCP 25h
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/helloworld-app-deploy 1/1 1 1 25h
NAME DESIRED CURRENT READY AGE
replicaset.apps/helloworld-app-deploy-69c5d9dbbc 1 1 1 25h
NAME HOST/PORT PATH SERVICES PORT TERMINATION WILDCARD
route.route.openshift.io/helloworld-app-route helloworld-app-route-helloworld-c1.apps.akrai-f20-c1.qe.rh-ocs.com helloworld-app-svc 3002 None
c2] $ date; date --utc; oc get all -n helloworld-c1
Wed Feb 22 16:59:11 IST 2023
Wed Feb 22 11:29:11 UTC 2023
NAME READY STATUS RESTARTS AGE
pod/helloworld-app-deploy-69c5d9dbbc-98h66 1/1 Running 0 16h
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/helloworld-app-svc NodePort 172.30.245.230 <none> 3002:30647/TCP 16h
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/helloworld-app-deploy 1/1 1 1 16h
NAME DESIRED CURRENT READY AGE
replicaset.apps/helloworld-app-deploy-69c5d9dbbc 1 1 1 16h
NAME HOST/PORT PATH SERVICES PORT TERMINATION WILDCARD
route.route.openshift.io/helloworld-app-route helloworld-app-route-helloworld-c1.apps.akrai-f20-c2.qe.rh-ocs.com helloworld-app-svc 3002 None
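(Illustrative way to confirm the app answers from both clusters; plain HTTP is assumed since the routes show no TLS termination.)
$ curl -s http://helloworld-app-route-helloworld-c1.apps.akrai-f20-c1.qe.rh-ocs.com | grep -i 'hello world'
$ curl -s http://helloworld-app-route-helloworld-c1.apps.akrai-f20-c2.qe.rh-ocs.com | grep -i 'hello world'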
Version-Release number of selected component (if applicable):
Version of all relevant components (if applicable):
OCP: 4.12.0-0.nightly-2023-02-20-054425
ODF: 4.12.1
CEPH: 16.2.10-137.el8cp (c753293698537775a9c24abea01a9826659e6b17) pacific (stable)
ACM: 2.7 (GA'ed)
How reproducible:
Always
Steps to Reproduce:
1. Create 4 OCP clusters: 2 hubs and 2 managed clusters, plus one stretched RHCS cluster.
Deploy the clusters such that:
zone a: arbiter ceph node
zone b: c1, active hub, 3 ceph nodes
zone c: c2, passive hub, 3 ceph nodes
2. Configure MDR and deploy an application on each managed cluster
3. Initiate a backup process, so that the active and passive hubs are in sync
4. Bring zone b down
5. Initiate the restore process on the passive hub
6. Apply the WA suggested in bz2172202#c11 and initiate failover of applications from c1 to c2
7. Verify all applications are running on c2
8. Bring the c1 managed cluster up
9. Create a new auto-import-secret for c1 on the new hub (a sketch follows this list).
10. Unfence c1 and gracefully reboot it.
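Minimal sketch of the auto-import-secret from step 9, assuming a token/server based import (token and API URL are placeholders; a kubeconfig-based secret also works). The secret goes in the managed cluster's namespace on the new hub:
new-hub] $ cat <<EOF | oc apply -f -
apiVersion: v1
kind: Secret
metadata:
  name: auto-import-secret
  namespace: akrai-f20-c1   # namespace named after the managed cluster
type: Opaque
stringData:
  autoImportRetry: "5"
  token: <c1-service-account-token>   # placeholder
  server: <c1-api-server-URL>         # placeholder, e.g. https://api.<cluster-domain>:6443
EOF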
- ...