ACM-3920: Cleanup of applications on c1 hangs after the hub recovery procedure


      Description of problem:

      When one zone went down (3 ceph nodes, c1, and the active hub), restoring the data on the passive hub did not go well, as described in bz2172202#c0.
      Because of that issue, it was not possible to fail over applications from c1 to c2.

      After applying the workaround given in bz2172202#c11, the applications could be failed over from c1 to c2, and they were verified to be running in c2.

      Later, c1 was brought up, unfenced, and gracefully rebooted.
      In the ACM console c1 is in Ready state, all DRPCs are updated with the right status, and the s3 profile secrets of c1 are added to the Ramen config.
      But cleanup is not happening on the c1 cluster; that is, the applications are still running on c1.
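
      The last point can be confirmed on the new hub. A minimal check, assuming the default ODF MDR layout where the Ramen hub config lives in the `ramen-hub-operator-config` config map in `openshift-operators` and lists the profiles under `s3StoreProfiles`:

      new-hub] $ oc get configmap ramen-hub-operator-config -n openshift-operators \
          -o jsonpath='{.data.ramen_manager_config\.yaml}' | grep -A 3 's3ProfileName'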

      For example, the `helloworld-c1` application is running on both managed clusters and is accessible from both; that is, the helloworld-app route from each cluster opens the `hello world` page.

      new-hub] $ date; date --utc; oc get drpc -n helloworld-c1 -owide
      Wed Feb 22 17:00:46 IST 2023
      Wed Feb 22 11:30:46 UTC 2023
      NAME                             AGE   PREFERREDCLUSTER   FAILOVERCLUSTER   DESIREDSTATE   CURRENTSTATE   PROGRESSION   START TIME             DURATION   PEER READY
      helloworld-c1-placement-1-drpc   20h   akrai-f20-c1       akrai-f20-c2      Failover       FailedOver     Cleaning Up   2023-02-21T18:28:27Z              False
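
      To see where cleanup is stuck, two places are worth inspecting. A hedged sketch: `volumereplicationgroup` is the Ramen CRD deployed to the managed clusters, and the second command assumes the hub orchestrates cleanup through ManifestWorks in the c1 cluster namespace:

      c1] $ oc get volumereplicationgroup -n helloworld-c1 -o yaml
      new-hub] $ oc get manifestwork -n akrai-f20-c1 | grep helloworld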

      c1] $ date; date --utc; oc get all -n helloworld-c1
      Wed Feb 22 16:59:09 IST 2023
      Wed Feb 22 11:29:09 UTC 2023
      NAME                                         READY   STATUS    RESTARTS   AGE
      pod/helloworld-app-deploy-69c5d9dbbc-2jr9b   1/1     Running   0          3h21m

      NAME                         TYPE       CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
      service/helloworld-app-svc   NodePort   172.30.180.245   <none>        3002:30261/TCP   25h

      NAME                                    READY   UP-TO-DATE   AVAILABLE   AGE
      deployment.apps/helloworld-app-deploy   1/1     1            1           25h

      NAME                                               DESIRED   CURRENT   READY   AGE
      replicaset.apps/helloworld-app-deploy-69c5d9dbbc   1         1         1       25h

      NAME                                            HOST/PORT                                                             PATH   SERVICES             PORT   TERMINATION   WILDCARD
      route.route.openshift.io/helloworld-app-route   helloworld-app-route-helloworld-c1.apps.akrai-f20-c1.qe.rh-ocs.com          helloworld-app-svc   3002                 None

      c2] $ date; date --utc; oc get all -n helloworld-c1
      Wed Feb 22 16:59:11 IST 2023
      Wed Feb 22 11:29:11 UTC 2023
      NAME                                         READY   STATUS    RESTARTS   AGE
      pod/helloworld-app-deploy-69c5d9dbbc-98h66   1/1     Running   0          16h

      NAME                         TYPE       CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
      service/helloworld-app-svc   NodePort   172.30.245.230   <none>        3002:30647/TCP   16h

      NAME                                    READY   UP-TO-DATE   AVAILABLE   AGE
      deployment.apps/helloworld-app-deploy   1/1     1            1           16h

      NAME                                               DESIRED   CURRENT   READY   AGE
      replicaset.apps/helloworld-app-deploy-69c5d9dbbc   1         1         1       16h

      NAME                                            HOST/PORT                                                             PATH   SERVICES             PORT   TERMINATION   WILDCARD
      route.route.openshift.io/helloworld-app-route   helloworld-app-route-helloworld-c1.apps.akrai-f20-c2.qe.rh-ocs.com          helloworld-app-svc   3002                 None
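
      Both routes still serve the page, confirming the application was never cleaned up on c1. A quick check (assuming plain HTTP, since TERMINATION is empty for both routes):

      $ curl -s http://helloworld-app-route-helloworld-c1.apps.akrai-f20-c1.qe.rh-ocs.com | grep -i 'hello world'
      $ curl -s http://helloworld-app-route-helloworld-c1.apps.akrai-f20-c2.qe.rh-ocs.com | grep -i 'hello world'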

      Version of all relevant components (if applicable):
      OCP: 4.12.0-0.nightly-2023-02-20-054425
      ODF: 4.12.1
      CEPH: 16.2.10-137.el8cp (c753293698537775a9c24abea01a9826659e6b17) pacific (stable)
      ACM: 2.7 (GA'ed)

      How reproducible:

      Always

      Steps to Reproduce:

      1. Create 4 OCP clusters (2 hubs and 2 managed clusters) and one stretched RHCS cluster. Deploy them such that:
      zone a: arbiter ceph node
      zone b: c1, active hub, 3 ceph nodes
      zone c: c2, passive hub, 3 ceph nodes
      2. Configure MDR and deploy an application on each managed cluster.
      3. Initiate the backup process so that the active and passive hubs stay in sync (see the sketch after this list).
      4. Bring zone b down.
      5. Initiate the restore process on the passive hub.
      6. Apply the workaround suggested in bz2172202#c11 and initiate failover of the applications from c1 to c2.
      7. Verify all applications are running in c2.
      8. Bring the c1 managed cluster up.
      9. Create a new auto-import-secret for c1 on the new hub.
      10. Unfence c1 and gracefully reboot it.
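
      A sketch of how steps 3, 5, 6, 9, and 10 can be driven from the CLI. This is illustrative rather than the exact procedure used here: names such as `schedule-acm`, `restore-acm`, and the filenames are placeholders, and the fields follow the ACM backup/restore and Ramen DR APIs, which may differ by release.

      # Step 3: on the active hub, schedule backups via the ACM cluster-backup operator.
      # backup-schedule.yaml (placeholder filename):
      apiVersion: cluster.open-cluster-management.io/v1beta1
      kind: BackupSchedule
      metadata:
        name: schedule-acm                 # placeholder name
        namespace: open-cluster-management-backup
      spec:
        veleroSchedule: "*/30 * * * *"     # back up every 30 minutes
        veleroTtl: 120h

      active-hub] $ oc apply -f backup-schedule.yaml

      # Step 5: on the passive hub, restore the latest backups.
      # restore.yaml (placeholder filename):
      apiVersion: cluster.open-cluster-management.io/v1beta1
      kind: Restore
      metadata:
        name: restore-acm                  # placeholder name
        namespace: open-cluster-management-backup
      spec:
        cleanupBeforeRestore: CleanupRestored
        veleroManagedClustersBackupName: latest
        veleroCredentialsBackupName: latest
        veleroResourcesBackupName: latest

      new-hub] $ oc apply -f restore.yaml

      # Step 6: fail an application over by setting the DRPC action.
      new-hub] $ oc patch drpc helloworld-c1-placement-1-drpc -n helloworld-c1 \
          --type merge -p '{"spec":{"action":"Failover","failoverCluster":"akrai-f20-c2"}}'

      # Step 9: recreate the auto-import secret for c1 in its cluster namespace on the new hub.
      new-hub] $ oc create secret generic auto-import-secret -n akrai-f20-c1 \
          --from-literal=autoImportRetry=5 \
          --from-literal=token=<c1 API token> --from-literal=server=<c1 API URL>

      # Step 10: unfence c1 through its DRCluster resource (cluster-scoped).
      new-hub] $ oc patch drcluster akrai-f20-c1 \
          --type merge -p '{"spec":{"clusterFence":"Unfenced"}}'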

