Red Hat Advanced Cluster Management / ACM-12228

Subscription is not redeployed on the managed cluster


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Critical
    • None
    • ACM 2.10.2
    • Application Lifecycle
    • False
    • None
    • False
    • Important
    • No

      Description of problem:

      During failover of a subscription-based workload (e.g. from cluster c1), Ramen restores the workload PVCs on the failover cluster (e.g. c2). When the PVCs are ready, Ramen changes the PlacementDecision status to the failover cluster (e.g. c2). At this point ACM should deploy the subscription on the managed cluster (e.g. c2).
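      The PlacementDecision change described above can be checked from the hub before looking for the ManifestWork. This is a minimal sketch, assuming the Open Cluster Management Placement API, where each PlacementDecision carries the label `cluster.open-cluster-management.io/placement=<placement-name>` and lists the selected clusters in `.status.decisions[].clusterName`. The function name `placement_clusters` is hypothetical, and the namespace and Placement name used below are guesses inferred from the ManifestWork names in this report.

```shell
# Hypothetical helper: print the cluster name(s) currently selected by the
# PlacementDecision(s) that belong to one Placement on the hub.
# Assumes the OCM placement label is set on each PlacementDecision.
placement_clusters() {
    # $1: hub namespace of the Placement (assumed)
    # $2: Placement name (assumed)
    oc get placementdecision -n "$1" \
        -l "cluster.open-cluster-management.io/placement=$2" \
        -o jsonpath='{.items[*].status.decisions[*].clusterName}'
}
```

      For example, `placement_clusters vm16-datavol-sub-02 vm16-datavol-sub-02-placement-1` would be expected to print the failover cluster once Ramen has updated the decision.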

      When this rare bug occurs, this never happens: a ManifestWork for the subscription is not created, and there is no progress for days.

      If we scale the multicluster-operators-hub-subscription deployment down and back up, the ManifestWork is created within a few minutes and the application is deployed on the managed cluster.

      Version-Release number of selected component (if applicable):

      % oc get csv -n open-cluster-management 
      NAME                                               DISPLAY                                      VERSION             REPLACES                                          PHASE
      advanced-cluster-management.v2.10.3                Advanced Cluster Management for Kubernetes   2.10.3              advanced-cluster-management.v2.10.2               Succeeded
      odf-multicluster-orchestrator.v4.16.0-100.stable   ODF Multicluster Orchestrator                4.16.0-100.stable   odf-multicluster-orchestrator.v4.16.0-86.stable   Succeeded
      odr-hub-operator.v4.16.0-100.stable                Openshift DR Hub Operator                    4.16.0-100.stable   odr-hub-operator.v4.16.0-86.stable                Succeeded
      openshift-gitops-operator.v1.12.1                  Red Hat OpenShift GitOps                     1.12.1              openshift-gitops-operator.v1.12.0                 Succeeded

      How reproducible:

      Random; it happened about 2 times in the last year.

      Steps to Reproduce:

      We don't know how to reproduce this.

      Actual results:

      The ManifestWork for the subscription is not created, and the workload is not deployed on the failover cluster:

      % oc get manifestwork -n c01-mdr-c2 | grep vm16-datavol-sub-02
      vm16-datavol-sub-02-placement-1-drpc-vm16-datavol-sub-02-ns-mw    8d
      vm16-datavol-sub-02-placement-1-drpc-vm16-datavol-sub-02-vrg-mw   8d

      Expected results:

      The ManifestWork for the subscription is created, and the workload is deployed on the failover cluster:

      % oc get manifestwork -n c01-mdr-c2 | grep vm16-datavol-sub-02
      vm16-datavol-sub-02-placement-1-drpc-vm16-datavol-sub-02-ns-mw    8d
      vm16-datavol-sub-02-placement-1-drpc-vm16-datavol-sub-02-vrg-mw   8d
      vm16-datavol-sub-02-vm16-datavol-sub-02-subscription-1            77m
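      The difference between the actual and expected output above is a single ManifestWork in the managed cluster namespace. A check for it could be scripted as in the minimal sketch below. It assumes the subscription ManifestWork is named `<app-namespace>-<subscription-name>`, which is inferred only from the one example in this report. The function name `has_subscription_mw` and its arguments are hypothetical.

```shell
# Hypothetical helper: exit 0 if the managed cluster namespace on the hub
# contains the subscription ManifestWork, non-zero otherwise.
# ASSUMPTION: the ManifestWork name is "<app-namespace>-<subscription-name>".
has_subscription_mw() {
    # $1: managed cluster namespace on the hub, e.g. c01-mdr-c2
    # $2: application namespace (assumed), e.g. vm16-datavol-sub-02
    # $3: subscription name (assumed), e.g. vm16-datavol-sub-02-subscription-1
    oc get manifestwork -n "$1" "$2-$3" >/dev/null 2>&1
}
```

      On the broken system, `has_subscription_mw c01-mdr-c2 vm16-datavol-sub-02 vm16-datavol-sub-02-subscription-1` would be expected to fail, and to succeed after the workaround.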

      Additional info:

      To recover from the issue, we scaled the hub-subscription operator down and back up:

      % oc scale deployment -n open-cluster-management multicluster-operators-hub-subscription --replicas=0
      % oc scale deployment -n open-cluster-management multicluster-operators-hub-subscription --replicas=1
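      The two commands above could be wrapped in a small helper that also waits for the restarted pod. This is a minimal sketch: it runs the same scale commands and then uses `oc rollout status` with a timeout. The function name `restart_hub_subscription` and the 300s timeout are hypothetical choices and are not part of the original procedure.

```shell
# Hypothetical helper: recover by scaling the hub subscription controller
# to zero and back to one replica, as done manually in this report.
restart_hub_subscription() {
    ns=open-cluster-management
    deploy=multicluster-operators-hub-subscription
    oc scale deployment -n "$ns" "$deploy" --replicas=0 || return 1
    oc scale deployment -n "$ns" "$deploy" --replicas=1 || return 1
    # Block until the new replica is available, or fail after the timeout.
    oc rollout status deployment/"$deploy" -n "$ns" --timeout=300s
}
```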

      Attached files:

      • gather.acm.tar.xz - kubectl gather of all ACM namespaces (open-cluster-management* and the managed cluster namespaces) in all clusters, taken while the system was broken
      • gather.acm.fixed.tar.xz - the same, taken after scaling the hub-subscription operator down and back up

      Attachments (uploaded by Nir Soffer):

        1. gather.acm.fixed.tar.xz (14.29 MB)
        2. gather.acm.tar.xz (13.32 MB)

            Xiangjing Li (xiangli@redhat.com)
            Nir Soffer (nsoffer@redhat.com)
            David Huynh
            Votes: 0
            Watchers: 4
