  1. Red Hat Advanced Cluster Management
  2. ACM-16330

ManagedCluster status flapping

    • 2
    • False
    • None
    • False
    • SF Train-22, SF Train-23
    • None

      Description of the problem:

      The ManagedCluster status is switching between True and Unknown very frequently, several times per second.
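To make "flapping" measurable, here is a minimal sketch that counts status transitions in a series of observations and flags a cluster when too many changes fall within a short window. It is only an illustration: the function names, the `(timestamp, status)` sample format, and the thresholds are assumptions, not part of ACM or any of its APIs. Observations are assumed to be sorted by time.

```python
from collections import deque
from typing import Deque, Iterable, Tuple

Observation = Tuple[float, str]  # (timestamp in seconds, status value such as "True" or "Unknown")


def count_transitions(observations: Iterable[Observation]) -> int:
    """Count how many times the status differs from the previous observation."""
    transitions = 0
    previous = None
    for _, status in observations:
        if previous is not None and status != previous:
            transitions += 1
        previous = status
    return transitions


def is_flapping(
    observations: Iterable[Observation],
    window_seconds: float = 1.0,   # hypothetical threshold
    max_transitions: int = 2,      # hypothetical threshold
) -> bool:
    """Return True if more than max_transitions changes occur within window_seconds."""
    change_times: Deque[float] = deque()
    previous = None
    for timestamp, status in observations:
        if previous is not None and status != previous:
            change_times.append(timestamp)
            # Drop changes that fell out of the window ending at this timestamp.
            while change_times and timestamp - change_times[0] > window_seconds:
                change_times.popleft()
            if len(change_times) > max_transitions:
                return True
        previous = status
    return False
```

Feeding it the status values sampled from one ManagedCluster over time would separate clusters that change state occasionally from those that toggle several times per second.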

      I have a video demonstrating it (here).

      It happens on a large hub with hundreds of ManagedClusters. The status of some of them is flapping, and this drives the ocm-controller into a reconcile loop that never ends.

      (pic here)

      Notice the timestamps and how often it reconciles.

      Some of these ManagedClusters are old, abandoned, or misbehaving, but this should not affect the ACM hub.

      Pods such as ocm-controller and clusterlifecycle-state-metrics are severely affected. Reconciling this frequently makes them hit a different bug, and the pod ends up crashing:

      (pic here)

      But the main issue is why the ManagedClusters change their status so frequently in the first place.

      How reproducible:

      Not sure how to reproduce it. The ManagedClusters could be malfunctioning, but that should not affect ACM.

       

      Steps to reproduce:

      1.

      2.

      3.

      Actual results:

       

      Expected results:

      ACM detects misbehaving ManagedClusters and does not try to reconcile them so often.
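One possible way to get the expected behaviour is a per-cluster exponential backoff, where each rapid status change pushes the next allowed reconcile further into the future. The sketch below is an assumption about how such damping could look, not how ACM or the ocm-controller works; the class `ReconcileBackoff`, its methods, and the delay values are all hypothetical.

```python
from typing import Dict


class ReconcileBackoff:
    """Per-cluster exponential backoff (hypothetical helper, not an ACM API)."""

    def __init__(self, base_delay: float = 1.0, max_delay: float = 300.0) -> None:
        self.base_delay = base_delay
        self.max_delay = max_delay
        self._changes: Dict[str, int] = {}       # cluster name -> changes seen since last reset
        self._not_before: Dict[str, float] = {}  # cluster name -> earliest next reconcile time

    def should_reconcile(self, cluster: str, now: float) -> bool:
        """A cluster may be reconciled once its backoff deadline has passed."""
        return now >= self._not_before.get(cluster, 0.0)

    def record_change(self, cluster: str, now: float) -> float:
        """Register a status change and return the delay applied, doubling each time up to max_delay."""
        count = self._changes.get(cluster, 0) + 1
        self._changes[cluster] = count
        delay = min(self.base_delay * 2 ** (count - 1), self.max_delay)
        self._not_before[cluster] = now + delay
        return delay

    def reset(self, cluster: str) -> None:
        """Forget the backoff state, for example after the cluster has stayed stable."""
        self._changes.pop(cluster, None)
        self._not_before.pop(cluster, None)
```

With this shape, a cluster that flaps several times per second quickly reaches the maximum delay and is reconciled rarely, while healthy clusters are unaffected because the state is tracked per cluster.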

       

              zxue@redhat.com ZHAO XUE
              jgato@redhat.com Jose Gato Luis
              Jose Gato Luis
              Hui Chen Hui Chen