Bug
Resolution: Done
Undefined
ACM 2.10.4
2
False
None
False
SF Train-22, SF Train-23
None
Description of the problem:
The ManagedCluster status switches between True and Unknown very frequently, several times per second.
I have a video with the demo (here)
It happens on a large hub with hundreds of ManagedClusters. Some of them keep flapping their status, which sends the ocm-controller into a reconcile loop that never ends.
(pic here)
Notice the timestamps and how often it reconciles.
Some of these ManagedClusters are old, abandoned, or misbehaving, but this should not affect the ACM hub.
Pods such as ocm-controller and clusterlifecycle-state-metrics are severely affected. Reconciling this frequently causes them to hit a separate bug, and the pod ends up crashing:
(pic here)
First, though, our main issue is why the ManagedClusters change their status so frequently.
How reproducible:
Not sure how to reproduce it. The ManagedClusters could be malfunctioning, but that should not affect ACM.
Steps to reproduce:
1.
2.
3.
Actual results:
Expected results:
ACM detects misbehaving ManagedClusters and does not try to reconcile them so often.
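To make the expected behaviour more concrete, here is a minimal sketch of flap detection. It is not ACM code: the `FlapDetector` class, the 60-second window, and the threshold of 5 transitions are all hypothetical values chosen for illustration. The idea is to count status transitions inside a sliding time window and flag a cluster as flapping once the count exceeds the threshold, so that a controller could back off instead of reconciling on every change.

```python
from collections import deque


class FlapDetector:
    """Counts status transitions in a sliding time window (hypothetical helper, not ACM code)."""

    def __init__(self, window_seconds=60.0, max_transitions=5):
        self.window_seconds = window_seconds    # assumed window size
        self.max_transitions = max_transitions  # assumed flapping threshold
        self._last_status = None
        self._transitions = deque()  # timestamps of observed status changes

    def observe(self, timestamp, status):
        """Record one status sample; return True if the cluster counts as flapping."""
        if self._last_status is not None and status != self._last_status:
            self._transitions.append(timestamp)
        self._last_status = status
        # Drop transitions that have fallen out of the window.
        while self._transitions and timestamp - self._transitions[0] > self.window_seconds:
            self._transitions.popleft()
        return len(self._transitions) > self.max_transitions


# Example: a status that alternates between True and Unknown four times per second.
detector = FlapDetector(window_seconds=60.0, max_transitions=5)
samples = [(i * 0.25, "True" if i % 2 == 0 else "Unknown") for i in range(12)]
results = [detector.observe(ts, status) for ts, status in samples]
```

With these assumed settings, the detector starts reporting the cluster as flapping at the sixth transition. A controller could use such a signal to requeue the cluster with a longer delay instead of reconciling immediately.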
is related to: ACM-15634 ACM managed cluster resources remains when deleting cluster (In Progress)
links to: RHSA-2024:138990 Red Hat Advanced Cluster Management 2.10.7 bug fixes and container updates