-
Bug
-
Resolution: Unresolved
-
Major
-
None
-
None
-
None
-
Quality / Stability / Reliability
-
False
-
-
False
-
-
-
GRC Sprint 2025-20, GRC Sprint 2025-19
-
Important
-
Customer Escalated, Customer Reported
-
None
Description of problem:
During a control-plane node replacement procedure on an ACM Managed Cluster the Governance Policy Addons begin to fail due to connection issues with the Managed Cluster API Server and are unable to recover automatically even once the API Server is restored. This results in Policies running on that Managed Cluster to display no status on the ACM Hub and policy reconciliation does not progress. This does not recover until the governance addon pods are restarted on the affected Managed Cluster.
Version-Release number of selected component (if applicable):
2.13
How reproducible:
Reproducible
Steps to Reproduce:
- The policies are all reconciled and compliant on the managed cluster
- One control node is removed from the managed cluster, two control nodes remain and the cluster is in an operational state
- API connection errors begin to appear in the policy controller pod logs on the managed cluster
- Policy reconciliation becomes stuck on the managed cluster and do not reconcile
- The policy remains unreconciled even after the control node is recovered and re-added to the managed cluster
- Until the policy controller pod is restarted on the managed cluster, policy reconciliation does not progress
Actual results:
ACM Hub reports no policy status for the policies running on the Managed Cluster, policy reconciliation does not progress.
Expected results:
The policy addon is able to recover.