-
Feature Request
-
Resolution: Unresolved
-
Product / Portfolio Work
Problem:
In long-running clusters (e.g., up for more than 1 year), the console-operator pod does not automatically reload rotated internal certificates. This triggers GarbageCollectorSyncFailed in the kube-controller-manager, which stops the cleanup of Completed pods.
Multiple KCS articles currently address similar issues, but they provide only manual workarounds (such as restarting pods). This indicates a recurring product issue that needs a permanent fix. The relevant articles are:
- kube-controller-manager CO got DEGRADED status and occur GarbageCollectorSyncFailed alert due to x509: certificate has expired or is not yet valid for console.openshift.io/v1alpha1 webhook - Red Hat Customer Portal
- ClusterMonitoringOperator in Degraded State Due to Expired TLS Certificate in Console Plugin Webhook in RHOCP 4 - Red Hat Customer Portal
- Expired webhook certificate for openshift-console-operator resulting in GarbageCollectorDegraded state in RHOCP 4 - Red Hat Customer Portal
While regular cluster upgrades would recreate pods and avoid certificate expiration issues, mission-critical systems such as banking systems cannot always be updated frequently. Additionally, with the extension of OpenShift support lifecycles (for example, EUS3 providing up to 4 years of support), a cluster can be expected to run for a very long period without node reboots or major updates.
Customer Impact:
- CronJobs ignore successfulJobsHistoryLimit, leading to thousands of orphaned Pods.
- Administrators must manually identify and restart pods every time certificates rotate.
Requested Enhancement:
We request a dynamic certificate reloading mechanism for all OpenShift-native pods. Specifically, the Garbage Collector sync process should be able to recover from certificate rotation issues without manual pod restarts.
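
To illustrate the kind of mechanism being requested, here is a minimal Python sketch of reload-on-change for a certificate file. It is not a description of how console-operator or any other OpenShift component is implemented. The class name `ReloadingCertFile`, the `on_reload` callback, and the `tls.crt` file name are all hypothetical, and the "certificates" are placeholder bytes rather than real PEM data. The sketch compares a content hash instead of relying on timestamps, on the assumption that the rotated file is replaced in place under the same path.

```python
import hashlib
import os
import tempfile
from pathlib import Path
from typing import Callable, List, Optional


class ReloadingCertFile:
    """Re-reads a certificate file whenever its content changes on disk.

    Hypothetical helper for illustration only.
    """

    def __init__(self, path: str, on_reload: Callable[[bytes], None]) -> None:
        self._path = Path(path)
        self._on_reload = on_reload
        self._digest: Optional[str] = None
        self.check()  # load the initial certificate

    def check(self) -> bool:
        """Return True if the file content changed and on_reload was called."""
        data = self._path.read_bytes()
        digest = hashlib.sha256(data).hexdigest()
        if digest == self._digest:
            return False  # nothing rotated since the last check
        self._digest = digest
        self._on_reload(data)  # e.g. rebuild the TLS configuration here
        return True


def demo() -> List[bytes]:
    """Simulate one rotation and return every certificate that was loaded."""
    loaded: List[bytes] = []
    with tempfile.TemporaryDirectory() as tmp:
        cert_path = os.path.join(tmp, "tls.crt")  # assumed file name
        Path(cert_path).write_bytes(b"old-cert")
        watcher = ReloadingCertFile(cert_path, loaded.append)
        watcher.check()  # unchanged file: no reload
        Path(cert_path).write_bytes(b"rotated-cert")  # simulated rotation
        watcher.check()  # changed file: reload without a restart
    return loaded
```

In a long-running process, `check()` would be called periodically or from a file-watch event, so that a rotated certificate is picked up without restarting the pod.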