OpenShift Request For Enhancement: RFE-8705

Automatic certificate reloading for control plane components to prevent Garbage Collector failure in long-running clusters

    • Product / Portfolio Work

      Problem:
      In clusters that have been running for a long time (e.g., over 1 year), the console-operator pod does not automatically reload rotated internal certificates. The resulting expired webhook certificate triggers GarbageCollectorSyncFailed in the kube-controller-manager, which stops the cleanup of Completed pods.
      Multiple KCS articles currently address similar issues, but they provide only manual workarounds (such as restarting pods). This indicates a recurring product issue that needs a permanent fix.

      • kube-controller-manager CO got DEGRADED status and occur GarbageCollectorSyncFailed alert due to x509: certificate has expired or is not yet valid for console.openshift.io/v1alpha1 webhook - Red Hat Customer Portal
      • ClusterMonitoringOperator in Degraded State Due to Expired TLS Certificate in Console Plugin Webhook in RHOCP 4 - Red Hat Customer Portal
      • Expired webhook certificate for openshift-console-operator resulting in GarbageCollectorDegraded state in RHOCP 4 - Red Hat Customer Portal

      While regular cluster upgrades recreate pods and thereby avoid certificate expiration issues, mission-critical systems such as banking systems cannot always be updated frequently. Additionally, with the extension of OpenShift support lifecycles (for example, EUS3 providing up to 4 years of support), a cluster can be expected to run for a very long period without node reboots or major updates.

      Customer Impact:

      • CronJobs effectively ignore successfulJobsHistoryLimit because Completed pods are no longer cleaned up, leading to thousands of orphaned Pods.
      • Administrators must manually identify and restart pods every time certificates rotate.

      Requested Enhancement:
      We need a dynamic certificate reloading mechanism for all OpenShift-native pods. Specifically, we need a mechanism that ensures the Garbage Collector sync process can recover from certificate rotation without manual pod restarts.

              racedoro@redhat.com Ramon Acedo
              rhn-support-yuokada Yuki Okada