OpenShift Bugs / OCPBUGS-63502

openshift-console-operator pod should honor the service-CA rotation


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Normal
    • 4.14.z
    • Management Console
    • Quality / Stability / Reliability
    • Moderate

      Description of problem:

      The service-CA certificate is valid for 26 months and is rotated automatically once less than 13 months of validity remain, but EUS term-2 clusters can run for 3+ years without upgrades. The cluster-monitoring-operator was not watching the secrets generated by service-CA, so when certificates rotated, pods kept using the expired certificates until they were manually restarted.
      
      Services with the service.beta.openshift.io/serving-cert-secret-name annotation automatically get TLS certificates from service-CA. When service-CA rotates these certificates, it updates the secret content, but the consuming pods don't automatically restart to pick up the new certificate from the filesystem.
      The monitoring operator needs to detect these secret changes and trigger deployment updates to restart the pods.
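One common way to turn a secret change into a pod restart is to stamp a checksum of the secret's data onto the Deployment's pod template, so that any change in the secret changes the template and triggers a rollout. The following is a minimal sketch of that idea only, not the operator's actual implementation; the annotation key and function names are hypothetical.

```python
import hashlib

# Hypothetical annotation key, chosen for illustration only.
CHECKSUM_ANNOTATION = "example.openshift.io/serving-cert-checksum"


def secret_checksum(data: dict[str, bytes]) -> str:
    """Return a stable SHA-256 digest over all keys and values of a secret."""
    digest = hashlib.sha256()
    for key in sorted(data):
        key_bytes = key.encode()
        value = data[key]
        # Length-prefix each field so that field boundaries are unambiguous.
        digest.update(len(key_bytes).to_bytes(4, "big") + key_bytes)
        digest.update(len(value).to_bytes(4, "big") + value)
    return digest.hexdigest()


def with_checksum(
    pod_template_annotations: dict[str, str], data: dict[str, bytes]
) -> tuple[dict[str, str], bool]:
    """Return the updated annotations and whether the pod template changed."""
    checksum = secret_checksum(data)
    changed = pod_template_annotations.get(CHECKSUM_ANNOTATION) != checksum
    updated = {**pod_template_annotations, CHECKSUM_ANNOTATION: checksum}
    return updated, changed
```

A reconcile loop would call `with_checksum` with the current secret data and update the Deployment only when `changed` is true; the resulting pod template change is what causes the pods to be recreated with the new certificate.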
      
      console-operator's conversion webhook depends on monitoring-plugin-cert. When this cert expired, TLS validation failed, breaking the monitoring operator's ability to reconcile console plugins. The certificate was rotated in the secret, but the pod was still using the old certificate from its mounted filesystem. 
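An alternative to restarting the pod is for the serving process itself to notice that the mounted certificate file has changed. The sketch below assumes the process reads the certificate once at startup and can poll the file afterwards; the class name and polling approach are illustrative assumptions, not how console-operator is implemented.

```python
import hashlib
from pathlib import Path


class CertWatcher:
    """Detects when a mounted certificate file's content has changed."""

    def __init__(self, cert_path: str) -> None:
        self._path = Path(cert_path)
        self._digest = self._read_digest()

    def _read_digest(self) -> str:
        # Hash the content rather than relying on mtime, which may not
        # change predictably for symlinked secret volumes.
        return hashlib.sha256(self._path.read_bytes()).hexdigest()

    def changed(self) -> bool:
        """Return True once per content change and remember the new digest."""
        current = self._read_digest()
        if current == self._digest:
            return False
        self._digest = current
        return True
```

A server could call `changed()` on a timer and, when it returns true, either rebuild its TLS configuration or exit so that the pod is restarted with the new certificate.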

      Version-Release number of selected component (if applicable):

      4.14.z (but should be reproducible in later releases).

      How reproducible:

      Hard to reproduce, since the service-CA has a 2-year expiry, unless the service-CA expiry can be manually reduced for quicker reproduction.

      Steps to Reproduce:

          1. Install an OCP cluster.
          2. Once service-CA rotates the cert internally (13 months after provisioning), do not restart the cluster until the old service-CA has expired.
          3. Check the status of the monitoring ClusterOperator (CO).

      Actual results:

      Once the old service-CA has expired naturally, the monitoring CO is reported as degraded:
      
      monitoring 4.14.13 False True True 2m28s reconciling Console Plugin failed: retrieving ConsolePlugin object failed: conversion webhook for console.openshift.io/v1alpha1, Kind=ConsolePlugin failed: Post "https://webhook.openshift-console-operator.svc:9443/crdconvert?timeout=30s": tls: failed to verify certificate: x509: certificate has expired or is not yet valid: current time 2025-09-21T18:36:49Z is after 2025-09-18T13:38:07Z       
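For triage, the two timestamps in the x509 error show how long the certificate had already been expired when the request failed. The following is a small sketch that extracts them from a message of this shape; the function name and regular expression are illustrative, and the message format is assumed to match the one quoted above.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

# Matches the "current time <now> is after <notAfter>" tail of the error.
_X509_EXPIRY = re.compile(
    r"current time (?P<now>\S+) is after (?P<not_after>\S+)"
)


def _parse_utc(stamp: str) -> datetime:
    """Parse a timestamp such as 2025-09-21T18:36:49Z as UTC."""
    parsed = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def expired_for(message: str) -> Optional[timedelta]:
    """Return how long the certificate had been expired, or None if no match."""
    match = _X509_EXPIRY.search(message)
    if match is None:
        return None
    return _parse_utc(match["now"]) - _parse_utc(match["not_after"])


message = (
    "tls: failed to verify certificate: x509: certificate has expired or is "
    "not yet valid: current time 2025-09-21T18:36:49Z is after "
    "2025-09-18T13:38:07Z"
)
print(expired_for(message))  # 3 days, 4:58:42
```

In this report the certificate had been expired for a little over three days when the degraded status was captured.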

      Expected results:

      The cluster-monitoring-operator should reconcile with the console-operator to load the new service-CA cert automatically, without human intervention. The whole process should be transparent to end users, as per https://access.redhat.com/solutions/7075458

      Additional info:

      Simply restarting the console-operator pod in the openshift-console-operator namespace resolves this issue.

              jhadvig@redhat.com Jakub Hadvig
              rhn-support-bihu Bin Hu
              Junqi Zhao Junqi Zhao