Type: Bug
Resolution: Unresolved
Priority: Normal
Fix Version/s: None
Affects Version/s: 4.14.z
Component/s: Management Console
Labels: None
Activity Type: Quality / Stability / Reliability
Blocked: False
Blocked Reason: None
Story Points: None
Severity: Moderate
Regression: None
Target Backport Versions: None
Target Version: 4.21.0
Release Blocker: None
Sprint: None
SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:
PX Impact Score:
Release Note Status: None
Release Note Type: None
Release Note Text: None
Escape Reason: None
Escape Impact: None
Corrective Measures: None
SDLC stage when should've been found: None

Description of problem:

The service CA is valid for 26 months and is rotated automatically once less than 13 months of validity remain, but EUS term-2 clusters can run 3+ years without upgrades. The cluster-monitoring-operator was not watching the secrets generated by service-CA, so when certificates were rotated, pods kept using the old certificates, which eventually expired, until the pods were manually restarted.

Services with the service.beta.openshift.io/serving-cert-secret-name annotation automatically get TLS certificates from service-CA. When service-CA rotates these certificates, it updates the secret content, but the consuming pods don't automatically restart to pick up the new certificate from the filesystem.
The monitoring operator needs to detect these secret changes and trigger deployment updates to restart the pods.
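
One common way to turn a secret change into a pod restart is to stamp a hash of the secret data onto the Deployment's pod template, since any change to the pod template triggers a new rollout. The following is a minimal sketch of that pattern on plain dictionaries, not the implementation from the linked fix. The annotation key and the function names are hypothetical and chosen only for illustration.

```python
import copy
import hashlib
from typing import Dict, Tuple

# Hypothetical annotation key, used only for this illustration.
CHECKSUM_ANNOTATION = "example.openshift.io/serving-cert-checksum"


def secret_checksum(secret_data: Dict[str, bytes]) -> str:
    """Return a stable SHA-256 digest over all keys and values of a secret."""
    digest = hashlib.sha256()
    for key in sorted(secret_data):
        key_bytes = key.encode("utf-8")
        value = secret_data[key]
        # Length-prefix each field so adjacent fields cannot run together.
        digest.update(len(key_bytes).to_bytes(8, "big"))
        digest.update(key_bytes)
        digest.update(len(value).to_bytes(8, "big"))
        digest.update(value)
    return digest.hexdigest()


def with_checksum(deployment: dict, secret_data: Dict[str, bytes]) -> Tuple[dict, bool]:
    """Return (updated Deployment manifest, whether the pod template changed)."""
    updated = copy.deepcopy(deployment)
    annotations = (
        updated.setdefault("spec", {})
        .setdefault("template", {})
        .setdefault("metadata", {})
        .setdefault("annotations", {})
    )
    new_sum = secret_checksum(secret_data)
    changed = annotations.get(CHECKSUM_ANNOTATION) != new_sum
    annotations[CHECKSUM_ANNOTATION] = new_sum
    return updated, changed


deployment = {"spec": {"template": {"metadata": {}}}}
old = {"tls.crt": b"old-cert", "tls.key": b"old-key"}
new = {"tls.crt": b"new-cert", "tls.key": b"new-key"}

deployment, changed_first = with_checksum(deployment, old)
deployment, changed_same = with_checksum(deployment, old)
deployment, changed_rotated = with_checksum(deployment, new)
print(changed_first, changed_same, changed_rotated)  # True False True
```

With this approach an unchanged secret leaves the pod template untouched, so reconcile loops do not cause spurious restarts; only a rotation produces a new rollout.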

console-operator's conversion webhook depends on monitoring-plugin-cert. When this cert expired, TLS validation failed, breaking the monitoring operator's ability to reconcile console plugins. The certificate was rotated in the secret, but the pod was still using the old certificate from its mounted filesystem.

Version-Release number of selected component (if applicable):

4.14.z (but should be reproducible in later releases).

How reproducible:

Hard, since the service CA has a 2-year expiry, unless we can manually reduce the service-CA expiry for quick reproduction.

Steps to Reproduce:

    1. Install an OCP cluster.
    2. Once service-CA rotates the cert internally (13 months after provisioning), do not restart the cluster until the old service CA expires.
    3. Monitor the status of the monitoring CO.
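
To plan a reproduction, the waiting period in the steps above can be estimated from the provisioning date. This is a rough sketch that assumes the 26-month validity and the 13-month rotation point mentioned in this report; the function names and the example date are hypothetical, and real clusters should be checked against the actual certificate dates.

```python
import calendar
from datetime import date

# Assumptions taken from this report, not read from a live cluster.
CA_VALIDITY_MONTHS = 26
ROTATION_AFTER_MONTHS = 13


def add_months(start: date, months: int) -> date:
    """Add calendar months, clamping the day to the end of the target month."""
    index = start.year * 12 + (start.month - 1) + months
    year, month0 = divmod(index, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def rotation_timeline(provisioned: date) -> dict:
    """Estimate when rotation happens and when the old CA expires."""
    return {
        "rotation": add_months(provisioned, ROTATION_AFTER_MONTHS),
        "old_ca_expiry": add_months(provisioned, CA_VALIDITY_MONTHS),
    }


timeline = rotation_timeline(date(2024, 1, 31))
print(timeline["rotation"], timeline["old_ca_expiry"])  # 2025-02-28 2026-03-31
```

Pods that are not restarted between the two dates keep the pre-rotation material on disk, which is the window this bug depends on.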

Actual results:

When the service CA expires naturally, the monitoring CO becomes degraded:

monitoring 4.14.13 False True True 2m28s reconciling Console Plugin failed: retrieving ConsolePlugin object failed: conversion webhook for console.openshift.io/v1alpha1, Kind=ConsolePlugin failed: Post "https://webhook.openshift-console-operator.svc:9443/crdconvert?timeout=30s": tls: failed to verify certificate: x509: certificate has expired or is not yet valid: current time 2025-09-21T18:36:49Z is after 2025-09-18T13:38:07Z
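
When triaging such a status message, it can help to extract the two timestamps and see how long the served certificate had already been expired. The sketch below does this with the standard library only; the function name and regular expression are illustrative assumptions, and the pattern only covers the `current time ... is after ...` wording shown above.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import Optional

TIMESTAMP = r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z"
EXPIRY_RE = re.compile(
    rf"current time (?P<now>{TIMESTAMP}) is after (?P<not_after>{TIMESTAMP})"
)


def expired_for(message: str) -> Optional[timedelta]:
    """Return how long the certificate had been expired, or None if no match."""
    match = EXPIRY_RE.search(message)
    if match is None:
        return None
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    now = datetime.strptime(match["now"], fmt).replace(tzinfo=timezone.utc)
    not_after = datetime.strptime(match["not_after"], fmt).replace(tzinfo=timezone.utc)
    return now - not_after


message = (
    "tls: failed to verify certificate: x509: certificate has expired or is "
    "not yet valid: current time 2025-09-21T18:36:49Z is after 2025-09-18T13:38:07Z"
)
print(expired_for(message))  # 3 days, 4:58:42
```

For the message in this report, the certificate presented by the webhook had been expired for a little over three days when the status was captured.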

Expected results:

The cluster-monitoring-operator should reconcile with the console-operator to load the new service-CA cert automatically without human intervention. The whole process should be transparent to end users, as per https://access.redhat.com/solutions/7075458

Additional info:

Simply restarting the console-operator pod in the openshift-console-operator namespace resolves this issue.

links to

openshift/cluster-monitoring-operator#2726: OCPBUGS-63502: Fix service-CA certificate rotation not triggering pod restarts

Assignee: Jakub Hadvig
Reporter: Bin Hu
Need Info From: None
Contributors: None
QA Contact: Junqi Zhao
Doc Contact: None
Votes: 0
Watchers: 7
Created: 2025/10/24 7:44 AM
Updated: 2025/10/28 8:21 AM
