Type: Feature Request
Resolution: Unresolved
Priority: Undefined
Fix Version/s: None
Affects Version/s: None
Component/s: Logging
Labels:
- vector

Target Version: None
Activity Type: Product / Portfolio Work
Status Summary: None
Blocked: False
Blocked Reason: None
Products: None
Hierarchy Progress Bar: None
SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:
PX Review Complete: None
PX Impact Score:
PX Impact Range: None
PX Priority Data: None
PX Technical Impact: None
PX Technical Impact Notes: None
PX Scheduling Request: None

Problem:
In OpenShift clusters that run for extended periods (e.g., more than one or two years) without restarts, the Vector collector pods do not automatically reload the rotated TLS certificates used for the Prometheus metrics endpoint.
Even when the cluster CA or service certificates are rotated automatically, the running Vector process keeps the old certificate in memory until it expires. Prometheus scrapes then fail and critical alerts fire, for example:

Get "https://<pod-ip>:<port>/metrics": tls: failed to verify certificate: x509: certificate has expired

"alertname": "CollectorNodeDown",
"message": "Prometheus could not scrape openshift-logging/collector-xxxxx collector component for more than 10m."

Customer Impact:

Administrators are forced to manually restart collector pods to refresh certificates and resolve alerts.
Monitoring of the logging infrastructure is lost until manual intervention occurs.
Regular cluster upgrades would recreate the pods and so avoid certificate expiration, but mission-critical systems (e.g., banking) may stay on an EUS version for up to 4 years. For them, scheduling frequent maintenance or restarts just to refresh certificates is not ideal.

Requested Enhancement:
We need a dynamic certificate reloading mechanism for the OpenShift Logging Vector collector: Vector should detect changes to the metrics certificate files and reload them without a full pod restart. Alternatively, the Operator should manage the certificate rotation lifecycle so that long-running clusters need no manual intervention.
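The requested reload behavior can be sketched as a lazy, poll-on-use reloader. This is a hypothetical Python illustration only, not Vector's implementation (Vector is written in Rust) and not an existing Operator API; `FileReloader` and `load_server_context` are names invented here. On each call it compares the watched files' modification time, size, and inode with the values recorded at the last successful load, rebuilds the TLS context only when they differ, and keeps serving the previous context if the new files cannot be loaded yet (for example, while the certificate and key are mid-rotation and do not match).

```python
import os
import ssl
import threading
from typing import Callable, Generic, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")

# (mtime in ns, size in bytes, inode) for each watched file
Stamp = Tuple[Tuple[int, int, int], ...]


class FileReloader(Generic[T]):
    """Hypothetical helper: re-runs `loader` when any watched file changes.

    Change detection is an os.stat() of every watched file on each get().
    os.stat() follows symlinks, so replacing a file by swapping a symlink
    to a new file also counts as a change.
    """

    def __init__(self, paths: Sequence[str], loader: Callable[[], T]) -> None:
        self._paths = list(paths)
        self._loader = loader
        self._lock = threading.Lock()
        self._stamp: Optional[Stamp] = None  # None means "never loaded"
        self._value: Optional[T] = None
        self.reloads = 0  # number of successful loads so far

    def _stamp_now(self) -> Stamp:
        stats = [os.stat(p) for p in self._paths]
        return tuple((s.st_mtime_ns, s.st_size, s.st_ino) for s in stats)

    def get(self) -> T:
        with self._lock:
            try:
                stamp = self._stamp_now()
                if stamp != self._stamp:
                    value = self._loader()
                    self._value, self._stamp = value, stamp
                    self.reloads += 1
            except Exception:
                if self._stamp is None:
                    raise  # nothing cached yet, so surface the error
                # Keep the last good value. The stored stamp is unchanged,
                # so the next get() retries the load.
            return self._value  # type: ignore[return-value]


def load_server_context(cert_path: str, key_path: str) -> ssl.SSLContext:
    """Build a fresh server-side TLS context from PEM cert and key files."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.load_cert_chain(certfile=cert_path, keyfile=key_path)
    return ctx
```

A server using this sketch would create `FileReloader([cert, key], lambda: load_server_context(cert, key))` with its own (placeholder) certificate and key paths, then call `reloader.get().wrap_socket(conn, server_side=True)` for each accepted connection. New connections pick up the rotated certificate while existing ones keep the old one. Polling on use avoids a file-watch dependency; an event-based file watcher is an alternative design.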

Assignee: Jamie Parker
Reporter: Yuki Okada
Need Info From: None
Votes: 0
Watchers: 1
Created: 2026/02/18 7:32 AM
Updated: 2026/02/18 9:50 AM
Target start: None
Target end: None