Bug
Resolution: Unresolved
Normal
4.19.0
Quality / Stability / Reliability
False
Description of problem
The HyperShift Operator (HO) metrics collector validates the certificates in each proxy CA bundle. When a certificate in a bundle becomes invalid, the collector logs the same error repeatedly. The metrics collector is the wrong place for this validation.
During an incident call (4/18), this error message was repeated continuously in the HO logs:
{"level":"info","ts":"2025-04-17T13:13:29Z","msg":"proxy ca bundle is invalid, due to erroring while validating","error":"a configured certificate in the ca bundle was no longer valid: stamp2.login.microsoftonline.com"}
Issues with current behavior:
- The error message does not include which HostedCluster (HC) is having the issue, making troubleshooting difficult
- Certificate validation should not be performed in the metrics collector
- Certificate validation should be done in the HostedCluster reconcile loop and reported via conditions
Version-Release number
HyperShift Operator (version observed during 4/17/2025 incident)
How reproducible
Always (when a proxy CA bundle contains an invalid/expired certificate)
Steps to Reproduce
- Configure a HostedCluster with a proxy that has a CA bundle
- Allow one of the certificates in the CA bundle to expire or become invalid
- Observe HO logs for metrics collector errors
Actual results
- The metrics collector logs repeated "proxy ca bundle is invalid" errors
- Error messages do not identify which HostedCluster is affected
- No HostedCluster condition is set to indicate the problem
Expected results
- The metrics collector should NOT validate proxy certificates
- Certificate validation should occur in the HostedCluster reconcile loop
- Invalid certificates should be reported via HostedCluster conditions (making it easy to identify affected clusters)
- Error messages should include the HostedCluster name/namespace for troubleshooting
Additional info
Genesis: Slack discussion
Recommendation from Cesar: Move certificate validation to HC reconcile and report via conditions instead of validating in the metrics collector.