Uploaded image for project: 'OpenShift Bugs'
  1. OpenShift Bugs
  2. OCPBUGS-55151

metrics collector should not be validating proxy certificates

    • Quality / Stability / Reliability
    • False
    • Hide

      None

      Show
      None
    • None
    • None
    • None
    • None
    • None
    • None
    • None
    • None
    • None
    • None
    • None
    • None
    • None

      Description of problem

      The HyperShift Operator (HO) metrics collector is validating proxy CA bundle certificates, which causes repeated error messages in the logs when certificates become invalid. This validation is happening in the wrong place.

      During an incident call (4/18), this error message was repeated continuously in the HO logs:

      {"level":"info","ts":"2025-04-17T13:13:29Z","msg":"proxy ca bundle is invalid, due to erroring while validating","error":"a configured certificate in the ca bundle was no longer valid: stamp2.login.microsoftonline.com"}
      

      Issues with current behavior:

      • The error message does not include which HostedCluster (HC) is having the issue, making troubleshooting difficult
      • Certificate validation should not be performed in the metrics collector
      • Certificate validation should be done in the HostedCluster reconcile loop and reported via conditions

      Version-Release number

      HyperShift Operator (version observed during 4/17/2025 incident)

      How reproducible

      Always (when a proxy CA bundle contains an invalid/expired certificate)

      Steps to Reproduce

      1. Configure a HostedCluster with a proxy that has a CA bundle
      2. Allow one of the certificates in the CA bundle to expire or become invalid
      3. Observe HO logs for metrics collector errors

      Actual results

      • The metrics collector logs repeated "proxy ca bundle is invalid" errors
      • Error messages do not identify which HostedCluster is affected
      • No HostedCluster condition is set to indicate the problem

      Expected results

      • The metrics collector should NOT validate proxy certificates
      • Certificate validation should occur in the HostedCluster reconcile loop
      • Invalid certificates should be reported via HostedCluster conditions (making it easy to identify affected clusters)
      • Error messages should include the HostedCluster name/namespace for troubleshooting

      Additional info

      Genesis: Slack discussion

      Recommendation from Cesar: Move certificate validation to HC reconcile and report via conditions instead of validating in the metrics collector.

              hypershift-automation hypershift-team automation
              rh-ee-brcox Bryan Cox
              None
              None
              Jie Zhao Jie Zhao
              None
              Bryan Cox
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

                Created:
                Updated: