Cluster Observability Operator / COO-1062

Incident Detection Installation Fails Intermittently


    • Type: Bug
    • Resolution: Done
    • Priority: Critical
    • 1.2.2
    • 1.2.1
    • operator
    • Quality / Stability / Reliability

      Installation of the Incident Detection feature fails intermittently. Whether a given installation fails depends on the order in which the Kubernetes manifests are applied, so the failure cannot be predicted in advance.

      Symptom

      • The Incident Detection UI is visible but does not display any data.
      • The health-analyzer ServiceMonitor is down, showing the following error message:
      tls: failed to verify certificate: x509
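
To check whether a cluster is affected, it may help to list which services in the operator namespace carry serving-cert annotations. The following is a minimal sketch, not part of the original report: the function name is hypothetical, the namespace is the one used in the workaround commands, and it assumes `kubectl` is logged in to the cluster.

```shell
# Namespace taken from the workaround commands in this issue.
NS=openshift-cluster-observability-operator

# Hypothetical helper: print one line per service whose annotations
# mention "serving-cert" (service name, a tab, then the annotation map).
list_serving_cert_annotations() {
  kubectl get service -n "$NS" \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.annotations}{"\n"}{end}' \
    | grep 'serving-cert'
}
```

Calling `list_serving_cert_annotations` prints the matching services; the function returns a non-zero status when no service carries such an annotation.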

      Workaround

      Remove the serving-cert annotations from the incorrectly annotated health-analyzer-mcp service, then delete the health-analyzer-tls secret and the health-analyzer service so that COO recreates them:

      kubectl patch service -n openshift-cluster-observability-operator health-analyzer-mcp --type=json -p='[{"op": "remove", "path": "/metadata/annotations/service.beta.openshift.io~1serving-cert-secret-name"},{"op": "remove", "path": "/metadata/annotations/service.alpha.openshift.io~1serving-cert-signed-by"},{"op": "remove", "path": "/metadata/annotations/service.beta.openshift.io~1serving-cert-signed-by"}]'
      kubectl delete secret -n openshift-cluster-observability-operator health-analyzer-tls
      kubectl delete service -n openshift-cluster-observability-operator health-analyzer
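
The `~1` sequences in the patch paths are JSON Pointer escapes (RFC 6901): because `/` separates path segments, a `/` inside an annotation key is written as `~1`, and a literal `~` as `~0`. The following sketch shows how such a path segment can be derived from an annotation key; the helper name is hypothetical and is not part of the original workaround.

```shell
# Hypothetical helper: escape a key for use as one JSON Pointer segment.
# "~" must be replaced first, so that the "~1" produced for "/" is not
# escaped a second time.
json_pointer_escape() {
  printf '%s' "$1" | sed -e 's/~/~0/g' -e 's#/#~1#g'
}

# Build the patch path for one of the annotations removed above.
key='service.beta.openshift.io/serving-cert-secret-name'
echo "/metadata/annotations/$(json_pointer_escape "$key")"
```

Note that a JSON Patch `remove` operation requires its target to exist, so the patch as a whole is rejected if any of the three annotations is absent from the service; in that case, drop the corresponding operation from the list.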

       

              Tomas Remes (tremes1@redhat.com)
              Alberto Falossi (afalossi@redhat.com)
              Savitha T Jose