Installation of the Incident Detection feature can fail intermittently. The failure is unpredictable because it depends on the order in which the Kubernetes manifests are applied.
Symptom
- The Incident Detection UI is visible but does not display any data.
- The health-analyzer ServiceMonitor is down, showing the following error message:
tls: failed to verify certificate: x509
Workaround
Patch the incorrectly annotated service, then delete the secret and the service so that the Cluster Observability Operator (COO) recreates them:
kubectl patch service health-analyzer-mcp --type=json -p='[{"op": "remove", "path": "/metadata/annotations/service.beta.openshift.io~1serving-cert-secret-name"},{"op": "remove", "path": "/metadata/annotations/service.alpha.openshift.io~1serving-cert-signed-by"},{"op": "remove", "path": "/metadata/annotations/service.beta.openshift.io~1serving-cert-signed-by"}]'
kubectl delete secret -n openshift-cluster-observability-operator health-analyzer-tls
kubectl delete service -n openshift-cluster-observability-operator health-analyzer
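After running the workaround, it can be useful to confirm that the deleted secret and service have come back. The following is a minimal sketch and not part of the documented procedure: the function names, the retry count, and the delay are illustrative assumptions, and it presumes both objects are recreated under the same names in the namespace used by the delete commands.

```shell
# Namespace taken from the delete commands in the workaround.
NS=openshift-cluster-observability-operator

# Succeeds only when both the secret and the service exist.
# (Hypothetical helper; assumes the objects are recreated under the same names.)
resources_recreated() {
  kubectl get secret -n "$NS" health-analyzer-tls >/dev/null 2>&1 &&
  kubectl get service -n "$NS" health-analyzer >/dev/null 2>&1
}

# Polls until both objects exist or the attempts run out.
# $1 = number of attempts (default 12), $2 = seconds between attempts (default 5).
# Both defaults are arbitrary choices for illustration.
wait_for_recreation() {
  attempts=${1:-12}
  delay=${2:-5}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if resources_recreated; then
      echo "recreated"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "timed out"
  return 1
}
```

Calling `wait_for_recreation` with no arguments polls for roughly one minute. A "timed out" result only means that the objects were not found in that window; it does not by itself indicate why.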