Details
- Bug
- Resolution: Done
- Major
- OSSM 2.2.2
- None
Description
The federation controller uses multiple informers to fetch Kubernetes objects, but it does not check HasSynced() on all of them. This causes a race condition: an informer may be queried before its cache has synced, and object processing then fails.
Steps to reproduce for QE:
1. Deploy 2 service meshes.
2. Federate CA certificates and apply ServiceMeshPeers.
3. Restart istiod containers.
Expected result: both istiods start their federation controllers successfully, without errors like those in the log below.
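For QE, the expected result could be checked by scanning the istiod logs after the restart. The sketch below is only an illustration: `scanFederationErrors` and `federationErrors` are hypothetical names, and the match strings are taken from the error lines quoted in this issue.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// federationErrors counts federation-discovery-controller processing errors.
type federationErrors struct {
	Retried int // lines marked "(will retry)"
	GaveUp  int // lines marked "(giving up)"
}

// scanFederationErrors counts "Error processing" lines emitted by the
// federation discovery controller in the given log text.
func scanFederationErrors(logs string) federationErrors {
	var out federationErrors
	sc := bufio.NewScanner(strings.NewReader(logs))
	for sc.Scan() {
		line := sc.Text()
		if !strings.Contains(line, "component=federation-discovery-controller") ||
			!strings.Contains(line, "Error processing") {
			continue
		}
		switch {
		case strings.Contains(line, "(will retry)"):
			out.Retried++
		case strings.Contains(line, "(giving up)"):
			out.GaveUp++
		}
	}
	return out
}

func main() {
	// Two abbreviated sample lines in the format shown in this issue.
	sample := `2022-12-21T10:08:29.180591Z error federation Error processing remote-east-mesh-system/west-mesh (will retry): could not get root cert for mesh west-mesh component=federation-discovery-controller
2022-12-21T10:08:29.337875Z error federation Error processing remote-east-mesh-system/west-mesh (giving up): could not get root cert for mesh west-mesh component=federation-discovery-controller`

	res := scanFederationErrors(sample)
	fmt.Printf("retried=%d gaveUp=%d\n", res.Retried, res.GaveUp)
}
```

A passing run after the fix would report zero for both counters on each istiod.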
Original description:
The following error appears in istiod logs after istiod has been restarted:
# oc logs -f istiod-east-mesh-5bc7974588-xjxcg | grep -i "error processing"
2022-12-21T10:08:29.180591Z error federation Error processing remote-east-mesh-system/west-mesh (will retry): could not get root cert for mesh west-mesh: error getting configmap west-ca-root-cert in namespace remote-east-mesh-system: configmap "west-ca-root-cert" not found component=federation-discovery-controller
2022-12-21T10:08:29.186240Z error federation Error processing remote-east-mesh-system/west-mesh (will retry): could not get root cert for mesh west-mesh: error getting configmap west-ca-root-cert in namespace remote-east-mesh-system: configmap "west-ca-root-cert" not found component=federation-discovery-controller
2022-12-21T10:08:29.197127Z error federation Error processing remote-east-mesh-system/west-mesh (will retry): could not get root cert for mesh west-mesh: error getting configmap west-ca-root-cert in namespace remote-east-mesh-system: configmap "west-ca-root-cert" not found component=federation-discovery-controller
2022-12-21T10:08:29.217542Z error federation Error processing remote-east-mesh-system/west-mesh (will retry): could not get root cert for mesh west-mesh: error getting configmap west-ca-root-cert in namespace remote-east-mesh-system: configmap "west-ca-root-cert" not found component=federation-discovery-controller
2022-12-21T10:08:29.257752Z error federation Error processing remote-east-mesh-system/west-mesh (will retry): could not get root cert for mesh west-mesh: error getting configmap west-ca-root-cert in namespace remote-east-mesh-system: configmap "west-ca-root-cert" not found component=federation-discovery-controller
2022-12-21T10:08:29.337875Z error federation Error processing remote-east-mesh-system/west-mesh (giving up): could not get root cert for mesh west-mesh: error getting configmap west-ca-root-cert in namespace remote-east-mesh-system: configmap "west-ca-root-cert" not found component=federation-discovery-controller
Shortly afterwards, communication with the other ServiceMeshPeer is interrupted.
The configmap exists, and communication with the other peer was working correctly before the restart.
# oc version
Client Version: 4.11.18
Kustomize Version: v4.5.4
Server Version: 4.11.20
Kubernetes Version: v1.24.6+5658434

# oc get smcp -A -o wide
NAMESPACE                 NAME        READY   STATUS            PROFILES      VERSION   AGE   IMAGE REGISTRY
remote-east-mesh-system   east-mesh   10/10   ComponentsReady   ["default"]   2.2.4     45h

# oc get csv
NAME                           DISPLAY                                          VERSION    REPLACES                     PHASE
elasticsearch-operator.5.5.5   OpenShift Elasticsearch Operator                 5.5.5                                   Succeeded
jaeger-operator.v1.39.0-3      Red Hat OpenShift distributed tracing platform   1.39.0-3   jaeger-operator.v1.34.1-5    Succeeded
kiali-operator.v1.57.3         Kiali Operator                                   1.57.3     kiali-operator.v1.48.3       Succeeded
servicemeshoperator.v2.3.0     Red Hat OpenShift Service Mesh                   2.3.0-0    servicemeshoperator.v2.2.3   Succeeded
Attaching the must-gather from the mesh where the issue happened.
Issue Links
- is incorporated by: OSSM-2488 Container release for Maistra 2.2.5 (Closed)
- mentioned on