-
Ticket
-
Resolution: Done
-
Minor
-
OSSM 2.3.1
-
None
-
False
-
None
-
False
-
-
What problem/issue/behavior are you having trouble with? What do you expect to see?
We run the OSSM operator 2.3.1 on OpenShift 4.12.
We created a basic federation hello-world scenario very much like the one described in the docs (https://docs.openshift.com/container-platform/4.12/service_mesh/v2x/ossm-federation.html): red-mesh exporting a service to green-mesh via an {Exported,Imported}ServiceSet pair (a sketch of the resources we applied follows the output below). When trying to reach the red service from a green pod, however, we hit upstream connectivity issues on roughly 50% of the queries. Digging a bit further, we found that the IP address of the red-federation ingress LoadBalancer service (exposed on the green OpenShift cluster) somehow ends up among the target endpoints of the imported service in the Envoy config on the red-federation egress:
$ kubectl exec -ti egress-red-6d8f58bbcf-2wmr5 -n green-mesh-control-plane -- curl -s http://localhost:15000/clusters | grep exports.local | grep cx_active
outbound|80||svc-istio-test.red-workload.svc.green-exports.local::10.56.142.126:15443::cx_active::0 <-- IP of the green-federation ingress LB service exposed on the red cluster. This one is expected.
outbound|80||svc-istio-test.red-workload.svc.green-exports.local::10.56.142.131:15443::cx_active::0 <-- IP of the red-federation ingress LB service exposed on the green cluster. *This one is not expected.*
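For context, the export/import configuration we applied looks roughly like the following. This is a minimal sketch following the pattern from the linked federation doc; the resource names, the ServiceMeshPeer naming convention, and the use of the control plane namespaces (red-mesh-control-plane / green-mesh-control-plane) as the mesh system namespaces are assumptions and may not match our actual manifests byte for byte:

# On the red cluster: export svc-istio-test to the green mesh (sketch)
apiVersion: federation.maistra.io/v1
kind: ExportedServiceSet
metadata:
  name: green-mesh                 # assumed to match the ServiceMeshPeer name for the green mesh
  namespace: red-mesh-control-plane
spec:
  exportRules:
  - type: NameSelector
    nameSelector:
      namespace: red-workload
      name: svc-istio-test
---
# On the green cluster: import the service exported by the red mesh (sketch)
apiVersion: federation.maistra.io/v1
kind: ImportedServiceSet
metadata:
  name: red-mesh                   # assumed to match the ServiceMeshPeer name for the red mesh
  namespace: green-mesh-control-plane
spec:
  importRules:
  - type: NameSelector
    importAsLocal: false
    nameSelector:
      namespace: red-workload
      name: svc-istio-test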
We suspect this is coming from the fact that the federation service discovery query on the red istiod returns both the red and the green gateway IPs:
$ kubectl exec -ti istiod-red-749cd9cff9-xndtq -n red-mesh-control-plane -- curl --insecure https://localhost:8188/v1/services/green
"networkGatewayEndpoints":[{"port":15443,"hostname":"10.56.142.131"}, {"port":15443,"hostname":"10.56.142.126"}],
When we tried to activate federation:debug logs to debug this further, we ended up triggering a segfault in istiod:
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x2a3adc1]

goroutine 189 [running]:
istio.io/istio/pkg/servicemesh/federation/server.(*meshServer).serviceUpdated(0xc001ee6000, 0xc002381540, 0x1)
	/remote-source/istio/app/pkg/servicemesh/federation/server/server.go:629
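For reference, we enabled the extra logging on the red control plane roughly as follows. This is a sketch assuming the SMCP v2 spec.general.logging.componentLevels field; the ServiceMeshControlPlane name is illustrative:

apiVersion: maistra.io/v2
kind: ServiceMeshControlPlane
metadata:
  name: red-mesh                   # illustrative name of our red control plane SMCP
  namespace: red-mesh-control-plane
spec:
  general:
    logging:
      componentLevels:
        federation: debug          # the scope we were trying to raise to debug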
The panic is probably caused by something like svcMessage == nil here: https://github.com/maistra/istio/blob/maistra-2.3/pkg/servicemesh/federation/server/server.go#L628-L630
The odd thing is that in this situation istiod crashes once, and when it restarts it no longer returns any networkGatewayEndpoints in the service discovery query. As a consequence, only the green-federation ingress endpoint is listed in the Envoy config on the red-federation egress:
$ kubectl exec -ti egress-red-6d8f58bbcf-2wmr5 -n green-mesh-control-plane -- curl -s http://localhost:15000/clusters | grep exports.local | grep cx_active
outbound|80||svc-istio-test.red-workload.svc.green-exports.local::10.56.142.126:15443::cx_active::0
Is this a bug or expected behaviour?
What is the business impact? Please also provide timeframe information.
We are experimenting with OSSM with the intention of potentially using it to power new use cases in production soon.
Note that this is also captured in support case https://access.redhat.com/support/cases/#/case/03436386
and in this doc: https://docs.google.com/document/d/1673j_r6V1XXS-TNRsviCO9D_CT4nZQOmJRPz6h-TN8Q/edit
- is related to
-
OSSM-3599 Federation egress-gateway gets wrong update of network gateway endpoints
- Closed