-
Bug
-
Resolution: Done
-
Major
-
maistra-1.0.0
-
None
-
MAISTRA 1.0.2
This is a tracking bug for https://jira.coreos.com/browse/SRVKS-213. The serverless team considers this a blocker and is trying to reproduce the issue in a cluster that can be debugged. It doesn't seem like this requires anything Knative-specific at this point, so it should happen for other service mesh customers. The following description is copied from that issue:
A Knative Service starts returning HTTP 503 after the cluster has been running for some time. The time until the failure appears varies from run to run.
The Knative Service and its route both show "Ready", but sending an HTTP request to the route returns 503. The Knative Serving pods do not show any errors.
Restarting the istio-pilot pod fixes the problem. The relevant part of istio-pilot logs between the moment when the service was still available and when it started to return 503:
2019-08-13T05:27:03.479786Z info ServiceMeshMemberRoll default updated, namespaces now ["serving-tests" "knative-serving" "knative-eventing" "knative-build" "istio-system"]
2019-08-13T05:27:03.479786Z info ServiceMeshMemberRoll default updated, namespaces now ["serving-tests" "knative-serving" "knative-eventing" "knative-build" "istio-system"]
2019-08-13T05:27:03.480073Z warn istio.io/istio/pilot/pkg/serviceregistry/kube/controller.go:353: watch of *v1.Pod ended with: Namespaces Updated
2019-08-13T05:27:03.480132Z info ServiceMeshMemberRoll default updated, namespaces now ["serving-tests" "knative-serving" "knative-eventing" "knative-build" "istio-system"]
2019-08-13T05:27:03.480305Z warn istio.io/istio/pilot/pkg/serviceregistry/kube/controller.go:352: watch of *v1.Service ended with: Namespaces Updated
2019-08-13T05:27:03.480330Z info ServiceMeshMemberRoll default updated, namespaces now ["serving-tests" "knative-serving" "knative-eventing" "knative-build" "istio-system"]
2019-08-13T05:27:03.480390Z warn istio.io/istio/pilot/pkg/serviceregistry/kube/controller.go:360: watch of *v1.Endpoints ended with: Namespaces Updated
2019-08-13T05:27:04.892061Z error istio.io/istio/pilot/pkg/serviceregistry/kube/controller.go:360: Failed to watch *v1.Endpoints: unknown (get endpoints)
2019-08-13T05:27:05.090648Z error istio.io/istio/pilot/pkg/serviceregistry/kube/controller.go:352: Failed to watch *v1.Service: unknown (get services)
2019-08-13T05:27:05.486557Z info Handling event update for pod autoscale-up-down-up-zzkjgjxz-k9m6z-deployment-6879974c-qs8xd in namespace serving-tests -> 10.131.2.31
2019-08-13T05:27:05.486603Z info Handling event update for pod autoscale-up-down-up-zzkjgjxz-k9m6z-deployment-6879974c-rpxpg in namespace serving-tests -> 10.128.4.21
2019-08-13T05:27:05.488367Z error istio.io/istio/pilot/pkg/serviceregistry/kube/controller.go:353: Failed to watch *v1.Pod: unknown (get pods)
2019-08-13T05:28:12.783748Z info ServiceMeshMemberRoll default updated, namespaces now ["serving-tests" "knative-serving" "knative-eventing" "knative-build" "serving-tests-alt" "istio-system"]
2019-08-13T05:28:12.783793Z info ServiceMeshMemberRoll default updated, namespaces now ["serving-tests" "knative-serving" "knative-eventing" "knative-build" "serving-tests-alt" "istio-system"]
2019-08-13T05:28:12.783829Z info ServiceMeshMemberRoll default updated, namespaces now ["serving-tests" "knative-serving" "knative-eventing" "knative-build" "serving-tests-alt" "istio-system"]
2019-08-13T05:30:57.286093Z warn istio.io/istio/pkg/kube/secretcontroller/secretcontroller.go:148: watch of *v1.Secret ended with: too old resource version: 255540 (288281)
2019-08-13T05:31:34.552424Z info ads Push debounce stable[459] 1: 100.162531ms since last change, 100.162531ms since last push, full=true
2019-08-13T05:31:34.552915Z info ads XDS: Pushing 2019-08-13T05:31:34Z/406 Services: 9, ConnectedEndpoints: 2
2019-08-13T05:31:34.553474Z info ads Cluster init time 541.857µs 2019-08-13T05:31:34Z/406
2019-08-13T05:31:34.553581Z info ads Pushing router~10.131.2.9~istio-ingressgateway-bc97545d5-srx97.istio-system~istio-system.svc.cluster.local-53
2019-08-13T05:31:34.553589Z info ads PushAll done 2019-08-13T05:31:34Z/406 85.984µs
2019-08-13T05:31:34.553664Z info ads Pushing router~10.128.2.7~cluster-local-gateway-67c8dc578f-mxfrj.istio-system~istio-system.svc.cluster.local-54
2019-08-13T05:31:34.553946Z info ads CDS: PUSH 2019-08-13T05:31:34Z/406 for router~10.128.2.7~cluster-local-gateway-67c8dc578f-mxfrj.istio-system~istio-system.svc.cluster.local-54 "10.128.2.7:34940", Clusters: 26, Services 9
2019-08-13T05:31:34.554017Z info ads CDS: PUSH 2019-08-13T05:31:34Z/406 for router~10.131.2.9~istio-ingressgateway-bc97545d5-srx97.istio-system~istio-system.svc.cluster.local-53 "10.131.2.9:44790", Clusters: 50, Services 9
2019-08-13T05:31:34.555144Z info ads LDS: PUSH for node:cluster-local-gateway-67c8dc578f-mxfrj.istio-system addr:"10.128.2.7:34940" listeners:1 913
2019-08-13T05:31:34.555162Z info 1 error occurred: * gateway omitting listener "0.0.0.0_443" due to: must have more than 0 chains in listener: &v2.Listener{Name:"0.0.0.0_443", Address:core.Address{Address:(*core.Address_SocketAddress)(0xc000bf8f30), XXX_NoUnkeyedLiteral:struct {}{}, XXX_unrecognized:[]uint8(nil), XXX_sizecache:0}, FilterChains:[]listener.FilterChain{}, UseOriginalDst:nil, PerConnectionBufferLimitBytes:nil, Metadata:(*core.Metadata)(nil), DeprecatedV1:(*v2.Listener_DeprecatedV1)(nil), DrainType:0, ListenerFilters:[]listener.ListenerFilter(nil), ListenerFiltersTimeout:(*time.Duration)(nil), Transparent:nil, Freebind:nil, SocketOptions:[]*core.SocketOption(nil), TcpFastOpenQueueLength:nil, XXX_NoUnkeyedLiteral:struct {}{}, XXX_unrecognized:[]uint8(nil), XXX_sizecache:0}
2019-08-13T05:31:34.555191Z warn constructed http route config for port 80 with no vhosts; Setting up a default 404 vhost
2019-08-13T05:31:34.555240Z info ads ADS: RDS: PUSH for node: cluster-local-gateway-67c8dc578f-mxfrj.istio-system addr:10.128.2.7:34940 routes:1
2019-08-13T05:31:34.555246Z info ads LDS: PUSH for node:istio-ingressgateway-bc97545d5-srx97.istio-system addr:"10.131.2.9:44790" listeners:1 913
2019-08-13T05:31:34.555534Z info ads ADS: RDS: PUSH for node: istio-ingressgateway-bc97545d5-srx97.istio-system addr:10.131.2.9:44790 routes:1
2019-08-13T05:31:34.555580Z info ads Push finished: 3.094323ms { "ProxyStatus": {}, "Start": "2019-08-13T05:31:34.552482092Z", "End": "2019-08-13T05:31:34.555548866Z"}
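In the excerpt above, the Pod, Service and Endpoints watches end with "Namespaces Updated" and the following attempts to re-establish them fail. When going through the full attached log, a small script can help pick out such pairs. The following is only a sketch: the function name `failed_rewatches` is hypothetical, the two patterns are copied from the messages in this excerpt (other Istio versions may word them differently), and a match is a hint for further debugging, not a confirmed root cause.

```python
import re

# Patterns copied from the pilot log excerpt in this issue (assumption:
# the wording is stable for the Istio version being debugged).
ENDED = re.compile(r"watch of (\*\S+) ended with: (.+)$")
FAILED = re.compile(r"Failed to watch (\*\S+): (.+)$")


def failed_rewatches(lines):
    """Return {resource: error} for watches that ended and then failed to restart."""
    ended = set()
    failed = {}
    for line in lines:
        line = line.rstrip()
        m = ENDED.search(line)
        if m:
            # Remember every resource type whose watch was terminated.
            ended.add(m.group(1))
            continue
        m = FAILED.search(line)
        if m and m.group(1) in ended:
            # Keep the most recent failure message per resource type.
            failed[m.group(1)] = m.group(2)
    return failed


if __name__ == "__main__":
    import sys

    # Usage: python script.py < istio-pilot.log
    for resource, error in failed_rewatches(sys.stdin).items():
        print(f"{resource}: {error}")
```

Watches that end for other reasons and recover (such as the *v1.Secret watch ending with "too old resource version") are not reported, because no "Failed to watch" line follows them.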
In addition, the log did not list "myproject" among the ServiceMeshMemberRoll namespaces even though it was defined in the configuration. This led me to conclude that something is wrong with istio-pilot. Restarting istio-pilot helped, and afterwards the logs showed "myproject" as part of the ServiceMeshMemberRoll.
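The same comparison between configured members and the namespaces pilot reports can be done mechanically. This is a minimal sketch under stated assumptions: `missing_members` is a hypothetical helper, the pattern is taken from the "ServiceMeshMemberRoll default updated" lines in the log above, and the list of configured members has to be supplied by the caller. A configured member may also be absent for harmless reasons, for example because the namespace does not exist at that moment.

```python
import re

# Pattern taken from the pilot log lines quoted in this issue.
ROLL = re.compile(r"ServiceMeshMemberRoll \S+ updated, namespaces now \[(.*?)\]")


def missing_members(configured, log_lines):
    """Return configured members absent from the latest member-roll log line.

    Returns None if the log contains no member-roll update at all.
    """
    latest = None
    for line in log_lines:
        m = ROLL.search(line)
        if m:
            # Namespaces are logged as space-separated quoted strings.
            latest = set(re.findall(r'"([^"]+)"', m.group(1)))
    if latest is None:
        return None
    return sorted(set(configured) - latest)
```

For example, calling it with `["myproject", "serving-tests"]` and the last member-roll line from the excerpt would report `myproject` as missing.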
The whole log is attached.
The istio-ingressgateway logs only repeat the following line every 30 minutes, both before and after the moment the service became unavailable, and do not seem to show anything useful:
[2019-08-13 07:35:19.641][18][warning][config] [bazel-out/k8-opt/bin/external/envoy/source/common/config/_virtual_includes/grpc_stream_lib/common/config/grpc_stream.h:86] gRPC config stream closed: 13,
The Service Mesh configuration is minimal (sidecar injection, tracing, etc. are disabled):
apiVersion: maistra.io/v1
kind: ServiceMeshControlPlane
metadata:
  name: minimal-multitenant-cni-install
spec:
  istio:
    global:
      multitenant: true
      proxy:
        # constrain resources for use in smaller environments
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 500m
            memory: 128Mi
        autoInject: disabled
      omitSidecarInjectorConfigMap: true
      disablePolicyChecks: false
    istio_cni:
      enabled: true
    gateways:
      istio-ingressgateway:
        autoscaleEnabled: false
      istio-egressgateway:
        enabled: false
      cluster-local-gateway:
        autoscaleEnabled: false
        enabled: true
        labels:
          app: cluster-local-gateway
          istio: cluster-local-gateway
        ports:
          - name: status-port
            port: 15020
          - name: http2
            port: 80
            targetPort: 80
          - name: https
            port: 443
    mixer:
      enabled: false
      policy:
        enabled: false
      telemetry:
        enabled: false
    pilot:
      # disable autoscaling for use in smaller environments
      autoscaleEnabled: false
      sidecar: false
    kiali:
      enabled: false
    tracing:
      enabled: false
    prometheus:
      enabled: false
    grafana:
      enabled: false
    sidecarInjectorWebhook:
      enabled: false
---
apiVersion: maistra.io/v1
kind: ServiceMeshMemberRoll
metadata:
  name: default
spec:
  members:
    - myproject
    - serving-tests
    - serving-tests-alt
    - knative-serving
    - knative-eventing
    - knative-build
    - test-api-server-source
    - test-broker-channel-flow
    - test-broker-channel-flow-crd-in-memory
    - test-broker-channel-flow-in-memory
    - test-channel-chain
    - test-channel-chain-crd-in-memory
    - test-channel-chain-in-memory
    - test-container-source
    - test-cron-job-source
    - test-default-broker-with-many-triggers
    - test-event-transformation-for-subscription
    - test-event-transformation-for-subscription-crd-in-memory
    - test-event-transformation-for-subscription-in-memory
    - test-event-transformation-for-trigger
    - test-event-transformation-for-trigger-crd-in-memory
    - test-event-transformation-for-trigger-in-memory
    - test-single-binary-event-for-channel
    - test-single-binary-event-for-channel-crd-in-memory
    - test-single-binary-event-for-channel-in-memory
    - test-single-structured-event-for-channel
    - test-single-structured-event-for-channel-crd-in-memory
    - test-single-structured-event-for-channel-in-memory
- is related to
-
MAISTRA-862 Galley can drop watches on Istio CRs
- Closed