-
Bug
-
Resolution: Done
-
Blocker
-
1.33.0
-
None
-
None
All tests fail at the moment.
Debugging Info/Status:
I isolated one of the tests, `same-tenant-via-ingress-no-activator-proxy`, and ran it on OCP 4.12.54 with SM 2.4.5. I reverted to SM 2.4.5 because SM 2.5.0 was announced at the same time the tests started failing on CI (our S-O tests use the latest SM version by default).
With the failing setup, it seems that the Envoy cluster config for `same-tenant-via-ingress-no-activator.tenant-1.svc.cluster.local` is broken when the service is reached via the Knative local ingress.
    $ oc exec -it same-tenant-via-ingress-no-activator-proxy-00002-deploymenj5mjh -n tenant-1 -- curl -k -H "HOST: same-tenant-via-ingress-no-activator.tenant-1.svc.cluster.local" knative-local-gateway.istio-system.svc.cluster.local
    upstream connect error or disconnect/reset before headers. reset reason: connection termination
    $ oc exec -it same-tenant-via-ingress-no-activator-proxy-00002-deploymenj5mjh -n tenant-1 -- curl -k -H "HOST: same-tenant-via-ingress-no-activator.tenant-1.svc" knative-local-gateway.istio-system.svc.cluster.local
    Hello World!
On 1.32.1 both of the above work.
Looking at the requests that go through the Istio ingress gateway, the good and the bad setups show:
good:
2024-04-08T21:07:48.969338839Z { "authority": "same-tenant-via-ingress-no-activator-proxy-tenant-1.apps.ci-ln-3vh2v6t-76ef8.origin-ci-int-aws.dev.rhcloud.com", "bytes_received": 0, "bytes_sent": 13, "downstream_local_address": "10.128.2.35:8443", "downstream_peer_cert_v_end": "-", "downstream_peer_cert_v_start": "-", "downstream_remote_address": "10.131.0.7:49600", "downstream_tls_cipher": "TLS_AES_256_GCM_SHA384", "downstream_tls_version": "TLSv1.3", "duration": 14, "hostname": "istio-ingressgateway-b479cbf84-6v44x", "istio_policy_status": "-", "method": "GET", "path": "/", "protocol": "HTTP/1.1", "request_duration": 0, "request_id": "ff04f647-0dc0-4edd-a384-6dc9209c169b", "requested_server_name": "same-tenant-via-ingress-no-activator-proxy-tenant-1.apps.ci-ln-3vh2v6t-76ef8.origin-ci-int-aws.dev.rhcloud.com", "response_code": "200", "response_duration": 14, "response_tx_duration": 0, "response_flags": "-", "route_name": "-", "start_time": "2024-04-08T21:07:48.608Z", "upstream_cluster": "outbound|80||same-tenant-via-ingress-no-activator-proxy-00001.tenant-1.svc.cluster.local", "upstream_host": "10.128.4.32:8012", "upstream_local_address": "10.128.2.35:49754", "upstream_service_time": 14, "upstream_transport_failure_reason": "-", "user_agent": "Go-http-client/1.1", "x_forwarded_for": "10.131.0.7" }
bad:
2024-04-09T11:48:24.665015358Z { "authority": "same-tenant-via-ingress-no-activator-proxy-tenant-1.apps.ci-ln-1tvfi8t-76ef8.aws-2.ci.openshift.org", "bytes_received": 0, "bytes_sent": 95, "downstream_local_address": "10.128.2.30:8443", "downstream_peer_cert_v_end": "-", "downstream_peer_cert_v_start": "-", "downstream_remote_address": "10.128.2.9:58848", "downstream_tls_cipher": "TLS_AES_256_GCM_SHA384", "downstream_tls_version": "TLSv1.3", "duration": 12, "hostname": "istio-ingressgateway-b479cbf84-5b9xx", "istio_policy_status": "-", "method": "GET", "path": "/", "protocol": "HTTP/1.1", "request_duration": 0, "request_id": "445385f3-60ec-4c89-a743-02566e54813a", "requested_server_name": "same-tenant-via-ingress-no-activator-proxy-tenant-1.apps.ci-ln-1tvfi8t-76ef8.aws-2.ci.openshift.org", "response_code": "503", "response_duration": 12, "response_tx_duration": 0, "response_flags": "-", "route_name": "-", "start_time": "2024-04-09T11:48:23.814Z", "upstream_cluster": "outbound|80||same-tenant-via-ingress-no-activator-proxy-00001.tenant-1.svc.cluster.local", "upstream_host": "10.130.2.29:8012", "upstream_local_address": "10.128.2.30:51092", "upstream_service_time": 11, "upstream_transport_failure_reason": "-", "user_agent": "Go-http-client/1.1", "x_forwarded_for": "10.128.2.9" }
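To compare the two access log entries field by field, a small helper along these lines could be used. It is only a sketch: the `VOLATILE` ignore list and the function name `diff_access_logs` are assumptions made for illustration, and the sample lines are abbreviated stand-ins for the full entries above.

```python
import json

# Fields that change on every request and say little about the failure.
# This ignore list is an assumption; adjust it as needed.
VOLATILE = {"request_id", "start_time", "duration", "hostname"}


def diff_access_logs(good_line: str, bad_line: str) -> dict:
    """Return {field: (good_value, bad_value)} for every field that differs."""
    # Each log line has the form "<timestamp> <json object>"; keep the JSON part.
    good = json.loads(good_line[good_line.index("{"):])
    bad = json.loads(bad_line[bad_line.index("{"):])
    return {
        key: (good.get(key), bad.get(key))
        for key in sorted(good.keys() | bad.keys())
        if key not in VOLATILE and good.get(key) != bad.get(key)
    }


if __name__ == "__main__":
    # Abbreviated samples; paste the full log lines in practice.
    good_sample = '2024-04-08T21:07:48Z { "response_code": "200", "bytes_sent": 13, "method": "GET", "request_id": "a" }'
    bad_sample = '2024-04-09T11:48:24Z { "response_code": "503", "bytes_sent": 95, "method": "GET", "request_id": "b" }'
    for field, (g, b) in diff_access_logs(good_sample, bad_sample).items():
        print(f"{field}: good={g!r} bad={b!r}")
```

Fields tied to a specific cluster or pod (addresses, `authority`) will also show up as differences and can be added to the ignore list.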
Looking at the Istio proxy config dumps, the working setup has this extra virtual host entry in its route configuration:
    {
      "name": "same-tenant-via-ingress-no-activator.tenant-1.svc.cluster.local:80",
      "domains": [
        "same-tenant-via-ingress-no-activator.tenant-1.svc.cluster.local",
        "same-tenant-via-ingress-no-activator"
      ],
      "routes": [
        {
          "match": {
            "prefix": "/",
            "case_sensitive": true,
            "headers": [
              { "name": "K-Network-Hash", "string_match": { "exact": "override" } },
              { "name": ":authority", "string_match": { "prefix": "same-tenant-via-ingress-no-activator.tenant-1" } }
            ]
          },
          "route": {
            "cluster": "outbound|80||same-tenant-via-ingress-no-activator-00001.tenant-1.svc.cluster.local",
            "timeout": "0s",
            "max_grpc_timeout": "0s"
          },
          "metadata": { "filter_metadata": { "istio": { "config": "/apis/networking.istio.io/v1alpha3/namespaces/tenant-1/virtual-service/same-tenant-via-ingress-no-activator-mesh" } } },
          "decorator": { "operation": "same-tenant-via-ingress-no-activator-00001.tenant-1.svc.cluster.local:80/*" },
          "request_headers_to_add": [
            { "header": { "key": "K-Network-Hash", "value": "646d15912fbd0e991d2fb8549f5e1806eff0c74587280a2b807ca6a552cab2a2" }, "append": false },
            { "header": { "key": "Knative-Serving-Namespace", "value": "tenant-1" }, "append": false },
            { "header": { "key": "Knative-Serving-Revision", "value": "same-tenant-via-ingress-no-activator-00001" }, "append": false }
          ]
        },
        {
          "match": {
            "prefix": "/",
            "case_sensitive": true,
            "headers": [
              { "name": ":authority", "string_match": { "prefix": "same-tenant-via-ingress-no-activator.tenant-1" } }
            ]
          },
          "route": {
            "cluster": "outbound|80||same-tenant-via-ingress-no-activator-00001.tenant-1.svc.cluster.local",
            "timeout": "0s",
            "max_grpc_timeout": "0s"
          },
          "metadata": { "filter_metadata": { "istio": { "config": "/apis/networking.istio.io/v1alpha3/namespaces/tenant-1/virtual-service/same-tenant-via-ingress-no-activator-mesh" } } },
          "decorator": { "operation": "same-tenant-via-ingress-no-activator-00001.tenant-1.svc.cluster.local:80/*" },
          "request_headers_to_add": [
            { "header": { "key": "Knative-Serving-Namespace", "value": "tenant-1" }, "append": false },
            { "header": { "key": "Knative-Serving-Revision", "value": "same-tenant-via-ingress-no-activator-00001" }, "append": false }
          ]
        }
      ],
      "include_request_attempt_count": true
    },
Another issue I see in both setups, which could be tracked in a separate Jira, is:
    2024-04-08T20:39:03.434029476Z 2024-04-08T20:39:03.433976Z error ior failed to process gateway knative-local-gateway/knative-serving event add: 1 error occurred:
    2024-04-08T20:39:03.434029476Z * error creating a route for the host * from gateway: knative-serving/knative-local-gateway: Route.route.openshift.io "knative-serving-knative-local-gateway-684888c0ebb17f37" is invalid: spec.host: Invalid value: "knative-serving-knative-local-gateway-684888c0ebb17f37-istio-system.apps.ci-ln-3vh2v6t-76ef8.origin-ci-int-aws.dev.rhcloud.com": must be no more than 63 characters
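A quick way to see why the route is rejected is to measure the generated host. The sketch below assumes that the 63-character limit in the error applies to the first DNS label of `spec.host` (the usual per-label limit for DNS names); the helper name `first_label_length` is made up for illustration.

```python
MAX_DNS_LABEL = 63  # per-label limit for DNS names


def first_label_length(host: str) -> int:
    """Length of the left-most DNS label of a host name."""
    return len(host.split(".", 1)[0])


# Host taken verbatim from the error message above.
host = (
    "knative-serving-knative-local-gateway-684888c0ebb17f37-istio-system"
    ".apps.ci-ln-3vh2v6t-76ef8.origin-ci-int-aws.dev.rhcloud.com"
)

length = first_label_length(host)
print(length, "too long" if length > MAX_DNS_LABEL else "ok")
```

Under that assumption, the label built from the route name plus the `istio-system` namespace is 67 characters, four over the limit.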
Also, via Kiali I spotted a KIA0602 validation on the service (https://kiali.io/docs/features/validations/#kia0602---port-appprotocol-must-follow-protocol-form):
bad service:
    spec:
      externalName: knative-local-gateway.istio-system.svc.cluster.local
      ports:
      - appProtocol: kubernetes.io/h2c
        name: http2
        port: 80
        protocol: TCP
        targetPort: 80
      sessionAffinity: None
      type: ExternalName
    status:
      loadBalancer: {}
good service:
    spec:
      externalName: knative-local-gateway.istio-system.svc.cluster.local
      ports:
      - name: http2
        port: 80
        protocol: TCP
        targetPort: 80
      sessionAffinity: None
      type: ExternalName
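The difference between the two services can be checked mechanically. This is a minimal sketch, not the Kiali implementation: the `ASSUMED_VALID` set is an assumption based on the protocol names listed in Istio's protocol selection documentation and may not match the exact list used by the SM version under test, and `suspicious_ports` is a hypothetical helper name.

```python
# Protocol names assumed to be accepted as appProtocol by the mesh in use.
# Assumption: taken from Istio's protocol selection docs, not from SM itself.
ASSUMED_VALID = {
    "http", "http2", "https", "tcp", "tls",
    "grpc", "grpc-web", "mongo", "mysql", "redis",
}


def suspicious_ports(service_spec: dict) -> list:
    """Return (port name, appProtocol) pairs whose appProtocol is not in ASSUMED_VALID."""
    return [
        (port.get("name"), port["appProtocol"])
        for port in service_spec.get("ports", [])
        if "appProtocol" in port and port["appProtocol"] not in ASSUMED_VALID
    ]


# The two service specs from above, reduced to the relevant fields.
bad_spec = {
    "type": "ExternalName",
    "ports": [{"appProtocol": "kubernetes.io/h2c", "name": "http2", "port": 80}],
}
good_spec = {
    "type": "ExternalName",
    "ports": [{"name": "http2", "port": 80}],
}

print(suspicious_ports(bad_spec))
print(suspicious_ports(good_spec))
```

Only the bad service is flagged, because its `http2` port carries `appProtocol: kubernetes.io/h2c`.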
After removing `appProtocol: kubernetes.io/h2c`, I verified manually that the bug is resolved and that the curl commands above work:
    $ oc exec -it same-tenant-via-ingress-no-activator-00001-deployment-65ccpggp5 -n tenant-1 -- curl -vvv -H "HOST: same-tenant-via-ingress-no-activator.tenant-1.svc.cluster.local" knative-local-gateway.istio-system.svc.cluster.local
    * Rebuilt URL to: knative-local-gateway.istio-system.svc.cluster.local/
    * Trying 172.30.244.249...
    * TCP_NODELAY set
    * Connected to knative-local-gateway.istio-system.svc.cluster.local (172.30.244.249) port 80 (#0)
    > GET / HTTP/1.1
    > Host: same-tenant-via-ingress-no-activator.tenant-1.svc.cluster.local
    > User-Agent: curl/7.61.1
    > Accept: */*
    >
    < HTTP/1.1 200 OK
    < content-length: 13
    < content-type: text/plain; charset=utf-8
    < date: Wed, 10 Apr 2024 10:36:46 GMT
    < x-envoy-upstream-service-time: 11
    < server: envoy
    <
    Hello World!
    * Connection #0 to host knative-local-gateway.istio-system.svc.cluster.local left intact
    $ curl -k https://same-tenant-via-ingress-no-activator-proxy-tenant-1.apps.ci-ln-f455vj2-76ef8.aws-2.ci.openshift.org
    Hello World!
Slack thread: https://redhat-internal.slack.com/archives/CHTTRCUBC/p1712308932800929