Knative Serving / SRVKS-1232

MT tests fail on S-O main


    • Type: Bug
    • Resolution: Done
    • Priority: Blocker
    • 1.33.0
    • 1.33.0
    • None
    • None
    • Critical

      All tests fail at the moment.

      Debugging Info/Status:

      I isolated one of the tests, `same-tenant-via-ingress-no-activator-proxy`, and tested it on OCP 4.12.54 with SM 2.4.5. I reverted to SM 2.4.5 because SM 2.5.0 was announced at the same time the tests started failing on CI (by default, our S-O tests use the latest SM version).

      With the failing setup, the Envoy cluster config for `same-tenant-via-ingress-no-activator.tenant-1.svc.cluster.local` appears to be broken when the service is reached via the Knative local ingress.

      $ oc exec -it same-tenant-via-ingress-no-activator-proxy-00002-deploymenj5mjh -n tenant-1 -- curl -k -H "HOST: same-tenant-via-ingress-no-activator.tenant-1.svc.cluster.local" knative-local-gateway.istio-system.svc.cluster.local
      upstream connect error or disconnect/reset before headers. reset reason: connection termination

      $ oc exec -it same-tenant-via-ingress-no-activator-proxy-00002-deploymenj5mjh -n tenant-1 -- curl -k -H "HOST: same-tenant-via-ingress-no-activator.tenant-1.svc" knative-local-gateway.istio-system.svc.cluster.local
      Hello World!

      On 1.32.1 both of the above work.

      Looking at the requests that go through the Istio ingress gateway for the good and the bad setup, we have:

      good:

       2024-04-08T21:07:48.969338839Z { "authority": "same-tenant-via-ingress-no-activator-proxy-tenant-1.apps.ci-ln-3vh2v6t-76ef8.origin-ci-int-aws.dev.rhcloud.com", "bytes_received": 0, "bytes_sent": 13, "downstream_local_address": "10.128.2.35:8443", "downstream_peer_cert_v_end": "-", "downstream_peer_cert_v_start": "-", "downstream_remote_address": "10.131.0.7:49600", "downstream_tls_cipher": "TLS_AES_256_GCM_SHA384", "downstream_tls_version": "TLSv1.3", "duration": 14, "hostname": "istio-ingressgateway-b479cbf84-6v44x", "istio_policy_status": "-", "method": "GET", "path": "/", "protocol": "HTTP/1.1", "request_duration": 0, "request_id": "ff04f647-0dc0-4edd-a384-6dc9209c169b", "requested_server_name": "same-tenant-via-ingress-no-activator-proxy-tenant-1.apps.ci-ln-3vh2v6t-76ef8.origin-ci-int-aws.dev.rhcloud.com", "response_code": "200", "response_duration": 14, "response_tx_duration": 0, "response_flags": "-", "route_name": "-", "start_time": "2024-04-08T21:07:48.608Z", "upstream_cluster": "outbound|80||same-tenant-via-ingress-no-activator-proxy-00001.tenant-1.svc.cluster.local", "upstream_host": "10.128.4.32:8012", "upstream_local_address": "10.128.2.35:49754", "upstream_service_time": 14, "upstream_transport_failure_reason": "-", "user_agent": "Go-http-client/1.1", "x_forwarded_for": "10.131.0.7"

       

      bad:

      2024-04-09T11:48:24.665015358Z { "authority": "same-tenant-via-ingress-no-activator-proxy-tenant-1.apps.ci-ln-1tvfi8t-76ef8.aws-2.ci.openshift.org", "bytes_received": 0, "bytes_sent": 95, "downstream_local_address": "10.128.2.30:8443", "downstream_peer_cert_v_end": "-", "downstream_peer_cert_v_start": "-", "downstream_remote_address": "10.128.2.9:58848", "downstream_tls_cipher": "TLS_AES_256_GCM_SHA384", "downstream_tls_version": "TLSv1.3", "duration": 12, "hostname": "istio-ingressgateway-b479cbf84-5b9xx", "istio_policy_status": "-", "method": "GET", "path": "/", "protocol": "HTTP/1.1", "request_duration": 0, "request_id": "445385f3-60ec-4c89-a743-02566e54813a", "requested_server_name": "same-tenant-via-ingress-no-activator-proxy-tenant-1.apps.ci-ln-1tvfi8t-76ef8.aws-2.ci.openshift.org", "response_code": "503", "response_duration": 12, "response_tx_duration": 0, "response_flags": "-", "route_name": "-", "start_time": "2024-04-09T11:48:23.814Z", "upstream_cluster": "outbound|80||same-tenant-via-ingress-no-activator-proxy-00001.tenant-1.svc.cluster.local", "upstream_host": "10.130.2.29:8012", "upstream_local_address": "10.128.2.30:51092", "upstream_service_time": 11, "upstream_transport_failure_reason": "-", "user_agent": "Go-http-client/1.1", "x_forwarded_for": "10.128.2.9"
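
      To make the difference between the two entries easier to see, the string-valued fields can be compared mechanically. The sketch below is only an illustration: the `good` and `bad` variables hold abbreviated copies of the two log entries above (most fields dropped), and `field` is a hypothetical helper name, not part of any tool. It handles quoted string values only, not numeric ones.

```shell
# Abbreviated copies of the two access-log entries above (most fields omitted).
good='{ "response_code": "200", "response_flags": "-", "upstream_cluster": "outbound|80||same-tenant-via-ingress-no-activator-proxy-00001.tenant-1.svc.cluster.local", "upstream_host": "10.128.4.32:8012" }'
bad='{ "response_code": "503", "response_flags": "-", "upstream_cluster": "outbound|80||same-tenant-via-ingress-no-activator-proxy-00001.tenant-1.svc.cluster.local", "upstream_host": "10.130.2.29:8012" }'

# field JSON KEY: print the value of a quoted string field (hypothetical helper).
field() {
  printf '%s\n' "$1" | grep -o "\"$2\": \"[^\"]*\"" | cut -d'"' -f4
}

# Collect every compared field whose value differs between the two entries.
diff_out=""
for key in response_code response_flags upstream_cluster upstream_host; do
  g="$(field "$good" "$key")"
  b="$(field "$bad" "$key")"
  if [ "$g" != "$b" ]; then
    diff_out="${diff_out}${key}: good=${g} bad=${b}
"
  fi
done
printf '%s' "$diff_out"
```

      For these entries only `response_code` and `upstream_host` differ. The upstream cluster and the response flags are identical, and the two requests were logged on different CI clusters (the authority hostnames differ), so the differing pod IP is not informative by itself.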

       

      Looking at the Istio proxy config dumps, the working setup has this extra entry, which routes the service host to the revision's cluster:

             {
              "name": "same-tenant-via-ingress-no-activator.tenant-1.svc.cluster.local:80",
              "domains": [
               "same-tenant-via-ingress-no-activator.tenant-1.svc.cluster.local",
               "same-tenant-via-ingress-no-activator"
              ],
              "routes": [
               {
                "match": {
                 "prefix": "/",
                 "case_sensitive": true,
                 "headers": [
                  {
                   "name": "K-Network-Hash",
                   "string_match": {
                    "exact": "override"
                   }
                  },
                  {
                   "name": ":authority",
                   "string_match": {
                    "prefix": "same-tenant-via-ingress-no-activator.tenant-1"
                   }
                  }
                 ]
                },
                "route": {
                 "cluster": "outbound|80||same-tenant-via-ingress-no-activator-00001.tenant-1.svc.cluster.local",
                 "timeout": "0s",
                 "max_grpc_timeout": "0s"
                },
                "metadata": {
                 "filter_metadata": {
                  "istio": {
                   "config": "/apis/networking.istio.io/v1alpha3/namespaces/tenant-1/virtual-service/same-tenant-via-ingress-no-activator-mesh"
                  }
                 }
                },
                "decorator": {
                 "operation": "same-tenant-via-ingress-no-activator-00001.tenant-1.svc.cluster.local:80/*"
                },
                "request_headers_to_add": [
                 {
                  "header": {
                   "key": "K-Network-Hash",
                   "value": "646d15912fbd0e991d2fb8549f5e1806eff0c74587280a2b807ca6a552cab2a2"
                  },
                  "append": false
                 },
                 {
                  "header": {
                   "key": "Knative-Serving-Namespace",
                   "value": "tenant-1"
                  },
                  "append": false
                 },
                 {
                  "header": {
                   "key": "Knative-Serving-Revision",
                   "value": "same-tenant-via-ingress-no-activator-00001"
                  },
                  "append": false
                 }
                ]
               },
               {
                "match": {
                 "prefix": "/",
                 "case_sensitive": true,
                 "headers": [
                  {
                   "name": ":authority",
                   "string_match": {
                    "prefix": "same-tenant-via-ingress-no-activator.tenant-1"
                   }
                  }
                 ]
                },
                "route": {
                 "cluster": "outbound|80||same-tenant-via-ingress-no-activator-00001.tenant-1.svc.cluster.local",
                 "timeout": "0s",
                 "max_grpc_timeout": "0s"
                },
                "metadata": {
                 "filter_metadata": {
                  "istio": {
                   "config": "/apis/networking.istio.io/v1alpha3/namespaces/tenant-1/virtual-service/same-tenant-via-ingress-no-activator-mesh"
                  }
                 }
                },
                "decorator": {
                 "operation": "same-tenant-via-ingress-no-activator-00001.tenant-1.svc.cluster.local:80/*"
                },
                "request_headers_to_add": [
                 {
                  "header": {
                   "key": "Knative-Serving-Namespace",
                   "value": "tenant-1"
                  },
                  "append": false
                 },
                 {
                  "header": {
                   "key": "Knative-Serving-Revision",
                   "value": "same-tenant-via-ingress-no-activator-00001"
                  },
                  "append": false
                 }
                ]
               }
              ],
              "include_request_attempt_count": true
             },

       

      Another error I see in both setups, which may warrant a separate Jira issue:

      2024-04-08T20:39:03.434029476Z 2024-04-08T20:39:03.433976Z	error	ior	failed to process gateway knative-local-gateway/knative-serving event add: 1 error occurred:
      2024-04-08T20:39:03.434029476Z 	* error creating a route for the host * from gateway: knative-serving/knative-local-gateway: Route.route.openshift.io "knative-serving-knative-local-gateway-684888c0ebb17f37" is invalid: spec.host: Invalid value: "knative-serving-knative-local-gateway-684888c0ebb17f37-istio-system.apps.ci-ln-3vh2v6t-76ef8.origin-ci-int-aws.dev.rhcloud.com": must be no more than 63 characters 
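
      The rejected host can be checked locally. The sketch below assumes the 63-character limit applies to the first DNS label of the host (the part before the first dot), which is the usual DNS label limit. The variable names are illustrative.

```shell
# Host value copied from the error message above.
host="knative-serving-knative-local-gateway-684888c0ebb17f37-istio-system.apps.ci-ln-3vh2v6t-76ef8.origin-ci-int-aws.dev.rhcloud.com"

# First DNS label: everything before the first dot.
label="${host%%.*}"
label_len="${#label}"

if [ "$label_len" -gt 63 ]; then
  echo "label too long: $label_len characters ($label)"
else
  echo "label ok: $label_len characters"
fi
```

      Under that assumption the first label is 67 characters long, which would explain the rejection regardless of the cluster domain that follows it.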

       

      Also, via Kiali I spotted a validation issue on the service (https://kiali.io/docs/features/validations/#kia0602---port-appprotocol-must-follow-protocol-form):

      bad service:

        spec:
          externalName: knative-local-gateway.istio-system.svc.cluster.local
          ports:
          - appProtocol: kubernetes.io/h2c
            name: http2
            port: 80
            protocol: TCP
            targetPort: 80
          sessionAffinity: None
          type: ExternalName
        status:
          loadBalancer: {}
      
      

       

      good service:

        spec:
          externalName: knative-local-gateway.istio-system.svc.cluster.local
          ports:
          - name: http2
            port: 80
            protocol: TCP
            targetPort: 80
          sessionAffinity: None
          type: ExternalName 
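
      To spot the difference between the two specs without Kiali, a saved manifest can be searched for the field. This is a minimal sketch: it writes the bad spec from above to a temporary file so that it runs standalone. On a cluster, the file would instead come from something like `oc get service <name> -n tenant-1 -o yaml`, where `<name>` is whichever Service is being inspected (the name is not stated in this ticket).

```shell
# Save the "bad" spec from above to a temporary file so the check is self-contained.
manifest="$(mktemp)"
cat > "$manifest" <<'EOF'
spec:
  externalName: knative-local-gateway.istio-system.svc.cluster.local
  ports:
  - appProtocol: kubernetes.io/h2c
    name: http2
    port: 80
    protocol: TCP
    targetPort: 80
  sessionAffinity: None
  type: ExternalName
EOF

# Flag the manifest if any port sets appProtocol to kubernetes.io/h2c.
if grep -q 'appProtocol: kubernetes.io/h2c' "$manifest"; then
  result="appProtocol h2c present"
else
  result="appProtocol h2c absent"
fi
echo "$result"
rm -f "$manifest"
```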

      I verified manually that removing `appProtocol: kubernetes.io/h2c` resolves the bug; the curl commands above now work:

      $ oc exec -it same-tenant-via-ingress-no-activator-00001-deployment-65ccpggp5 -n tenant-1 -- curl -vvv -H "HOST: same-tenant-via-ingress-no-activator.tenant-1.svc.cluster.local" knative-local-gateway.istio-system.svc.cluster.local
      * Rebuilt URL to: knative-local-gateway.istio-system.svc.cluster.local/
      *   Trying 172.30.244.249...
      * TCP_NODELAY set
      * Connected to knative-local-gateway.istio-system.svc.cluster.local (172.30.244.249) port 80 (#0)
      > GET / HTTP/1.1
      > Host: same-tenant-via-ingress-no-activator.tenant-1.svc.cluster.local
      > User-Agent: curl/7.61.1
      > Accept: */*
      > 
      < HTTP/1.1 200 OK
      < content-length: 13
      < content-type: text/plain; charset=utf-8
      < date: Wed, 10 Apr 2024 10:36:46 GMT
      < x-envoy-upstream-service-time: 11
      < server: envoy
      < 
      Hello World!
      * Connection #0 to host knative-local-gateway.istio-system.svc.cluster.local left intact
      
      $ curl -k https://same-tenant-via-ingress-no-activator-proxy-tenant-1.apps.ci-ln-f455vj2-76ef8.aws-2.ci.openshift.org   
      Hello World!
        

      Slack thread: https://redhat-internal.slack.com/archives/CHTTRCUBC/p1712308932800929

              skontopo@redhat.com Stavros Kontopoulos