Uploaded image for project: 'Knative Serving'
  1. Knative Serving
  2. SRVKS-983

Requests to Knative service dropped during upgrades

XMLWordPrintable

    • Icon: Bug Bug
    • Resolution: Done
    • Icon: Undefined Undefined
    • 1.30.0
    • 1.26.0
    • None
    • None
    • False
    • Hide

      None

      Show
      None
    • False

      The ProbeTest which sends requests periodically to a Knative service during upgrade quite often fails with the following error:

      ⏳ 2022-11-22T13:14:30.000Z    INFO    test/prober.go:172    "https://upgrade-probe-serving-tests.apps.ci-ocp-4-11-amd64-aws-us-east-1-ndpsd.hive.aws.ci.openshift.org" status = 502, want: 200
      ⏳ 2022-11-22T13:14:30.001Z    INFO    test/prober.go:173    response: status: 502, body: dial tcp 127.0.0.1:8080: connect: connection refused
      ⏳ , headers: map[Content-Length:[53] Content-Type:[text/plain; charset=utf-8] Date:[Tue, 22 Nov 2022 13:14:29 GMT] Server:[envoy] X-Content-Type-Options:[nosniff] X-Envoy-Upstream-Service-Time:[8] Zipkin_trace_id:[66c4491342ab819f94645839a2e837b4]] 

      Example job run is [ here|https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift-knative_serverless-operator/1822/pull-ci-openshift-knative-serverless-operator-main-4.11-upgrade-tests-aws-ocp-411/1595025697598869504 ]

      It turns out this error happens during the upgrade-probe pod shutdown.

      When debugging with cluster logging enabled the the queue-proxy container in the Knative service pod has these logs:

      {"severity":"INFO","timestamp":"2022-11-23T07:43:36.944363685Z","logger":"queueproxy","caller":"queue/main.go:233","message":"Received TERM signal, attempting to gracefully shutdown servers.","knative.dev/key":"serving-tests/upgrade-probe-00001","knative.dev/pod":"upgrade-probe-00001-deployment-79886db75f-lx8xf"}
      {"severity":"INFO","timestamp":"2022-11-23T07:43:36.944435244Z","logger":"queueproxy","caller":"queue/main.go:234","message":"Sleeping 30s to allow K8s propagation of non-ready state","knative.dev/key":"serving-tests/upgrade-probe-00001","knative.dev/pod":"upgrade-probe-00001-deployment-79886db75f-lx8xf"}
      {"httpRequest": {"requestMethod": "GET", "requestUrl": "/", "requestSize": "0", "status": 200, "responseSize": "17", "userAgent": "Go-http-client/1.1", "remoteIp": "10.128.4.13:59976", "serverIp": "10.128.4.54", "referer": "", "latency": "0.000527796s", "protocol": "HTTP/1.1"}, "traceId": "[aa2c9d3da972d1d71f0eea1fa59677d8]"}
      {"httpRequest": {"requestMethod": "GET", "requestUrl": "/", "requestSize": "0", "status": 502, "responseSize": "53", "userAgent": "Go-http-client/1.1", "remoteIp": "10.130.2.14:41902", "serverIp": "10.128.4.54", "referer": "", "latency": "0.000520042s", "protocol": "HTTP/1.1"}, "traceId": "[37b41de5841892739b6dfba61b8e017b]"}
      {"severity":"ERROR","timestamp":"2022-11-23T07:43:36.98513673Z","logger":"queueproxy","caller":"network/error_handler.go:33","message":"error reverse proxying request; sockstat: sockets: used 9\nTCP: inuse 1 orphan 0 tw 16 alloc 751 mem 1763\nUDP: inuse 0 mem 6\nUDPLITE: inuse 0\nRAW: inuse 0\nFRAG: inuse 0 memory 0\n","knative.dev/key":"serving-tests/upgrade-probe-00001","knative.dev/pod":"upgrade-probe-00001-deployment-79886db75f-lx8xf","error":"dial tcp 127.0.0.1:8080: connect: connection refused","stacktrace":"knative.dev/pkg/network.ErrorHandler.func1\n\t/go/src/knative.dev/serving/vendor/knative.dev/pkg/network/error_handler.go:33\nnet/http/httputil.(*ReverseProxy).ServeHTTP\n\t/usr/local/go/src/net/http/httputil/reverseproxy.go:300\nknative.dev/serving/pkg/queue.ProxyHandler.func1\n\t/go/src/knative.dev/serving/pkg/queue/handler.go:75\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2047\nknative.dev/serving/pkg/queue.ForwardedShimHandler.func1\n\t/go/src/knative.dev/serving/pkg/queue/forwarded_shim.go:54\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2047\nknative.dev/serving/pkg/http/handler.(*timeoutHandler).ServeHTTP.func4\n\t/go/src/knative.dev/serving/pkg/http/handler/timeout.go:117"}
      {"httpRequest": {"requestMethod": "GET", "requestUrl": "/", "requestSize": "0", "status": 503, "responseSize": "14", "userAgent": "kube-probe/1.24", "remoteIp": "10.128.4.1:42542", "serverIp": "10.128.4.54", "referer": "", "latency": "1.7999e-05s", "protocol": "HTTP/1.1"}, "traceId": "[]"}
      {"httpRequest": {"requestMethod": "GET", "requestUrl": "/", "requestSize": "0", "status": 503, "responseSize": "14", "userAgent": "kube-probe/1.24", "remoteIp": "10.128.4.1:34060", "serverIp": "10.128.4.54", "referer": "", "latency": "1.0637e-05s", "protocol": "HTTP/1.1"}, "traceId": "[]"}
      {"httpRequest": {"requestMethod": "GET", "requestUrl": "/", "requestSize": "0", "status": 503, "responseSize": "14", "userAgent": "kube-probe/1.24", "remoteIp": "10.128.4.1:34320", "serverIp": "10.128.4.54", "referer": "", "latency": "9.076e-06s", "protocol": "HTTP/1.1"}, "traceId": "[]"}
      {"severity":"INFO","timestamp":"2022-11-23T07:44:06.945292872Z","logger":"queueproxy","caller":"queue/main.go:241","message":"Shutting down server: metrics","knative.dev/key":"serving-tests/upgrade-probe-00001","knative.dev/pod":"upgrade-probe-00001-deployment-79886db75f-lx8xf"}
      {"severity":"INFO","timestamp":"2022-11-23T07:44:06.94656192Z","logger":"queueproxy","caller":"queue/main.go:246","message":"Shutdown complete, exiting...","knative.dev/key":"serving-tests/upgrade-probe-00001","knative.dev/pod":"upgrade-probe-00001-deployment-79886db75f-lx8xf"} 

      Note: IP addresses 10.128.4.13 and 10.130.2.14 are activator pods

       

      The activator pod shows this error at that time  (this is the activator pod before upgrade):

      Endpoints: ready backends: 1, not-ready backends: 0","knative.dev/controller":"activator","knative.dev/pod":"activator-656dfc5575-l75xg","knative.dev/key":"serving-tests/upgrade-probe-00001"}
      {"severity":"DEBUG","timestamp":"2022-11-23T07:43:36.9852563Z","logger":"activator","caller":"net/revision_backends.go:346","message":"Done probing, got 1 healthy pods","knative.dev/controller":"activator","knative.dev/pod":"activator-656dfc5575-l75xg","knative.dev/key":"serving-tests/upgrade-probe-00001"}
      {"httpRequest": {"requestMethod": "GET", "requestUrl": "/", "requestSize": "0", "status": 502, "responseSize": "53", "userAgent": "Go-http-client/1.1", "remoteIp": "10.130.2.10:57710", "serverIp": "", "referer": "", "latency": "0.008467429s", "protocol": "HTTP/1.1"}, "traceId": "[37b41de5841892739b6dfba61b8e017b]"} 

              rhn-support-knakayam Kenjiro Nakayama (Inactive)
              mgencur@redhat.com Martin Gencur
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

                Created:
                Updated:
                Resolved: