-
Bug
-
Resolution: Done
-
Undefined
-
1.26.0
-
None
-
None
-
False
-
-
False
-
-
The ProbeTest which sends requests periodically to a Knative service during upgrade quite often fails with the following error:
⏳ 2022-11-22T13:14:30.000Z INFO test/prober.go:172 "https://upgrade-probe-serving-tests.apps.ci-ocp-4-11-amd64-aws-us-east-1-ndpsd.hive.aws.ci.openshift.org" status = 502, want: 200
⏳ 2022-11-22T13:14:30.001Z INFO test/prober.go:173 response: status: 502, body: dial tcp 127.0.0.1:8080: connect: connection refused
⏳ , headers: map[Content-Length:[53] Content-Type:[text/plain; charset=utf-8] Date:[Tue, 22 Nov 2022 13:14:29 GMT] Server:[envoy] X-Content-Type-Options:[nosniff] X-Envoy-Upstream-Service-Time:[8] Zipkin_trace_id:[66c4491342ab819f94645839a2e837b4]]
Example job run is [ here|https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift-knative_serverless-operator/1822/pull-ci-openshift-knative-serverless-operator-main-4.11-upgrade-tests-aws-ocp-411/1595025697598869504 ]
It turns out this error happens during the upgrade-probe pod shutdown.
When debugging with cluster logging enabled the the queue-proxy container in the Knative service pod has these logs:
{"severity":"INFO","timestamp":"2022-11-23T07:43:36.944363685Z","logger":"queueproxy","caller":"queue/main.go:233","message":"Received TERM signal, attempting to gracefully shutdown servers.","knative.dev/key":"serving-tests/upgrade-probe-00001","knative.dev/pod":"upgrade-probe-00001-deployment-79886db75f-lx8xf"} {"severity":"INFO","timestamp":"2022-11-23T07:43:36.944435244Z","logger":"queueproxy","caller":"queue/main.go:234","message":"Sleeping 30s to allow K8s propagation of non-ready state","knative.dev/key":"serving-tests/upgrade-probe-00001","knative.dev/pod":"upgrade-probe-00001-deployment-79886db75f-lx8xf"} {"httpRequest": {"requestMethod": "GET", "requestUrl": "/", "requestSize": "0", "status": 200, "responseSize": "17", "userAgent": "Go-http-client/1.1", "remoteIp": "10.128.4.13:59976", "serverIp": "10.128.4.54", "referer": "", "latency": "0.000527796s", "protocol": "HTTP/1.1"}, "traceId": "[aa2c9d3da972d1d71f0eea1fa59677d8]"} {"httpRequest": {"requestMethod": "GET", "requestUrl": "/", "requestSize": "0", "status": 502, "responseSize": "53", "userAgent": "Go-http-client/1.1", "remoteIp": "10.130.2.14:41902", "serverIp": "10.128.4.54", "referer": "", "latency": "0.000520042s", "protocol": "HTTP/1.1"}, "traceId": "[37b41de5841892739b6dfba61b8e017b]"} {"severity":"ERROR","timestamp":"2022-11-23T07:43:36.98513673Z","logger":"queueproxy","caller":"network/error_handler.go:33","message":"error reverse proxying request; sockstat: sockets: used 9\nTCP: inuse 1 orphan 0 tw 16 alloc 751 mem 1763\nUDP: inuse 0 mem 6\nUDPLITE: inuse 0\nRAW: inuse 0\nFRAG: inuse 0 memory 0\n","knative.dev/key":"serving-tests/upgrade-probe-00001","knative.dev/pod":"upgrade-probe-00001-deployment-79886db75f-lx8xf","error":"dial tcp 127.0.0.1:8080: connect: connection refused","stacktrace":"knative.dev/pkg/network.ErrorHandler.func1\n\t/go/src/knative.dev/serving/vendor/knative.dev/pkg/network/error_handler.go:33\nnet/http/httputil.(*ReverseProxy).ServeHTTP\n\t/usr/local/go/src/net/http/httputil/reverseproxy.go:300\nknative.dev/serving/pkg/queue.ProxyHandler.func1\n\t/go/src/knative.dev/serving/pkg/queue/handler.go:75\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2047\nknative.dev/serving/pkg/queue.ForwardedShimHandler.func1\n\t/go/src/knative.dev/serving/pkg/queue/forwarded_shim.go:54\nnet/http.HandlerFunc.ServeHTTP\n\t/usr/local/go/src/net/http/server.go:2047\nknative.dev/serving/pkg/http/handler.(*timeoutHandler).ServeHTTP.func4\n\t/go/src/knative.dev/serving/pkg/http/handler/timeout.go:117"} {"httpRequest": {"requestMethod": "GET", "requestUrl": "/", "requestSize": "0", "status": 503, "responseSize": "14", "userAgent": "kube-probe/1.24", "remoteIp": "10.128.4.1:42542", "serverIp": "10.128.4.54", "referer": "", "latency": "1.7999e-05s", "protocol": "HTTP/1.1"}, "traceId": "[]"} {"httpRequest": {"requestMethod": "GET", "requestUrl": "/", "requestSize": "0", "status": 503, "responseSize": "14", "userAgent": "kube-probe/1.24", "remoteIp": "10.128.4.1:34060", "serverIp": "10.128.4.54", "referer": "", "latency": "1.0637e-05s", "protocol": "HTTP/1.1"}, "traceId": "[]"} {"httpRequest": {"requestMethod": "GET", "requestUrl": "/", "requestSize": "0", "status": 503, "responseSize": "14", "userAgent": "kube-probe/1.24", "remoteIp": "10.128.4.1:34320", "serverIp": "10.128.4.54", "referer": "", "latency": "9.076e-06s", "protocol": "HTTP/1.1"}, "traceId": "[]"} {"severity":"INFO","timestamp":"2022-11-23T07:44:06.945292872Z","logger":"queueproxy","caller":"queue/main.go:241","message":"Shutting down server: metrics","knative.dev/key":"serving-tests/upgrade-probe-00001","knative.dev/pod":"upgrade-probe-00001-deployment-79886db75f-lx8xf"} {"severity":"INFO","timestamp":"2022-11-23T07:44:06.94656192Z","logger":"queueproxy","caller":"queue/main.go:246","message":"Shutdown complete, exiting...","knative.dev/key":"serving-tests/upgrade-probe-00001","knative.dev/pod":"upgrade-probe-00001-deployment-79886db75f-lx8xf"}
Note: IP addresses 10.128.4.13 and 10.130.2.14 are activator pods
The activator pod shows this error at that time (this is the activator pod before upgrade):
Endpoints: ready backends: 1, not-ready backends: 0","knative.dev/controller":"activator","knative.dev/pod":"activator-656dfc5575-l75xg","knative.dev/key":"serving-tests/upgrade-probe-00001"} {"severity":"DEBUG","timestamp":"2022-11-23T07:43:36.9852563Z","logger":"activator","caller":"net/revision_backends.go:346","message":"Done probing, got 1 healthy pods","knative.dev/controller":"activator","knative.dev/pod":"activator-656dfc5575-l75xg","knative.dev/key":"serving-tests/upgrade-probe-00001"} {"httpRequest": {"requestMethod": "GET", "requestUrl": "/", "requestSize": "0", "status": 502, "responseSize": "53", "userAgent": "Go-http-client/1.1", "remoteIp": "10.130.2.10:57710", "serverIp": "", "referer": "", "latency": "0.008467429s", "protocol": "HTTP/1.1"}, "traceId": "[37b41de5841892739b6dfba61b8e017b]"}