-
Bug
-
Resolution: Done
-
Major
-
1.16.0
-
The test fails as follows:
autoscaler.go:87: Error: request success rate under SLO: total = 29212, errors = 53, rate = 0.998186, SLO = 0.999000
During the run, it logs errors such as this one:
2021-06-30T05:18:23.015-0400 INFO e2e/autoscale.go:164 Status = 502, want: 200
2021-06-30T05:18:23.015-0400 INFO e2e/autoscale.go:165 URL: http://serverless-upgrade-c-olrfphrl-serving-tests.apps.ocf-rollup-1-16-11-rolling-upgrade-1.16.openshift-aws.rhocf-dev.net?sleep=500 Start: 2021-06-30T05:18:22-04:00 End: 2021-06-30T05:18:23-04:00 Duration: 79.75504ms Error: 502 Bad Gateway Body:
dial tcp 10.128.2.63:8012: connect: connection refused
The full test log can be found in this job: https://master-jenkins-csb-serverless-qe.apps.ocp4.prod.psi.redhat.com/job/functional_tests/job/stream1_16/job/rolling-upgrade-1.16/11/console
When upgrading from Serving 0.21 to 0.22, the queue-proxy container fails its startup probe with:
25m Warning Unhealthy pod/serverless-upgrade-c-olrfphrl-00001-deployment-c54b46dcd-g4fqx Startup probe failed: flag provided but not defined: -probe-period
Usage of /ko-app/queue:
-probe-timeout duration
run startup probe with given timeout (default -1ns)
The containers are restarted, but some of their endpoints stay in the activator's notReady list for a while, and requests sent to those endpoints produce the errors above:
{"severity":"DEBUG","timestamp":"2021-06-30T09:17:32.390327728Z","logger":"activator","caller":"net/revision_backends.go:346","message":"Revision state","knative.dev/controller":"activator","knative.dev/pod":"activator-6b498d855d-jk7lk","knative.dev/key":"serving-tests/serverless-upgrade-c-olrfphrl-00001","dests":{"ready":"10.131.0.67:8012,10.131.0.68:8012,10.131.0.69:8012,10.129.2.46:8012,10.130.2.52:8012,10.130.2.53:8012,10.130.2.54:8012,10.128.2.52:8012,10.128.2.53:8012,10.129.2.49:8012","notReady":"10.129.2.78:8012"}
Eventually these endpoints are removed from the notReady list, but the test still fails because it requires a request success rate of at least 0.999 (the SLO).
Meanwhile, the activator pods log this warning:
{"severity":"WARNING","timestamp":"2021-06-30T09:17:32.390292207Z","logger":"activator","caller":"net/revision_backends.go:286","message":"Failed probing pods","knative.dev/controller":"activator","knative.dev/pod":"activator-6b498d855d-jk7lk","knative.dev/key":"serving-tests/serverless-upgrade-c-olrfphrl-00001","curDests":{"ready":"10.130.2.54:8012,10.128.2.52:8012,10.128.2.53:8012,10.129.2.49:8012,10.131.0.67:8012,10.131.0.68:8012,10.131.0.69:8012,10.129.2.46:8012,10.130.2.52:8012,10.130.2.53:8012","notReady":"10.129.2.78:8012"},"error":"unexpected body: want \"queue\", got \"\""}