Knative Serving / SRVKS-776

AutoscaleSustainingWithTBCTest failing during upgrade from 1.15 to 1.16


    • Type: Bug
    • Resolution: Done
    • Priority: Major
    • Affects Version/s: 1.16.0
    • Fix Version/s: 1.16.0

      The test fails as follows:

      autoscaler.go:87: Error:  request success rate under SLO: total = 29212, errors = 53, rate = 0.998186, SLO = 0.999000
      
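      The arithmetic behind that assertion is simple; the sketch below (not the actual test code) just reproduces the numbers from the log line:

      package main

      import "fmt"

      // checkSLO reports the observed success rate and whether it meets the SLO.
      // Sketch only; the real check is the assertion at autoscaler.go:87.
      func checkSLO(total, errs int, slo float64) (float64, bool) {
          rate := float64(total-errs) / float64(total)
          return rate, rate >= slo
      }

      func main() {
          rate, ok := checkSLO(29212, 53, 0.999)
          fmt.Printf("rate = %f, meets SLO: %v\n", rate, ok) // rate = 0.998186, meets SLO: false
      }
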

      And errors like this one appear during the run:

      2021-06-30T05:18:23.015-0400	INFO	e2e/autoscale.go:164	Status = 502, want: 200
      2021-06-30T05:18:23.015-0400	INFO	e2e/autoscale.go:165	URL: http://serverless-upgrade-c-olrfphrl-serving-tests.apps.ocf-rollup-1-16-11-rolling-upgrade-1.16.openshift-aws.rhocf-dev.net?sleep=500 Start: 2021-06-30T05:18:22-04:00 End: 2021-06-30T05:18:23-04:00 Duration: 79.75504ms Error: 502 Bad Gateway Body:
      dial tcp 10.128.2.63:8012: connect: connection refused
      

      The full test log can be found in this job: https://master-jenkins-csb-serverless-qe.apps.ocp4.prod.psi.redhat.com/job/functional_tests/job/stream1_16/job/rolling-upgrade-1.16/11/console

      When upgrading from Serving 0.21 to 0.22, the queue-proxy container fails with:

      25m         Warning   Unhealthy                      pod/serverless-upgrade-c-olrfphrl-00001-deployment-c54b46dcd-g4fqx     Startup probe failed: flag provided but not defined: -probe-period
      Usage of /ko-app/queue:
        -probe-timeout duration
                  run startup probe with given timeout (default -1ns)
      
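      This is standard Go flag-parsing behavior: the queue binary in the pod defines only -probe-timeout, so the extra -probe-period argument supplied by the startup probe makes it exit. A minimal, self-contained sketch of that failure mode (illustrative only, not the actual queue-proxy main):

      package main

      import (
          "flag"
          "os"
          "time"
      )

      func main() {
          // Mimic a queue binary that knows -probe-timeout but not -probe-period.
          fs := flag.NewFlagSet("/ko-app/queue", flag.ContinueOnError)
          fs.Duration("probe-timeout", -1*time.Nanosecond, "run startup probe with given timeout")

          // The startup probe also passes -probe-period; parsing fails.
          if err := fs.Parse([]string{"-probe-timeout", "10m", "-probe-period", "1s"}); err != nil {
              // flag.Parse has already printed "flag provided but not defined: -probe-period"
              // plus the usage text, mirroring the event above; the probe command exits non-zero.
              os.Exit(1)
          }
      }
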

      The containers are restarted, but some of their endpoints linger for a while in the notReady set, and requests that reach those endpoints produce the errors above:

      {"severity":"DEBUG","timestamp":"2021-06-30T09:17:32.390327728Z","logger":"activator","caller":"net/revision_backends.go:346","message":"Revision state","knative.dev/controller":"activator","knative.dev/pod":"activator-6b498d855d-jk7lk","knative.dev/key":"serving-tests/serverless-upgrade-c-olrfphrl-00001","dests":{"ready":"10.131.0.67:8012,10.131.0.68:8012,10.131.0.69:8012,10.129.2.46:8012,10.130.2.52:8012,10.130.2.53:8012,10.130.2.54:8012,10.128.2.52:8012,10.128.2.53:8012,10.129.2.49:8012","notReady":"10.129.2.78:8012"}
      
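      A simplified model of the ready/notReady split in that log line (this is only a sketch to illustrate the symptom, not the activator's actual data structures):

      package main

      import "fmt"

      // reconcileDests is a hedged sketch of the behavior at play here: a
      // notReady address is re-probed until it either passes the probe or
      // disappears from the revision's endpoints, at which point it is dropped.
      // While a stale address from a restarted pod is still listed, requests
      // and probes to it fail with "connection refused", surfacing as the
      // 502 responses seen by the test.
      func reconcileDests(ready, notReady, current map[string]bool) {
          for dest := range notReady {
              if !current[dest] {
                  delete(notReady, dest) // pod is gone; stop probing it
              }
          }
          for dest := range ready {
              if !current[dest] {
                  delete(ready, dest)
              }
          }
      }

      func main() {
          ready := map[string]bool{"10.131.0.67:8012": true}
          notReady := map[string]bool{"10.129.2.78:8012": true}
          // The endpoints object no longer lists the stale address.
          current := map[string]bool{"10.131.0.67:8012": true}

          reconcileDests(ready, notReady, current)
          fmt.Println("ready:", ready, "notReady:", notReady)
      }
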

      In the end these endpoints are removed from the notReady list, but the test still fails because it enforces a success-rate SLO of 0.999.
      The activator pods log this warning:

      {"severity":"WARNING","timestamp":"2021-06-30T09:17:32.390292207Z","logger":"activator","caller":"net/revision_backends.go:286","message":"Failed probing pods","knative.dev/controller":"activator","knative.dev/pod":"activator-6b498d855d-jk7lk","knative.dev/key":"serving-tests/serverless-upgrade-c-olrfphrl-00001","curDests":{"ready":"10.130.2.54:8012,10.128.2.52:8012,10.128.2.53:8012,10.129.2.49:8012,10.131.0.67:8012,10.131.0.68:8012,10.131.0.69:8012,10.129.2.46:8012,10.130.2.52:8012,10.130.2.53:8012","notReady":"10.129.2.78:8012"},"error":"unexpected body: want \"queue\", got \"\""}
      
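      That warning comes from the activator probing the notReady destination and expecting a specific response body from the queue-proxy. The sketch below approximates that check (the actual probing logic in Knative uses its own request headers and retry handling; this is only an illustration):

      package main

      import (
          "fmt"
          "io"
          "net/http"
          "time"
      )

      // probeQueueProxy approximates the body check behind the warning: the
      // prober expects the queue-proxy on port 8012 to answer with the literal
      // body "queue". A refused connection or an empty body (for example from
      // a pod that is mid-restart) fails the probe, so the address stays in
      // the notReady set.
      func probeQueueProxy(dest string) error {
          client := &http.Client{Timeout: time.Second}
          resp, err := client.Get("http://" + dest)
          if err != nil {
              return err
          }
          defer resp.Body.Close()

          body, err := io.ReadAll(resp.Body)
          if err != nil {
              return err
          }
          if string(body) != "queue" {
              return fmt.Errorf("unexpected body: want %q, got %q", "queue", string(body))
          }
          return nil
      }

      func main() {
          if err := probeQueueProxy("10.129.2.78:8012"); err != nil {
              fmt.Println("probe failed:", err)
          }
      }
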

      Assignee: Markus Thömmes (Inactive)
      Reporter: Martin Gencur