The artifacts collector is starting prematurely[1] because the OpenShift conformance plugin (20-openshift-conformance-validated)[2] takes a long time to run; consequently, must-gather does not collect data from the entire certification execution.
The bug is caused by the 'blocker engine' that monitors dependency plugins before allowing a plugin to run: it has a fixed timeout, and the last plugin, artifacts-collector, hits that timeout while the plugin it is blocked on is still running.
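The waiter described above can be sketched roughly as follows. This is a hypothetical reconstruction: the function and variable names are assumptions, but the 30s interval and the [i/100] counter match the wait-plugin.sh output in [1]. With a 30s interval and 100 checks the budget is only ~50 minutes, while the conformance run in [2] took 1h25m, so the waiter gives up while the plugin is still running.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the OPCT blocker/waiter loop (names are assumptions).

# Placeholder for the real status query; the actual script reads the
# sonobuoy aggregator status JSON. Here it always reports "running" so the
# demo below reproduces the timeout path.
check_plugin_status() {
  echo "running"
}

# wait_for_plugin <plugin-name> <interval-seconds> <max-checks>
wait_for_plugin() {
  local plugin="$1" interval="$2" limit="$3" i status
  for (( i=1; i<=limit; i++ )); do
    status="$(check_plugin_status "$plugin")"
    if [ "$status" = "complete" ]; then
      echo "Plugin[$plugin] complete."
      return 0
    fi
    echo "[waiter] Waiting ${interval}s for Plugin[$plugin]...[${i}/${limit}]"
    sleep "$interval"
  done
  echo "Timeout waiting condition 'complete' for plugin[$plugin]."
  return 1
}

# Demo with a shortened budget (0s interval, 3 checks) so it returns quickly;
# the real loop would be something like: wait_for_plugin <plugin> 30 100.
wait_for_plugin "20-openshift-conformance-validated" 0 3 || echo "waiter exit code: $?"
```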
There are two possible solutions, one short-term[A] and one long-term[B]:
- A: increase the timeout, since the OCP Validated suite is expected to take a long time
- B: validate and migrate to the native sonobuoy priority feature, which we requested upstream and which was implemented in newer releases, but OPCT has not migrated to it yet
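For option A, one minimal sketch is to derive the number of checks from a configurable total budget instead of hardcoding the check count. The environment variable names and the 6-hour default below are assumptions; only the 30s interval comes from the logs in [1].

```shell
#!/usr/bin/env bash
# Hypothetical sketch of option A: configurable waiter budget.
# OPCT_WAIT_INTERVAL / OPCT_WAIT_TIMEOUT are assumed names, not real flags.
WAIT_INTERVAL="${OPCT_WAIT_INTERVAL:-30}"      # seconds between status checks
WAIT_TIMEOUT="${OPCT_WAIT_TIMEOUT:-21600}"     # total budget in seconds (6h default)
WAIT_LIMIT=$(( WAIT_TIMEOUT / WAIT_INTERVAL )) # number of checks the loop will run

echo "waiter budget: ${WAIT_LIMIT} checks of ${WAIT_INTERVAL}s (total $(( WAIT_TIMEOUT / 3600 ))h)"
```

With the defaults this yields 720 checks of 30s (~6h), comfortably above the 1h25m the conformance suite took in [2], while still letting CI jobs shrink the budget via the environment.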
[1] 99-openshift-artifacts-collector plugin executor started at 20221222-223727
#./wait-plugin.sh:9> 20221222-223657> [waiter] Waiting 30s for Plugin[20-openshift-conformance-validated]...[99/100]
#./wait-plugin.sh:9> 20221222-223727> [waiter] Plugin[20-openshift-conformance-validated] with status[running]...
#./wait-plugin.sh:9> 20221222-223727> [waiter] {"plugins":[{"plugin":"10-openshift-kube-conformance","node":"global","status":"complete","result-status":"","result-counts":null,"progress":{"name":"10-openshift-kube-conformance","node":"global","timestamp":"2022-12-22T21:46:18.317405816Z","msg":"status=report-progress-finished","total":359,"completed":0,"failures":["[sig-api-machinery] CustomResourceDefinition Watch [Privileged:ClusterAdmin] CustomResourceDefinition Watch watch on custom resource definition objects [Conformance] [Suite:openshift/conformance/parallel/minimal] [Suite:k8s]"]}},{"plugin":"20-openshift-conformance-validated","node":"global","status":"running","result-status":"","result-counts":null,"progress":{"name":"20-openshift-conformance-validated","node":"global","timestamp":"2022-12-22T22:13:00.550685989Z","msg":"status=running","total":3454,"completed":0,"failures":["[sig-instrumentation] Prometheus [apigroup:image.openshift.io] when installed on the cluster shouldn't report any alerts in firing state apart from Watchdog and AlertmanagerReceiversNotConfigured [Early][apigroup:config.openshift.io] [Skipped:Disconnected] [Suite:openshift/conformance/parallel]","[sig-api-machinery] CustomResourceDefinition Watch [Privileged:ClusterAdmin] CustomResourceDefinition Watch watch on custom resource definition objects [Conformance] [Suite:openshift/conformance/parallel/minimal] [Suite:k8s]","[sig-node] Pods Extended Pod Container lifecycle evicted pods should be terminal [Suite:openshift/conformance/parallel] [Suite:k8s]","[sig-scheduling] SchedulerPredicates [Serial] validates pod overhead is considered along with resource limits of pods that are allowed to run verify pod overhead is accounted for [Suite:openshift/conformance/serial] [Suite:k8s]"]}},{"plugin":"99-openshift-artifacts-collector","node":"global","status":"running","result-status":"","result-counts":null,"progress":{"name":"99-openshift-artifacts-collector","node":"global","timestamp":"2022-12-22T22:04:22.576573544Z","msg":"status=waiting-for=20-openshift-conformance-validated=(0/-3454/0)=[99/100]","total":0,"completed":0}}],"status":"running","tar-info":{"name":"","created":"0001-01-01T00:00:00Z","sha256":"","size":0}}
#./wait-plugin.sh:9> 20221222-223727> [waiter] Timeout waiting condition 'complete' for plugin[20-openshift-conformance-validated].
#./global_fn.sh:12> [signal handler] ERROR on line 87 ./runner.sh
#./runner.sh:17> 20221222-223727> [runner] starting executor...
#./executor.sh:14> [executor] Starting...
#./executor.sh:16> [executor] Checking if credentials are present...
#./executor.sh:23> [executor] Executor started. Choosing execution type based on environment sets.
/tmp/sonobuoy/results /plugin
[must-gather ] OUT Using must-gather plug-in image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:58c1f71f2004767acbabfdf6ab3fc5689a63c713de564c01197fbc3795610ef6
When opening a support case, bugzilla, or issue please include the following summary data along with any other requested information:
ClusterID: b31ef2da-7382-4e1c-91eb-65501c24a54c
ClusterVersion: Stable at "4.12.0-rc.4"
ClusterOperators: All healthy and stable
[must-gather ] OUT namespace/openshift-must-gather-c4z68 created
[must-gather ] OUT clusterrolebinding.rbac.authorization.k8s.io/must-gather-n9mf4 created
(...)
[2] 20-openshift-conformance-validated plugin executor finished at 20221222-231248
(...)
Suite run returned error: 19 fail, 1305 pass, 2114 skip (1h25m15s)
error: 19 fail, 1305 pass, 2114 skip (1h25m15s)
+ os_log_info 'openshift-tests finished[0]'
++ caller
++ awk '{print$2":"$1}'
+ echo '#./executor.sh:62> '
#./executor.sh:62> openshift-tests finished[0] 'openshift-tests finished[0]'
+ set +x
#./executor.sh:117> Plugin executor finished. Result[0]
#./runner.sh:17> 20221222-231243> [runner] Plugin finished. Result[0]
#./runner.sh:17> 20221222-231243> [runner] Saving results triggered. Slowing down...
/tmp/sonobuoy/results /plugin
#./runner.sh:17> 20221222-231248> [runner] Looking for junit result files...
#./runner.sh:17> 20221222-231248> [runner] Adjusting permissions for results files.
#./runner.sh:17> 20221222-231248> [runner] Sending plugin done to unlock report-progress
#./runner.sh:17> 20221222-231248> [runner] Sending sonobuoy worker the result file path /plugin
#./runner.sh:17> 20221222-231248> [runner] Results saved at /tmp/sonobuoy/results/done=[/tmp/sonobuoy/results/junit_e2e__20221222-214632.xml]
[3] report-progress is not finishing correctly after the timeout. (This may be addressed in a separate issue; we need to check the strategy of keeping the blocker engine or not.)
$ oc get pods -n openshift-provider-certification
NAME                                                               READY   STATUS      RESTARTS   AGE
sonobuoy                                                           1/1     Running     0          81m
sonobuoy-10-openshift-kube-conformance-job-47af576993104fc8        0/3     Completed   0          81m
sonobuoy-20-openshift-conformance-validated-job-0b739af4dbf24fb1   3/3     Running     0          81m
sonobuoy-99-openshift-artifacts-collector-job-22020c2d3faf485b     1/3     NotReady    0          81m

Containers:
  report-progress:
    Container ID:  cri-o://fc7127ff9032da0477cc5c0a2e243ba6a5e438c93955e079c5d7762c38ca6bc9
    Image:         quay.io/ocp-cert/openshift-tests-provider-cert:dev20221221190826
    Image ID:      quay.io/ocp-cert/openshift-tests-provider-cert@sha256:67133bcbd49285ebcf72bd2b16746abe663599b13a36a90fdb3b5bb69e0fc791
    Port:          <none>
    Host Port:     <none>
    Command:
      ./report-progress.sh
    State:          Running
      Started:      Fri, 23 Dec 2022 11:34:59 -0300
    Ready:          True
    Restart Count:  0
    Environment:
      ENV_NODE_NAME:            (v1:spec.nodeName)
      ENV_POD_NAME:             sonobuoy-99-openshift-artifacts-collector-job-22020c2d3faf485b (v1:metadata.name)
      ENV_POD_NAMESPACE:        openshift-provider-certification (v1:metadata.namespace)
      PLUGIN_ID:                99
      RESULTS_DIR:              /tmp/sonobuoy/results
      SONOBUOY:                 true
      SONOBUOY_CONFIG_DIR:      /tmp/sonobuoy/config
      SONOBUOY_PROGRESS_PORT:   8099
      SONOBUOY_RESULTS_DIR:     /tmp/sonobuoy/results
    Mounts:
      /tmp/shared from shared (rw)
      /tmp/sonobuoy/results from results (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-8hfgl (ro)

$ oc logs sonobuoy-99-openshift-artifacts-collector-job-22020c2d3faf485b -n openshift-provider-certification -c report-progress --tail=10
  "failures":[],
  "msg":"status=waiting-for=20-openshift-conformance-validated=(0/-3454/0)=[98/100]"
}
20221223-151912> [report] Sending report payload [dep-checker]: {
  "completed":0,
  "total":0,
  "failures":[],
  "msg":"status=waiting-for=20-openshift-conformance-validated=(0/-3454/0)=[99/100]"
}
Timeout waiting condition 'complete' for plugin[20-openshift-conformance-validated].
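One possible direction for [3], sketched under assumptions: since the pod already mounts a shared volume at /tmp/shared (see the Mounts section above), the waiter could drop a sentinel file there when it gives up, and report-progress could poll for it and exit instead of running forever. The sentinel path, variable names, and function below are all hypothetical, not the current report-progress.sh behavior.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: stop report-progress once the waiter has finished.
# DONE_FILE lives on the shared emptyDir mount; name/path are assumptions.
DONE_FILE="${SHARED_DIR:-/tmp/shared}/waiter.done"

report_progress_loop() {
  # Keep reporting until the sentinel appears, then exit cleanly instead of
  # leaving the container Running/NotReady after the timeout.
  while [ ! -f "$DONE_FILE" ]; do
    echo "[report] Sending report payload"
    sleep "${REPORT_INTERVAL:-10}"
  done
  echo "[report] Waiter finished; stopping report-progress."
}

# Demo: create the sentinel first so the loop exits immediately.
SHARED_DIR="$(mktemp -d)"
DONE_FILE="${SHARED_DIR}/waiter.done"
touch "$DONE_FILE"
report_progress_loop
```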
- clones: OPCT-15 [bug][plugins][artifacts-collector] Must-gather collector is starting prematurely (Closed)