-
Bug
-
Resolution: Done
-
Major
-
None
-
None
-
True
-
-
False
-
OCPSTRAT-343 - Onboarding New Providers/Platforms (Phase 2)
-
-
It would be nice to always run the artifacts-collector plugin, even if previous plugins have failed.
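A minimal sketch of the desired behavior, assuming a hypothetical wait loop in the plugin container and a status dump (e.g. from sonobuoy status --json) available at STATUS_FILE; this is an illustration, not the actual opct/sonobuoy runner code. The point is that "failed" is treated as a terminal state just like "complete", so the artifacts collector still runs:

#!/usr/bin/env bash
# Illustrative sketch only: wait for the blocker plugin, but treat "failed"
# the same as "complete" so the artifacts collector always proceeds.
# PLUGIN_BLOCKER and STATUS_FILE are assumed names, not the real interface.
set -o pipefail

PLUGIN_BLOCKER="${PLUGIN_BLOCKER:-20-openshift-conformance-validated}"
STATUS_FILE="${STATUS_FILE:-/tmp/sonobuoy/status.json}"

while true; do
  state="$(jq -r --arg p "$PLUGIN_BLOCKER" \
    '.plugins[] | select(.plugin == $p) | .status' "$STATUS_FILE" 2>/dev/null)"
  case "$state" in
    complete|failed)
      echo "blocker ${PLUGIN_BLOCKER} finished with status=${state}; collecting artifacts anyway"
      break
      ;;
    *)
      echo "waiting for blocker ${PLUGIN_BLOCKER} (status=${state:-unknown})"
      sleep 10
      ;;
  esac
done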
- Execution
Wed, 11 Jan 2023 17:24:28 -03> Global Status: running
JOB_NAME                           | STATUS   | RESULTS | PROGRESS                | MESSAGE
05-openshift-cluster-upgrade       | complete |         | 0/0 (0 failures)        | waiting for post-processor...
10-openshift-kube-conformance      | complete |         | 345/345 (2 failures)    | waiting for post-processor...
20-openshift-conformance-validated | running  |         | 2860/3240 (31 failures) | status=running
99-openshift-artifacts-collector   | running  |         | 0/0 (0 failures)        | status=waiting-for=20-openshift-conformance-validated=(0/-380/0)=[33/1080]

Wed, 11 Jan 2023 17:24:38 -03> Global Status: running
JOB_NAME                           | STATUS   | RESULTS | PROGRESS                | MESSAGE
05-openshift-cluster-upgrade       | complete |         | 0/0 (0 failures)        | waiting for post-processor...
10-openshift-kube-conformance      | complete |         | 345/345 (2 failures)    | waiting for post-processor...
20-openshift-conformance-validated | failed   |         | 2860/3240 (31 failures) | waiting for post-processor...
99-openshift-artifacts-collector   | failed   |         | 0/0 (0 failures)        | waiting for post-processor...

Wed, 11 Jan 2023 17:24:48 -03> Global Status: running
JOB_NAME                           | STATUS   | RESULTS | PROGRESS                | MESSAGE
05-openshift-cluster-upgrade       | complete |         | 0/0 (0 failures)        | waiting for post-processor...
10-openshift-kube-conformance      | complete |         | 345/345 (2 failures)    | waiting for post-processor...
20-openshift-conformance-validated | failed   |         | 2860/3240 (31 failures) | waiting for post-processor...
99-openshift-artifacts-collector   | failed   |         | 0/0 (0 failures)        | waiting for post-processor...
- Logs from the 'plugin' container of plugin 99-openshift-artifacts-collector: the execution ran correctly (data was collected):
[must-gather ] OUT namespace/openshift-must-gather-rmsrk deleted
Reprinting Cluster State:
When opening a support case, bugzilla, or issue please include the following summary data along with any other requested information:
ClusterID: 4e878853-63f5-43da-be49-ecb387daebdd
ClusterVersion: Stable at "4.11.19"
ClusterOperators:
        All healthy and stable
/plugin #./executor.sh:154> Plugin executor finished. Result[0]
#./runner.sh:17> 20230111-202652> [runner] Plugin finished. Result[0]
#./runner.sh:17> 20230111-202652> [runner] Saving results triggered. Slowing down...
/tmp/sonobuoy/results /plugin
#./runner.sh:17> 20230111-202657> [runner] Results saved at /tmp/sonobuoy/results/done=[/tmp/sonobuoy/results/raw-results.tar.gz]
- But the 'sonobuoy-worker' container, responsible for streaming the results to the aggregator, was reporting errors. It seems the aggregator was too busy to receive/save the results and the request was lost (REST API issues); the pod was then recycled, losing the results (see the retry sketch after the error output below):
Error: gathering results: error encountered dialing aggregator at https://[10.128.2.12]:8080/api/v1/results/global/99-openshift-artifacts-collector: Put "https://[10.128.2.12]:8080/api/v1/results/global/99-openshift-artifacts-collector": dial tcp 10.128.2.12:8080: connect: connection refused
Usage:
  sonobuoy worker global [flags]

Flags:
  -h, --help   help for global

Global Flags:
  --level level   Log level. One of {panic, fatal, error, warn, info, debug, trace} (default info)
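A minimal mitigation sketch, assuming the upload could simply be retried with backoff: the URL and tarball path are taken from the logs above, and curl with --insecure stands in for the worker's mTLS client, so this illustrates the idea rather than the sonobuoy worker's actual behavior:

#!/usr/bin/env bash
# Sketch: retry the PUT of the results tarball to the aggregator instead of
# failing on the first "connection refused". AGGREGATOR_URL and RESULTS_FILE
# are assumptions; the real worker authenticates with client certificates.
AGGREGATOR_URL="${AGGREGATOR_URL:-https://10.128.2.12:8080/api/v1/results/global/99-openshift-artifacts-collector}"
RESULTS_FILE="${RESULTS_FILE:-/tmp/sonobuoy/results/raw-results.tar.gz}"

for attempt in $(seq 1 10); do
  # --upload-file issues an HTTP PUT, matching the request in the error above.
  if curl --fail --silent --show-error --insecure \
      --upload-file "$RESULTS_FILE" "$AGGREGATOR_URL"; then
    echo "results delivered on attempt ${attempt}"
    exit 0
  fi
  echo "upload attempt ${attempt} failed; retrying in $((attempt * 5))s" >&2
  sleep $((attempt * 5))
done
echo "giving up after 10 attempts; results were not delivered" >&2
exit 1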
- Logs from the server (sonobuoy aggregator)
time="2023-01-11T20:24:25Z" level=info msg="received request" client_cert="[99-openshift-artifacts-collector]" method=POST plugin_name=99-openshift-artifacts-collector url=/api/v1/progress/global/99-openshift-artifacts-collector
time="2023-01-11T20:24:35Z" level=error msg="Timeout waiting for plugin 99-openshift-artifacts-collector. Try checking the pod logs and other data in the results tarball for more information."
time="2023-01-11T20:24:35Z" level=info msg="received internal aggregator result" node=global plugin_name=99-openshift-artifacts-collector
- Error recorded in the results archive:
$ jq . .opct-41045pr33/clusters/opct-41045pr33/opct/results/plugins/99-openshift-artifacts-collector/errors/global/error.json
{
  "error": "Plugin timeout while waiting for results so there are no results. Check pod logs or other cluster details for more information as to why this occurred."
}
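For completeness, a quick way to list every per-plugin error recorded in an extracted archive, assuming the directory layout shown in the path above (adjust the root directory to your own extraction):

# Print each plugin's error.json message, if any, under the plugins directory
find .opct-41045pr33/clusters/opct-41045pr33/opct/results/plugins \
  -name error.json -print -exec jq -r '.error' {} \;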