-
Sub-task
-
Resolution: Unresolved
-
Major
We need a way to run the plugins in order. We don't want the certification plugins to run concurrently, so that they do not disrupt each other.
We need to run them one by one:
kube-conformance -> cert-level-1 -> cert-level-2 -> cert-level-3
We implemented a workaround in the MVP using the `msg` field of report-progress. This is not ideal: each plugin tracks its own 'blocker plugin' and decides to start its main process (scheduling e2e tests using openshift-utility) once the blocker has finished, or has stopped making progress, by looking at the job count:
Wed Apr 13 19:06:09 -03 2022> Global Status: running
JOB_NAME                       | STATUS   | RESULTS | PROGRESS               | MESSAGE
openshift-kube-conformance     | complete |         | 345/345 (0 failures)   | waiting post-processor...
openshift-provider-cert-level1 | running  |         | 80/81 (0 failures)     | status=running
openshift-provider-cert-level2 | running  |         | 0/17 (0 failures)      | status=waiting-for=openshift-provider-cert-level1=(0/-1/0)=[0/100]
openshift-provider-cert-level3 | running  |         | 0/0 (0 failures)       | status=blocked-by=openshift-provider-cert-level2=(0/-17/0)=[0/100]
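The blocker check each plugin performs can be sketched roughly as below. This is only an illustration, not the MVP code: the `PluginRow` type and the `parse_row` and `blocker_finished` helpers are hypothetical names, and the rule "finished means status `complete` or all jobs counted" is an assumption based on the status output above.

```python
import re
from dataclasses import dataclass


@dataclass
class PluginRow:
    """One row of the status table (hypothetical helper type)."""
    name: str
    status: str
    completed: int
    total: int
    message: str


# Matches the leading "<completed>/<total>" part of the PROGRESS column.
PROGRESS_RE = re.compile(r"(\d+)/(\d+)")


def parse_row(line: str) -> PluginRow:
    """Parse 'JOB_NAME | STATUS | RESULTS | PROGRESS | MESSAGE' into a PluginRow."""
    name, status, _results, progress, message = [c.strip() for c in line.split("|")]
    match = PROGRESS_RE.match(progress)
    if match is None:
        raise ValueError(f"unexpected progress column: {progress!r}")
    return PluginRow(name, status, int(match.group(1)), int(match.group(2)), message)


def blocker_finished(blocker: PluginRow) -> bool:
    """Assumed rule: the blocker is done when it reports 'complete'
    or when its job count has reached the total."""
    if blocker.status == "complete":
        return True
    return blocker.total > 0 and blocker.completed >= blocker.total
```

With the rows above, `openshift-kube-conformance` counts as finished, while `openshift-provider-cert-level1` at 80/81 does not, so level2 keeps waiting.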
More details on the demo: https://asciinema.org/a/487487
The problem is that a plugin can sometimes crash for some reason (here is one example of a zombie execution), and we have to keep handling those exceptions inside the plugin logic instead of in the Sonobuoy service. Example where the level2 pod has been removed and level3 loses its state:
Tue Apr 26 18:21:12 -03 2022> Global Status: running
JOB_NAME                       | STATUS   | RESULTS | PROGRESS               | MESSAGE
openshift-kube-conformance     | complete |         | 345/345 (0 failures)   | waiting post-processor...
openshift-provider-cert-level1 | complete |         | 81/81 (0 failures)     | waiting post-processor...
openshift-provider-cert-level2 | running  |         | 0/17 (0 failures)      | status=blocked-by=openshift-provider-cert-level1=(0/-81/0)=[0/100]
openshift-provider-cert-level3 | running  |         | 0/0 (0 failures)       | status=blocked-by=openshift-provider-cert-level2=(0/-17/0)=[0/100]
We expect the Sonobuoy server to control the order and the plugin states, coordinating which plugin runs next.
There is an upstream issue describing the feature we need:
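As a rough sketch of what server-side coordination could look like (this is not Sonobuoy's API; the `next_plugin` function and the state names `pending`, `running`, `complete` and `failed` are assumptions made for illustration), a central coordinator would hold the order and pick the single plugin allowed to run:

```python
from typing import Dict, List, Optional

# Desired execution order, taken from the chain described in this issue.
ORDER: List[str] = ["kube-conformance", "cert-level-1", "cert-level-2", "cert-level-3"]

# Hypothetical terminal states: a plugin in one of these no longer blocks its successor.
TERMINAL = {"complete", "failed"}


def next_plugin(order: List[str], states: Dict[str, str]) -> Optional[str]:
    """Return the one plugin that should be running now, or None when all are done.

    Plugins missing from `states` are treated as 'pending' (assumption).
    """
    for name in order:
        if states.get(name, "pending") not in TERMINAL:
            return name
    return None
```

In this sketch a crashed plugin would be marked `failed` by the coordinator, so its successors are released (or the run is aborted) in one place instead of every plugin carrying its own blocker logic.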
https://github.com/vmware-tanzu/sonobuoy/issues/631