By default, the Sonobuoy server uses the "SecurityContextMode" setting with value "nonroot". This mode adds a few Kubernetes securityContext statements to the podSpec.
On k8s 1.24 / OCP 4.11, those statements produce errors in the aggregator logs, preventing the aggregator from patching the pods.
$ KUBECONFIG=$PWD/.opct-410t411/clusters/opct-410t411/auth/kubeconfig oc logs -n openshift-provider-certification sonobuoy | grep error= | head -n1
time="2023-01-18T14:27:32Z" level=info msg="couldn't annotate sonobuoy pod" error="couldn't patch pod annotation: pods \"sonobuoy\" is forbidden: unable to validate against any security context constraint: [provider restricted-v2: .spec.securityContext.fsGroup: Invalid value: []int64{2000}: 2000 is not an allowed group, spec.containers[0].securityContext.runAsUser: Invalid value: 1000: must be in the ranges: [1000650000, 1000659999], provider restricted: .spec.securityContext.fsGroup: Invalid value: []int64{2000}: 2000 is not an allowed group, provider machine-api-termination-handler: .spec.securityContext.fsGroup: Invalid value: []int64{2000}: 2000 is not an allowed group, spec.volumes[0]: Invalid value: \"configMap\": configMap volumes are not allowed to be used, spec.volumes[1]: Invalid value: \"configMap\": configMap volumes are not allowed to be used, spec.volumes[2]: Invalid value: \"emptyDir\": emptyDir volumes are not allowed to be used, spec.volumes[3]: Invalid value: \"projected\": projected volumes are not allowed to be used, provider hostnetwork-v2: .spec.securityContext.fsGroup: Invalid value: []int64{2000}: 2000 is not an allowed group, provider hostnetwork: .spec.securityContext.fsGroup: Invalid value: []int64{2000}: 2000 is not an allowed group, provider hostaccess: .spec.securityContext.fsGroup: Invalid value: []int64{2000}: 2000 is not an allowed group]"

$ KUBECONFIG=$PWD/.opct-410t411/clusters/opct-410t411/auth/kubeconfig oc logs -n openshift-provider-certification sonobuoy | grep error= | wc -l
405
We hit this while implementing the upgrade feature: the aggregator stops working.
My theory is that after upgrading the cluster (4.10 -> 4.11), the plugin jobs that were previously created with a valid securityContext are no longer valid and need to be updated; otherwise the kube-apiserver (pod security APIs) refuses these statements.
When running OPCT with SecurityContextMode=none [1], Sonobuoy will not set any securityContext values on the pods, and everything works normally.
aggConfig.SecurityContextMode = "none"
The fix for OPCT is in the PR: https://github.com/redhat-openshift-ecosystem/provider-certification-tool/pull/39
The long-term solution upstream in Sonobuoy should be evaluated for 1.24+; there is an open issue tracking it: https://github.com/vmware-tanzu/sonobuoy/issues/1858
ENGINEERING REFERENCES
- is documented by: OPCT-7 [bug][backend] Sonobuoy's aggregator stop working after cluster upgrades (Closed)