Bug
Resolution: Done
Blocker
OCPSTRAT-343 - Onboarding New Providers/Platforms (Phase 2)
DESCRIPTION:
The Sonobuoy aggregator pod crashes during cluster upgrades (feature SPLAT-651).
The aggregator pod receives status-update requests from the workers, and it must annotate the sonobuoy pod to record each update. Some time into the cluster upgrade, the annotation requests start being refused (error message below). It seems the token used to access the kube-api expires while the upgrade is in progress.
Note: the certification pods (sonobuoy) are removed from the upgrade lifecycle by a paused MCP.
Steps to reproduce:
- Apply the fixes to the SCC that blocks the upgrade (SPLAT-874)
- Run the OPCT
- Start the upgrade process on y-stream (updates on z-stream do not crash the sonobuoy token)
- Check the sonobuoy aggregator logs; the error below should appear:
time="2022-11-16T19:51:32Z" level=info msg="couldn't annotate sonobuoy pod" error="couldn't patch pod annotation: pods \"sonobuoy\" is forbidden: unable to validate against any security context constraint: [ provider restricted-v2: .spec.securityContext.fsGroup: Invalid value: []int64{2000}: 2000 is not an allowed group, spec.containers[0].securityContext.runAsUser: Invalid value: 1000: must be in the ranges: [1000650000, 1000659999] provider restricted: .spec.securityContext.fsGroup: Invalid value: []int64{2000}: 2000 is not an allowed group, provider machine-api-termination-handler: .spec.securityContext.fsGroup: Invalid value: []int64{2000}: 2000 is not an allowed group, spec.volumes[0]: Invalid value: \"configMap\": configMap volumes are not allowed to be used, spec.volumes[1]: Invalid value: \"configMap\": configMap volumes are not allowed to be used, spec.volumes[2]: Invalid value: \"emptyDir\": emptyDir volumes are not allowed to be used, spec.volumes[3]: Invalid value: \"projected\": projected volumes are not allowed to be used, provider hostnetwork-v2: .spec.securityContext.fsGroup: Invalid value: []int64{2000}: 2000 is not an allowed group, provider hostnetwork: .spec.securityContext.fsGroup: Invalid value: []int64{2000}: 2000 is not an allowed group, provider hostaccess: .spec.securityContext.fsGroup: Invalid value: []int64{2000}: 2000 is not an allowed group ]"
Then the CLI got stuck and did not update the plugin states:
[...]
Wed, 16 Nov 2022 16:51:24 -03> Global Status: running
JOB_NAME | STATUS | RESULTS | PROGRESS | MESSAGE
05-openshift-cluster-upgrade | running | | 0/0 (0 failures) | status=Working towards 4.11.4: 106 of 803 done (13% complete)
10-openshift-kube-conformance | running | | 0/345 (0 failures) | status=waiting-for=05-openshift-cluster-upgrade=(0/0/0)=[66/100]
20-openshift-conformance-validated | running | | 0/3251 (0 failures) | status=blocked-by=10-openshift-kube-conformance=(0/-345/0)=[0/100]
99-openshift-artifacts-collector | running | | 0/0 (0 failures) | status=blocked-by=20-openshift-conformance-validated=(0/-3251/0)=[0/100]
[...]
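The SCC denial in the aggregator log is hard to read because every candidate SCC reports its own reasons on a single line. The following is a minimal Python sketch for grouping those reasons per SCC, assuming the message keeps the `provider <name>: <violations>` layout seen in the log above. `parse_scc_denial` and `SAMPLE` are hypothetical names used for illustration only; they are not part of OPCT or Sonobuoy, and `SAMPLE` is a shortened copy of the logged error.

```python
import re


def parse_scc_denial(message):
    """Group SCC validation failures by provider (SCC) name.

    Hypothetical helper for reading the log line; not an OPCT/Sonobuoy API.
    """
    start = message.find("[ provider ")
    end = message.rfind("]")
    if start == -1 or end <= start:
        return {}
    body = message[start + 1:end].strip()
    # With a capture group, re.split returns ['', name1, text1, name2, text2, ...].
    parts = re.split(r"(?:^|,?\s+)provider ([\w-]+): ", body)
    denials = {}
    for name, text in zip(parts[1::2], parts[2::2]):
        # Split only on commas that start a new field path, so the comma
        # inside "[1000650000, 1000659999]" stays with its violation.
        violations = re.split(r",\s+(?=\.?spec\.)", text.strip())
        denials[name] = [v.strip() for v in violations if v.strip()]
    return denials


# Shortened copy of the error from the aggregator log (two providers only).
SAMPLE = (
    'pods "sonobuoy" is forbidden: unable to validate against any security '
    "context constraint: [ provider restricted-v2: "
    ".spec.securityContext.fsGroup: Invalid value: []int64{2000}: "
    "2000 is not an allowed group, "
    "spec.containers[0].securityContext.runAsUser: Invalid value: 1000: "
    "must be in the ranges: [1000650000, 1000659999] "
    "provider hostaccess: .spec.securityContext.fsGroup: "
    "Invalid value: []int64{2000}: 2000 is not an allowed group ]"
)

if __name__ == "__main__":
    for scc, reasons in parse_scc_denial(SAMPLE).items():
        print(scc)
        for reason in reasons:
            print("  -", reason)
```

Grouped this way, the log shows that every listed SCC rejects `fsGroup: 2000`, and `restricted-v2` additionally rejects `runAsUser: 1000` as outside the allowed range.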
Required:
- A PR is created that fixes the errors in OPCT
Nice to have:
...
ACCEPTANCE CRITERIA:
- The results of running the upgrade feature should be accepted
- Any PR should be merged
- Any external issues should be addressed
ENGINEERING DETAILS:
- documents
  - OPCT-24 [backend][sonobuoy] Aggregator refuse progress updater with big payloads (To Do)
  - OPCT-31 [backend][sonobuoy] Server is not setting securityContext correctly by default (To Do)
- is blocked by
  - OPCT-6 [bug] The RBAC used on Sonobuoy SA stuck the cluster upgrades on Y-stream (Closed)
- links to