-
Bug
-
Resolution: Done-Errata
-
Normal
-
4.14.0
-
Moderate
-
No
-
MCO Sprint 249
-
1
-
False
-
-
N/A
-
Release Note Not Required
-
In Progress
Description of problem:
In a cluster with a pool using OCB functionality, if we update the imageBuilderType value while an openshift-image-builder pod is building an image, the build fails. It can fail in 2 ways:
1. The running pod that is building the image is removed, and the build fails reporting "Error (BuildPodDeleted)".
2. The machine-os-builder pod is restarted but the build pod is not removed. The build is then never removed.
Version-Release number of selected component (if applicable):
$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.14.0-0.nightly-2023-09-12-195514   True        False         154m    Cluster version is 4.14.0-0.nightly-2023-09-12-195514
How reproducible:
Steps to Reproduce:
1. Create the needed resources to make OCB functionality work (on-cluster-build-config configmap, the secrets and the imageSpec). We reproduced it using imageBuilderType="":

    oc patch cm/on-cluster-build-config -n openshift-machine-config-operator -p '{"data":{"imageBuilderType": ""}}'

2. Create an infra pool and label it so that it can use OCB functionality:

    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfigPool
    metadata:
      name: infra
    spec:
      machineConfigSelector:
        matchExpressions:
          - {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,infra]}
      nodeSelector:
        matchLabels:
          node-role.kubernetes.io/infra: ""

    oc label mcp/infra machineconfiguration.openshift.io/layering-enabled=

3. Wait until the triggered build has finished.

4. Create a new MC to trigger a new build. This one, for example:

    kind: MachineConfig
    metadata:
      labels:
        machineconfiguration.openshift.io/role: worker
      name: test-machine-config
    spec:
      config:
        ignition:
          version: 3.1.0
        storage:
          files:
            - contents:
                source: data:text/plain;charset=utf-8;base64,dGVzdA==
              filesystem: root
              mode: 420
              path: /etc/test-file.test

5. Just after a new build pod is created, configure the on-cluster-build-config configmap to use the "custom-pod-builder" imageBuilderType:

    oc patch cm/on-cluster-build-config -n openshift-machine-config-operator -p '{"data":{"imageBuilderType": "custom-pod-builder"}}'
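Step 5 is timing-sensitive, because the configmap has to change while the build pod exists. The following is a minimal sketch of how that step could be scripted, not part of the original reproduction. The function names are hypothetical, the build pod name pattern is inferred from the output in this ticket, and the sketch assumes the new value needs no JSON escaping.

```shell
#!/usr/bin/env bash
# Sketch only. Namespace and configmap name come from this ticket;
# the function names below are hypothetical helpers.

MCO_NAMESPACE="openshift-machine-config-operator"

# Print the JSON patch that sets imageBuilderType to the given value.
# Assumes the value contains no characters that need JSON escaping.
builder_type_patch() {
  printf '{"data":{"imageBuilderType":"%s"}}' "$1"
}

# Read pod names on stdin (one per line, with or without the "pod/" prefix)
# and succeed if one looks like an OCB build pod. The pattern
# build-rendered-<pool>-<hash>-build is inferred from the output in this ticket.
has_build_pod() {
  grep -Eq '^(pod/)?build-rendered-.+-build$'
}

# Poll until a build pod exists, then switch the builder type (step 5).
switch_builder_during_build() {
  local new_type="$1"
  until oc get pods -n "$MCO_NAMESPACE" -o name | has_build_pod; do
    sleep 1
  done
  oc patch cm/on-cluster-build-config -n "$MCO_NAMESPACE" \
    -p "$(builder_type_patch "$new_type")"
}
```

After sourcing the file and creating the MachineConfig from step 4, running `switch_builder_during_build custom-pod-builder` would apply the patch as soon as a build pod shows up.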
Actual results:
We have observed 2 behaviors after step 5:

1. The machine-os-builder pod is restarted and the build is never removed.

build.build.openshift.io/build-rendered-infra-b2473d404d9ddfa1536d2fb32b54d855   Docker   Dockerfile   Running   10 seconds ago

NAME                                                              READY   STATUS              RESTARTS   AGE
pod/build-rendered-infra-b2473d404d9ddfa1536d2fb32b54d855-build   1/1     Running             0          12s
pod/machine-config-controller-5bdd7b66c5-dl4hh                    2/2     Running             0          90m
pod/machine-config-daemon-5wbw4                                   2/2     Running             0          90m
pod/machine-config-daemon-fqr8x                                   2/2     Running             0          90m
pod/machine-config-daemon-g77zd                                   2/2     Running             0          83m
pod/machine-config-daemon-qzmvv                                   2/2     Running             0          83m
pod/machine-config-daemon-w8mnz                                   2/2     Running             0          90m
pod/machine-config-operator-7dd564556d-mqc5w                      2/2     Running             0          92m
pod/machine-config-server-28lnp                                   1/1     Running             0          89m
pod/machine-config-server-5csjz                                   1/1     Running             0          89m
pod/machine-config-server-fv4vk                                   1/1     Running             0          89m
pod/machine-os-builder-6cfbd8d5d-2f7kd                            0/1     Terminating         0          3m26s
pod/machine-os-builder-6cfbd8d5d-h2ltd                            0/1     ContainerCreating   0          1s

NAME                                                                             TYPE     FROM         STATUS    STARTED          DURATION
build.build.openshift.io/build-rendered-infra-b2473d404d9ddfa1536d2fb32b54d855   Docker   Dockerfile   Running   12 seconds ago

2. The build pod is removed and the build fails with Error (BuildPodDeleted):

NAME                                                                             TYPE     FROM         STATUS    STARTED          DURATION
build.build.openshift.io/build-rendered-infra-b2473d404d9ddfa1536d2fb32b54d855   Docker   Dockerfile   Running   10 seconds ago

NAME                                                              READY   STATUS        RESTARTS   AGE
pod/build-rendered-infra-b2473d404d9ddfa1536d2fb32b54d855-build   1/1     Terminating   0          12s
pod/machine-config-controller-5bdd7b66c5-dl4hh                    2/2     Running       0          159m
pod/machine-config-daemon-5wbw4                                   2/2     Running       0          159m
pod/machine-config-daemon-fqr8x                                   2/2     Running       0          159m
pod/machine-config-daemon-g77zd                                   2/2     Running       8          152m
pod/machine-config-daemon-qzmvv                                   2/2     Running       16         152m
pod/machine-config-daemon-w8mnz                                   2/2     Running       0          159m
pod/machine-config-operator-7dd564556d-mqc5w                      2/2     Running       0          161m
pod/machine-config-server-28lnp                                   1/1     Running       0          159m
pod/machine-config-server-5csjz                                   1/1     Running       0          159m
pod/machine-config-server-fv4vk                                   1/1     Running       0          159m
pod/machine-os-builder-6cfbd8d5d-g62b6                            1/1     Running       0          2m11s

NAME                                                                             TYPE     FROM         STATUS    STARTED          DURATION
build.build.openshift.io/build-rendered-infra-b2473d404d9ddfa1536d2fb32b54d855   Docker   Dockerfile   Running   12 seconds ago

.....

NAME                                                                             TYPE     FROM         STATUS                    STARTED          DURATION
build.build.openshift.io/build-rendered-infra-b2473d404d9ddfa1536d2fb32b54d855   Docker   Dockerfile   Error (BuildPodDeleted)   17 seconds ago   13s
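To tell the two behaviors apart when re-running the reproduction, the STATUS column of the build can be mapped to one of the outcomes described here. This is only an illustrative sketch: the function name and the labels it prints are hypothetical, and the status strings are the ones shown in the output in this ticket.

```shell
#!/usr/bin/env bash
# Sketch only: classify_build_status is a hypothetical helper.
# Input: the STATUS text of the build as displayed by "oc get build".

classify_build_status() {
  case "$1" in
    "Error (BuildPodDeleted)")
      # Behavior 2: the build pod was removed while building.
      echo "build-pod-deleted" ;;
    Running)
      # Behavior 1 candidate: if the machine-os-builder pod was restarted and
      # the build is never cleaned up afterwards, the build is left behind.
      echo "still-running" ;;
    Complete)
      echo "complete" ;;
    *)
      echo "other" ;;
  esac
}
```

For example, `classify_build_status "Error (BuildPodDeleted)"` prints `build-pod-deleted`. A `still-running` result is not conclusive by itself; it matches behavior 1 only if the build is still present long after the machine-os-builder pod has restarted.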
Expected results:
Updating the imageBuilderType while a build is running should not leave the OCB functionality in a broken state.
Additional info:
Must-gather files are provided in the first comment in this ticket.
- links to
-
RHEA-2024:0041 OpenShift Container Platform 4.16.z bug fix update