-
Bug
-
Resolution: Done-Errata
-
Normal
-
4.14.0
-
Quality / Stability / Reliability
-
False
-
-
None
-
Moderate
-
No
-
None
-
None
-
MCO Sprint 249
-
1
-
In Progress
-
Release Note Not Required
-
N/A
Description of problem:
In a cluster with a pool using OCB functionality, if we update the imageBuilderType value while an openshift-image-builder pod is building an image, the build fails. It can fail in 2 ways:
1. The running pod that is building the image is removed, and the build fails reporting "Error (BuildPodDeleted)".
2. The machine-os-builder pod is restarted but the build pod is not removed. The build is then never removed.
Version-Release number of selected component (if applicable):
$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.14.0-0.nightly-2023-09-12-195514   True        False         154m    Cluster version is 4.14.0-0.nightly-2023-09-12-195514
How reproducible:
Steps to Reproduce:
1. Create the needed resources to make OCB functionality work (on-cluster-build-config configmap, the secrets and the imageSpec)
We reproduced it using imageBuilderType=""
oc patch cm/on-cluster-build-config -n openshift-machine-config-operator -p '{"data":{"imageBuilderType": ""}}'
2. Create an infra pool and label it so that it can use OCB functionality
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: infra
spec:
  machineConfigSelector:
    matchExpressions:
      - {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,infra]}
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/infra: ""
oc label mcp/infra machineconfiguration.openshift.io/layering-enabled=
3. Wait until the triggered build has finished.
4. Create a new MC to trigger a new build. This one, for example:
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: test-machine-config
spec:
  config:
    ignition:
      version: 3.1.0
    storage:
      files:
        - contents:
            source: data:text/plain;charset=utf-8;base64,dGVzdA==
          filesystem: root
          mode: 420
          path: /etc/test-file.test
5. Just after a new build pod is created, configure the on-cluster-build-config configmap to use the "custom-pod-builder" imageBuilderType
oc patch cm/on-cluster-build-config -n openshift-machine-config-operator -p '{"data":{"imageBuilderType": "custom-pod-builder"}}'
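The timing in step 5 matters, because the configmap has to change while the build is still running. The following is a minimal sketch of how that step could be scripted, not the procedure the reporter used. The function names, the `NAMESPACE` variable, and the polling defaults are illustrative assumptions; the `oc patch` command is the one from step 5, and the loop relies only on the Build resource's `.status.phase` field.

```shell
#!/bin/sh
# Sketch only: NAMESPACE and all function names are illustrative assumptions.
NAMESPACE="${NAMESPACE:-openshift-machine-config-operator}"

# Print the phase of every Build in the namespace, one per line.
build_phases() {
  oc get builds.build.openshift.io -n "$NAMESPACE" \
    -o jsonpath='{range .items[*]}{.status.phase}{"\n"}{end}'
}

# Poll until at least one Build reports phase "Running".
# $1 = maximum attempts (default 60), $2 = seconds between attempts (default 2).
# Returns 0 once a running build is seen, 1 if the attempts run out.
wait_for_running_build() {
  attempts="${1:-60}"
  delay="${2:-2}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if build_phases | grep -qx 'Running'; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Patch the configmap to the given imageBuilderType (step 5).
switch_builder_type() {
  oc patch cm/on-cluster-build-config -n "$NAMESPACE" \
    -p "{\"data\":{\"imageBuilderType\": \"$1\"}}"
}
```

After creating the MachineConfig from step 4, running `wait_for_running_build && switch_builder_type custom-pod-builder` would apply the patch as soon as a build is seen in the Running phase.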
Actual results:
We have observed 2 behaviors after step 5:

1. The machine-os-builder pod is restarted and the build is never removed.

build.build.openshift.io/build-rendered-infra-b2473d404d9ddfa1536d2fb32b54d855   Docker   Dockerfile   Running   10 seconds ago

NAME                                                              READY   STATUS              RESTARTS   AGE
pod/build-rendered-infra-b2473d404d9ddfa1536d2fb32b54d855-build   1/1     Running             0          12s
pod/machine-config-controller-5bdd7b66c5-dl4hh                    2/2     Running             0          90m
pod/machine-config-daemon-5wbw4                                   2/2     Running             0          90m
pod/machine-config-daemon-fqr8x                                   2/2     Running             0          90m
pod/machine-config-daemon-g77zd                                   2/2     Running             0          83m
pod/machine-config-daemon-qzmvv                                   2/2     Running             0          83m
pod/machine-config-daemon-w8mnz                                   2/2     Running             0          90m
pod/machine-config-operator-7dd564556d-mqc5w                      2/2     Running             0          92m
pod/machine-config-server-28lnp                                   1/1     Running             0          89m
pod/machine-config-server-5csjz                                   1/1     Running             0          89m
pod/machine-config-server-fv4vk                                   1/1     Running             0          89m
pod/machine-os-builder-6cfbd8d5d-2f7kd                            0/1     Terminating         0          3m26s
pod/machine-os-builder-6cfbd8d5d-h2ltd                            0/1     ContainerCreating   0          1s

NAME                                                                             TYPE     FROM         STATUS    STARTED          DURATION
build.build.openshift.io/build-rendered-infra-b2473d404d9ddfa1536d2fb32b54d855   Docker   Dockerfile   Running   12 seconds ago

2. The build pod is removed and the build fails with Error (BuildPodDeleted):

NAME                                                                             TYPE     FROM         STATUS    STARTED          DURATION
build.build.openshift.io/build-rendered-infra-b2473d404d9ddfa1536d2fb32b54d855   Docker   Dockerfile   Running   10 seconds ago

NAME                                                              READY   STATUS        RESTARTS   AGE
pod/build-rendered-infra-b2473d404d9ddfa1536d2fb32b54d855-build   1/1     Terminating   0          12s
pod/machine-config-controller-5bdd7b66c5-dl4hh                    2/2     Running       0          159m
pod/machine-config-daemon-5wbw4                                   2/2     Running       0          159m
pod/machine-config-daemon-fqr8x                                   2/2     Running       0          159m
pod/machine-config-daemon-g77zd                                   2/2     Running       8          152m
pod/machine-config-daemon-qzmvv                                   2/2     Running       16         152m
pod/machine-config-daemon-w8mnz                                   2/2     Running       0          159m
pod/machine-config-operator-7dd564556d-mqc5w                      2/2     Running       0          161m
pod/machine-config-server-28lnp                                   1/1     Running       0          159m
pod/machine-config-server-5csjz                                   1/1     Running       0          159m
pod/machine-config-server-fv4vk                                   1/1     Running       0          159m
pod/machine-os-builder-6cfbd8d5d-g62b6                            1/1     Running       0          2m11s

NAME                                                                             TYPE     FROM         STATUS    STARTED          DURATION
build.build.openshift.io/build-rendered-infra-b2473d404d9ddfa1536d2fb32b54d855   Docker   Dockerfile   Running   12 seconds ago
.....
NAME                                                                             TYPE     FROM         STATUS                    STARTED          DURATION
build.build.openshift.io/build-rendered-infra-b2473d404d9ddfa1536d2fb32b54d855   Docker   Dockerfile   Error (BuildPodDeleted)   17 seconds ago   13s
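To tell the two observed outcomes apart when repeating the reproduction, the build listing can be filtered mechanically. This is a rough sketch that assumes the tabular layout shown in the outputs above (NAME, TYPE, FROM, STATUS, ...). The function name `classify_builds` and the verdict labels are hypothetical. A `still-running` verdict only hints at the first behavior: it has to persist after the machine-os-builder pod restarts before it indicates a stuck build.

```shell
#!/bin/sh
# Sketch only: classify_builds and its verdict labels are hypothetical.
# Reads `oc get build` table output on stdin and prints "<name> <verdict>"
# for each build row.
classify_builds() {
  awk '
    $1 == "NAME" { next }                                  # skip header rows
    /Error \(BuildPodDeleted\)/ { print $1, "pod-deleted"; next }
    $4 == "Running" { print $1, "still-running"; next }    # STATUS is column 4
    NF { print $1, "other" }                               # any other non-empty row
  '
}
```

For example, `oc get build -n openshift-machine-config-operator | classify_builds` prints one verdict per build.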
Expected results:
Updating the imageBuilderType while a build is running should not leave the OCB functionality in a broken state.
Additional info:
Must-gather files are provided in the first comment in this ticket.
- links to
-
RHEA-2024:0041
OpenShift Container Platform 4.16.z bug fix update