Bug
Resolution: Unresolved
Normal
None
4.16
Moderate
None
False
Description of problem:
When OCB is enabled in a pool and an MC is removed, and the resulting rendered MC has already been built by an existing MachineOSBuild resource, that image is reused and is not built again. If this image has been removed from the repository and MCO tries to reuse it, MCO should report a failure. Currently it does not report any failure: it leaves the image that is already present on the nodes in place without complaint. This leaves the nodes in an inconsistent state, because they run an image that does not match the rendered MachineConfig they are using.
Version-Release number of selected component (if applicable):
IPI on AWS

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.16.0-0.nightly-2024-05-31-062415   True        False         3h54m   Cluster version is 4.16.0-0.nightly-2024-05-31-062415
How reproducible:
Always
Steps to Reproduce:
1. Configure OCB in the worker pool:

    apiVersion: machineconfiguration.openshift.io/v1alpha1
    kind: MachineOSConfig
    metadata:
      name: worker
    spec:
      machineConfigPool:
        name: worker
      buildInputs:
        imageBuilder:
          imageBuilderType: PodImageBuilder
        baseImagePullSecret:
          name: YOUR-SECRET
        renderedImagePushSecret:
          name: YOUR-SECRET
        renderedImagePushspec: "quay.io/mcoqe/....:latest"  # <--- An image that you can remove from the repository

2. Wait for the image to be built and applied.

3. Create a new MC:

    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfig
    metadata:
      labels:
        machineconfiguration.openshift.io/role: worker
      name: test-machine-config-1
    spec:
      config:
        ignition:
          version: 3.1.0
        storage:
          files:
          - contents:
              source: data:text/plain;charset=utf-8;base64,dGVzdA==
            filesystem: root
            mode: 420
            path: /etc/test-file-1.test

4. Wait for the MachineConfig to be applied.

5. You now have 2 MOSB resources. Remove the images referenced by their .status.finalImagePullspec from the repositories, so that they can no longer be pulled. (Maybe we can just edit the existing MOSC and modify the finalImagePullSpec value to point to an image that does not exist, but I haven't tried it.)

6. Remove the MC created in step 3, so that OCB tries to reuse the first MOSB.
Actual results:
No build pod is created. The machine-os-builder pod is restarted, showing this log:

I0531 10:17:33.396847 1 pod_build_controller.go:259] Adding Pod machine-config-server-p2mx6. Is build pod? false
I0531 10:17:33.396871 1 pod_build_controller.go:259] Adding Pod machine-os-builder-66fbf48666-lgvj6. Is build pod? false
I0531 10:17:33.480580 1 pod_build_controller.go:179] Starting MachineOSBuilder-PodBuildController
I0531 10:20:26.162824 1 helpers.go:93] Shutting down due to: terminated
I0531 10:20:26.162929 1 helpers.go:96] Context cancelled
I0531 10:20:26.162975 1 simple_featuregate_reader.go:177] Shutting down feature-gate-detector
I0531 10:20:26.163115 1 build_controller.go:351] Shutting down MachineOSBuilder-BuildController
I0531 10:20:26.163235 1 reflector.go:295] Stopping reflector *v1alpha1.MachineOSBuild (0s) from github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:125
I0531 10:20:26.163266 1 reflector.go:295] Stopping reflector *v1.Pod (0s) from k8s.io/client-go/informers/factory.go:159
I0531 10:20:26.163283 1 reflector.go:295] Stopping reflector *v1alpha1.MachineOSConfig (0s) from github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:125
I0531 10:20:26.163329 1 reflector.go:295] Stopping reflector *v1.MachineConfigPool (0s) from github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:125
I0531 10:20:26.163354 1 reflector.go:295] Stopping reflector *v1.ControllerConfig (0s) from github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:125
I0531 10:20:26.163376 1 pod_build_controller.go:187] Shutting down MachineOSBuilder-PodBuildController
I0531 10:20:26.182616 1 start.go:120] Stopped leading. MOB terminating.

The nodes are drained and rebooted, but no new image is applied.
The MCD logs contain entries like the following, showing that the rpm-ostree rebase is skipped:

2024-05-31T14:58:49.224099430+00:00 stderr F I0531 14:58:49.224058 2370 update.go:845] Checking Reconcilable for config rendered-worker-cedce993ed207d562707657a7b919a48 to rendered-woker-a8f2a15f227d7c4b677272c3b9c81fa4
2024-05-31T14:58:49.259456841+00:00 stderr F I0531 14:58:49.259413 2370 update.go:2610] Starting transition from "quay.io/mcoqe/layering@sha256:dabe4305e23b8611944866fc289ebd844e7a68183a08e6d45243571918a184d" to "quay.io/mcoqe/layering@sha256:dabe4305e23b8611944866fc289ebd844e7a681583a08e6d45243571918a184d"
2024-05-31T14:58:49.260917314+00:00 stderr F I0531 14:58:49.260885 2370 update.go:2610] Update prepared; requesting cordon and drain via annotation to controller
2024-05-31T15:00:19.282352721+00:00 stderr F I0531 15:00:19.282303 2370 update.go:2610] drain complete
2024-05-31T15:00:19.283799697+00:00 stderr F I0531 15:00:19.283777 2370 drain.go:114] Successful drain took 90.021720714 seconds
2024-05-31T15:00:19.283799697+00:00 stderr F I0531 15:00:19.283794 2370 update.go:881] Image pullspecs equal, skipping rpm-ostree rebase
2024-05-31T15:00:19.283817435+00:00 stderr F I0531 15:00:19.283800 2370 update.go:1824] Updating files
2024-05-31T15:00:19.283817435+00:00 stderr F I0531 15:00:19.283805 2370 file_writers.go:233] Writing file "/usr/local/bin/nm-clean-initrd-state.sh"

No error is reported anywhere.
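The "Image pullspecs equal, skipping rpm-ostree rebase" line suggests the MCD decides whether to rebase by comparing the current and desired image pullspecs. The minimal Go sketch below assumes that decision is a plain string comparison; `needsRebase` and the example pullspecs are hypothetical and not the MCD's actual code. Under that assumption, the registry is never consulted on this path, so a pullspec that can no longer be pulled does not produce any error.

```go
package main

import "fmt"

// needsRebase is a hypothetical stand-in for the MCD's rebase decision.
// It only compares pullspec strings; it never contacts the registry,
// so it cannot tell whether the desired image still exists.
func needsRebase(currentPullspec, desiredPullspec string) bool {
	return currentPullspec != desiredPullspec
}

func main() {
	// Hypothetical digest-pinned pullspecs: the desired image is the same
	// one already running on the node, as in the "Starting transition" log line.
	current := "quay.io/example/layering@sha256:aaaa"
	desired := "quay.io/example/layering@sha256:aaaa"

	if !needsRebase(current, desired) {
		// Mirrors the MCD log message; no existence check happens here.
		fmt.Println("Image pullspecs equal, skipping rpm-ostree rebase")
	}
}
```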
Expected results:
Since the reused image no longer exists, an error should be reported so that the user knows something is not working as expected.
Additional info: