-
Bug
-
Resolution: Done-Errata
-
Normal
-
4.16
-
Quality / Stability / Reliability
-
False
-
-
3
-
Moderate
-
MCO Sprint 266, MCO Sprint 267, MCO Sprint 268, MCO Sprint 269, MCO Sprint 270, MCO Sprint 271
-
6
-
In Progress
-
Release Note Not Required
Description of problem:
When OCB is enabled in a pool and an MC is removed, the resulting rendered MC may already have been built by an existing MachineOSBuild resource. In that case the existing image is reused and no new build is started.
If that image has been removed from the registry and the MCO tries to reuse it, the MCO should report a failure.
Currently it reports no failure and silently leaves the nodes on the image that is already present. This leaves the nodes in an inconsistent state, because the image they are running does not correspond to the rendered MachineConfig they are using.
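To make the reuse behavior concrete, here is a minimal Go sketch of the lookup as described above. It is not the MCO's actual code: the `build` type, the `findReusableBuild` helper, and the `registry.example.com` pullspecs are all hypothetical and only illustrate the "reuse if a build already exists for this rendered config" decision.

```go
package main

import "fmt"

// build is a hypothetical, simplified stand-in for a MachineOSBuild.
type build struct {
	renderedConfig string // rendered MachineConfig the image was built from
	imagePullspec  string // digested pullspec of the built image
}

// findReusableBuild returns the existing build for the given rendered
// config, if there is one. This is an illustrative helper, not MCO code.
func findReusableBuild(builds []build, renderedConfig string) (build, bool) {
	for _, b := range builds {
		if b.renderedConfig == renderedConfig {
			return b, true
		}
	}
	return build{}, false
}

func main() {
	builds := []build{
		{renderedConfig: "rendered-worker-aaa", imagePullspec: "registry.example.com/os@sha256:1111"},
		{renderedConfig: "rendered-worker-bbb", imagePullspec: "registry.example.com/os@sha256:2222"},
	}
	// Removing the MC makes the pool target the first rendered config again.
	if b, ok := findReusableBuild(builds, "rendered-worker-aaa"); ok {
		fmt.Println("reusing image without rebuilding:", b.imagePullspec)
	} else {
		fmt.Println("no existing build, a new build is needed")
	}
}
```

Note that nothing in this lookup verifies that the reused image can still be pulled, which is the gap this bug describes.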
Version-Release number of selected component (if applicable):
IPI on AWS
$ oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.16.0-0.nightly-2024-05-31-062415 True False 3h54m Cluster version is 4.16.0-0.nightly-2024-05-31-062415
How reproducible:
Always
Steps to Reproduce:
1. Configure OCB in the worker pool
apiVersion: machineconfiguration.openshift.io/v1alpha1
kind: MachineOSConfig
metadata:
  name: worker
spec:
  machineConfigPool:
    name: worker
  buildInputs:
    imageBuilder:
      imageBuilderType: PodImageBuilder
    baseImagePullSecret:
      name: YOUR-SECRET
    renderedImagePushSecret:
      name: YOUR-SECRET
    renderedImagePushspec: "quay.io/mcoqe/....:latest" # <--- An image that you can remove from the repository
2. Wait for the image to be built and applied
3. Create a new MC
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: test-machine-config-1
spec:
  config:
    ignition:
      version: 3.1.0
    storage:
      files:
      - contents:
          source: data:text/plain;charset=utf-8;base64,dGVzdA==
        filesystem: root
        mode: 420
        path: /etc/test-file-1.test
4. Wait for the machineconfig to be applied
5. Now you have 2 MOSB resources. Remove the images referenced by their .status.finalImagePullspec from the repository, so that they cannot be pulled anymore.
(Maybe we can just edit the existing MOSC and modify the finalImagePullSpec value to point to an image that does not exist, but I haven't tried it)
6. Remove the MC created in step 3, so that OCB will try to re-use the first MOSB
Actual results:
No build pod is created.
The machine-os-builder pod is restarted, showing this log
I0531 10:17:33.396847 1 pod_build_controller.go:259] Adding Pod machine-config-server-p2mx6. Is build pod? false
I0531 10:17:33.396871 1 pod_build_controller.go:259] Adding Pod machine-os-builder-66fbf48666-lgvj6. Is build pod? false
I0531 10:17:33.480580 1 pod_build_controller.go:179] Starting MachineOSBuilder-PodBuildController
I0531 10:20:26.162824 1 helpers.go:93] Shutting down due to: terminated
I0531 10:20:26.162929 1 helpers.go:96] Context cancelled
I0531 10:20:26.162975 1 simple_featuregate_reader.go:177] Shutting down feature-gate-detector
I0531 10:20:26.163115 1 build_controller.go:351] Shutting down MachineOSBuilder-BuildController
I0531 10:20:26.163235 1 reflector.go:295] Stopping reflector *v1alpha1.MachineOSBuild (0s) from github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:125
I0531 10:20:26.163266 1 reflector.go:295] Stopping reflector *v1.Pod (0s) from k8s.io/client-go/informers/factory.go:159
I0531 10:20:26.163283 1 reflector.go:295] Stopping reflector *v1alpha1.MachineOSConfig (0s) from github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:125
I0531 10:20:26.163329 1 reflector.go:295] Stopping reflector *v1.MachineConfigPool (0s) from github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:125
I0531 10:20:26.163354 1 reflector.go:295] Stopping reflector *v1.ControllerConfig (0s) from github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:125
I0531 10:20:26.163376 1 pod_build_controller.go:187] Shutting down MachineOSBuilder-PodBuildController
I0531 10:20:26.182616 1 start.go:120] Stopped leading. MOB terminating.
The nodes are drained and rebooted, but no new image is applied. The MCD logs show the rpm-ostree rebase being skipped:
2024-05-31T14:58:49.224099430+00:00 stderr F I0531 14:58:49.224058 2370 update.go:845] Checking Reconcilable for config rendered-worker-cedce993ed207d562707657a7b919a48 to rendered-woker-a8f2a15f227d7c4b677272c3b9c81fa4
2024-05-31T14:58:49.259456841+00:00 stderr F I0531 14:58:49.259413 2370 update.go:2610] Starting transition from "quay.io/mcoqe/layering@sha256:dabe4305e23b8611944866fc289ebd844e7a68183a08e6d45243571918a184d" to "quay.io/mcoqe/layering@sha256:dabe4305e23b8611944866fc289ebd844e7a681583a08e6d45243571918a184d"
2024-05-31T14:58:49.260917314+00:00 stderr F I0531 14:58:49.260885 2370 update.go:2610] Update prepared; requesting cordon and drain via annotation to controller
2024-05-31T15:00:19.282352721+00:00 stderr F I0531 15:00:19.282303 2370 update.go:2610] drain complete
2024-05-31T15:00:19.283799697+00:00 stderr F I0531 15:00:19.283777 2370 drain.go:114] Successful drain took 90.021720714 seconds
2024-05-31T15:00:19.283799697+00:00 stderr F I0531 15:00:19.283794 2370 update.go:881] Image pullspecs equal, skipping rpm-ostree rebase
2024-05-31T15:00:19.283817435+00:00 stderr F I0531 15:00:19.283800 2370 update.go:1824] Updating files
2024-05-31T15:00:19.283817435+00:00 stderr F I0531 15:00:19.283805 2370 file_writers.go:233] Writing file "/usr/local/bin/nm-clean-initrd-state.sh"
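One possible reading of the "Image pullspecs equal, skipping rpm-ostree rebase" message is that the daemon compares the image the node is currently running with the desired image and rebases only when they differ. The Go sketch below illustrates that comparison; `needsRebase` and the pullspecs are hypothetical and are not taken from the MCO source.

```go
package main

import "fmt"

// needsRebase is a hypothetical helper: a rebase is only required when the
// desired image differs from the image the node is currently running.
func needsRebase(currentImage, desiredImage string) bool {
	return currentImage != desiredImage
}

func main() {
	// Both pullspecs are illustrative placeholders.
	current := "registry.example.com/os@sha256:1111"
	desired := "registry.example.com/os@sha256:1111"

	if !needsRebase(current, desired) {
		// The node is still drained and rebooted for the config change,
		// but the OS image itself is left untouched.
		fmt.Println("image pullspecs equal, skipping rebase")
	}
}
```

Under this assumption, a node whose desired image was never updated keeps its old image even though the rendered config changed.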
No error is reported anywhere.
Expected results:
Since the reused image no longer exists, an error should be reported to the user so that they know something is not working as expected.
Additional info:
- links to
-
RHEA-2024:11038
OpenShift Container Platform 4.19.z bug fix update