Type: Bug
Resolution: Unresolved
Priority: Normal
Fix Version/s: None
Affects Version/s: 4.16
Component/s: Machine Config Operator
Labels:
- mco-triaged
- qe-ocb-test

Severity:
Moderate
Regression:
None
Blocked:
False
Blocked Reason:

None


SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Description of problem:

When OCB is enabled in a pool and a MC is removed, the resulting new rendered MC may already have been built through an existing MachineOSBuild resource. In that case the existing image is reused and is not built again.

If that image has been removed from the repository and MCO tries to reuse it, MCO should report a failure.

Currently no failure is reported, and the nodes silently keep the image that is already present on them. This leaves the nodes in an inconsistent state, because the image they are running is not the one that belongs to the rendered machine config they are using.
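
The expected behavior described above can be sketched roughly as follows. This is only an illustration of the missing check, not MCO's actual code: `ExistingBuild`, `ReusedImageMissingError`, `select_image` and the injected `image_exists` callback are all hypothetical names introduced for this example.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class ExistingBuild:
    """Hypothetical stand-in for a MachineOSBuild that already succeeded."""
    rendered_config: str       # rendered MachineConfig the image was built for
    final_image_pullspec: str  # value of .status.finalImagePullspec


class ReusedImageMissingError(RuntimeError):
    """Raised when a previously built image can no longer be pulled."""


def select_image(
    target_config: str,
    builds: Sequence[ExistingBuild],
    image_exists: Callable[[str], bool],
) -> Optional[str]:
    """Return the pullspec to reuse, or None when a new build is needed.

    If a matching build exists but its image is gone, raise an error
    instead of silently keeping whatever image the nodes already have.
    """
    for build in builds:
        if build.rendered_config != target_config:
            continue
        if not image_exists(build.final_image_pullspec):
            raise ReusedImageMissingError(
                f"image {build.final_image_pullspec} for {target_config} "
                "was already built but is no longer available"
            )
        return build.final_image_pullspec
    return None
```

The `image_exists` callback is left abstract on purpose, since how the registry would be queried is outside the scope of this report.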

Version-Release number of selected component (if applicable):

IPI on AWS
$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.16.0-0.nightly-2024-05-31-062415   True        False         3h54m   Cluster version is 4.16.0-0.nightly-2024-05-31-062415

How reproducible:

Always

Steps to Reproduce:

    1. Configure OCB in the worker pool

apiVersion: machineconfiguration.openshift.io/v1alpha1
kind: MachineOSConfig
metadata:
  name: worker
spec:
  machineConfigPool:
    name: worker
  buildInputs:
    imageBuilder:
      imageBuilderType: PodImageBuilder
    baseImagePullSecret:
      name: YOUR-SECRET 
    renderedImagePushSecret:
      name: YOUR-SECRET
    renderedImagePushspec: "quay.io/mcoqe/....:latest"  # An image that you can remove from the repository


    2. Wait for the image to be built and applied


    3. Create a new MC

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: test-machine-config-1
spec:
  config:
    ignition:
      version: 3.1.0
    storage:
      files:
      - contents:
          source: data:text/plain;charset=utf-8;base64,dGVzdA==
        filesystem: root
        mode: 420
        path: /etc/test-file-1.test


    4. Wait for the machineconfig to be applied


    5. Now you have 2 MOSB resources. Remove the images referenced by their .status.finalImagePullspec from the repository, so that they cannot be pulled anymore.

    (Maybe we can just edit the existing MOSC and modify the finalImagePullSpec value to point to an image that does not exist, but I haven't tried it)

    6. Remove the MC created in step 3, so that OCB will try to re-use the first MOSB
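
To find which images have to be removed in step 5, a small helper like the one below could collect the `.status.finalImagePullspec` of every MOSB. It is a sketch that assumes the JSON comes from something like `oc get machineosbuild -o json` (a list object with an `items` array); the function name `final_pullspecs` is hypothetical.

```python
import json


def final_pullspecs(mosb_list_json: str) -> dict:
    """Map each MachineOSBuild name to its .status.finalImagePullspec.

    Builds that have no final pullspec yet (for example, still
    building) are skipped.
    """
    items = json.loads(mosb_list_json).get("items", [])
    result = {}
    for item in items:
        name = item["metadata"]["name"]
        pullspec = item.get("status", {}).get("finalImagePullspec")
        if pullspec:
            result[name] = pullspec
    return result
```

Each returned pullspec is an image that must be deleted from the repository before removing the MC in step 6.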

Actual results:

No build pod is created.

The machine-os-builder pod is restarted, showing this log:

I0531 10:17:33.396847       1 pod_build_controller.go:259] Adding Pod machine-config-server-p2mx6. Is build pod? false
I0531 10:17:33.396871       1 pod_build_controller.go:259] Adding Pod machine-os-builder-66fbf48666-lgvj6. Is build pod? false
I0531 10:17:33.480580       1 pod_build_controller.go:179] Starting MachineOSBuilder-PodBuildController
I0531 10:20:26.162824       1 helpers.go:93] Shutting down due to: terminated
I0531 10:20:26.162929       1 helpers.go:96] Context cancelled
I0531 10:20:26.162975       1 simple_featuregate_reader.go:177] Shutting down feature-gate-detector
I0531 10:20:26.163115       1 build_controller.go:351] Shutting down MachineOSBuilder-BuildController
I0531 10:20:26.163235       1 reflector.go:295] Stopping reflector *v1alpha1.MachineOSBuild (0s) from github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:125
I0531 10:20:26.163266       1 reflector.go:295] Stopping reflector *v1.Pod (0s) from k8s.io/client-go/informers/factory.go:159
I0531 10:20:26.163283       1 reflector.go:295] Stopping reflector *v1alpha1.MachineOSConfig (0s) from github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:125
I0531 10:20:26.163329       1 reflector.go:295] Stopping reflector *v1.MachineConfigPool (0s) from github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:125
I0531 10:20:26.163354       1 reflector.go:295] Stopping reflector *v1.ControllerConfig (0s) from github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:125
I0531 10:20:26.163376       1 pod_build_controller.go:187] Shutting down MachineOSBuilder-PodBuildController
I0531 10:20:26.182616       1 start.go:120] Stopped leading. MOB terminating.


The nodes are drained and rebooted, but no new image is applied. The MCD logs show that the rpm-ostree rebase is skipped, as in this example:



2024-05-31T14:58:49.224099430+00:00 stderr F I0531 14:58:49.224058    2370 update.go:845] Checking Reconcilable for config rendered-worker-cedce993ed207d562707657a7b919a48 to rendered-woker-a8f2a15f227d7c4b677272c3b9c81fa4
2024-05-31T14:58:49.259456841+00:00 stderr F I0531 14:58:49.259413    2370 update.go:2610] Starting transition from "quay.io/mcoqe/layering@sha256:dabe4305e23b8611944866fc289ebd844e7a68183a08e6d45243571918a184d" to "quay.io/mcoqe/layering@sha256:dabe4305e23b8611944866fc289ebd844e7a681583a08e6d45243571918a184d"
2024-05-31T14:58:49.260917314+00:00 stderr F I0531 14:58:49.260885    2370 update.go:2610] Update prepared; requesting cordon and drain via annotation to controller
2024-05-31T15:00:19.282352721+00:00 stderr F I0531 15:00:19.282303    2370 update.go:2610] drain complete
2024-05-31T15:00:19.283799697+00:00 stderr F I0531 15:00:19.283777    2370 drain.go:114] Successful drain took 90.021720714 seconds
2024-05-31T15:00:19.283799697+00:00 stderr F I0531 15:00:19.283794    2370 update.go:881] Image pullspecs equal, skipping rpm-ostree rebase
2024-05-31T15:00:19.283817435+00:00 stderr F I0531 15:00:19.283800    2370 update.go:1824] Updating files
2024-05-31T15:00:19.283817435+00:00 stderr F I0531 15:00:19.283805    2370 file_writers.go:233] Writing file "/usr/local/bin/nm-clean-initrd-state.sh"


No error is reported anywhere.
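
Since nothing is surfaced to the user, the only trace today is in the MCD logs. As a rough aid for spotting the problem, the sketch below scans MCD log text for the two messages shown above. The function name `summarize_mcd_log` is hypothetical, and the matching relies only on the message strings that appear in this report.

```python
import re

# Message strings copied from the MCD log excerpt in this report.
SKIP_MESSAGE = "Image pullspecs equal, skipping rpm-ostree rebase"
TRANSITION_RE = re.compile(r'Starting transition from "([^"]+)" to "([^"]+)"')


def summarize_mcd_log(log_text: str) -> dict:
    """Collect image transitions and count skipped rebases in MCD logs."""
    transitions = TRANSITION_RE.findall(log_text)
    skipped = sum(1 for line in log_text.splitlines() if SKIP_MESSAGE in line)
    return {"transitions": transitions, "skipped_rebases": skipped}
```

A non-zero `skipped_rebases` after removing a MC in an OCB-enabled pool is a hint that the node kept its previous image.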

Expected results:

Since the re-used image does not exist anymore, an error should be reported to let the user know that something is not working as expected.

Additional info:

Assignee: Team MCO

Reporter: Sergio Regidor de la Rosa

QA Contact: Sergio Regidor de la Rosa

Votes: 0

Watchers: 1

Created: 2024/05/31 4:38 PM

Updated: 2024/06/18 1:46 PM
