OpenShift Bugs / OCPBUGS-34745

In OCB, when a MOSB is reused but its image no longer exists, no error is reported



      Description of problem:

      When OCB is enabled in a pool and an MC is removed, the resulting rendered MC may be one that was already built by an existing MachineOSBuild (MOSB) resource. In that case, the image from that MOSB is reused and is not built again.
      
      If this image has since been removed from the registry and MCO tries to reuse it, MCO should report a failure.
      
      Currently MCO reports no failure and silently leaves the nodes on the image they are already running. This leaves the nodes in an inconsistent state: the image they run is not the one that belongs to the rendered MachineConfig they are using.
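To make the inconsistency concrete, here is a minimal sketch of a check that would flag it. It assumes we can collect, for each node, the rendered MachineConfig it reports and the image it actually booted, plus a mapping from rendered MachineConfig name to the final image pullspec of its MOSB. The `Node` type, the `find_inconsistent_nodes` function and the example pullspecs are hypothetical placeholders, not MCO code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical view of a node: the config it reports and the image it runs."""
    name: str
    current_config: str  # rendered MachineConfig the node reports
    booted_image: str    # image pullspec the node is actually running


def find_inconsistent_nodes(nodes, image_for_config):
    """Return names of nodes whose booted image is not the image that belongs
    to the rendered MachineConfig they report.

    image_for_config maps a rendered MachineConfig name to the final image
    pullspec of its MachineOSBuild. A config with no known image also counts
    as inconsistent (None never equals a pullspec string).
    """
    inconsistent = []
    for node in nodes:
        expected = image_for_config.get(node.current_config)
        if expected != node.booted_image:
            inconsistent.append(node.name)
    return inconsistent


# Placeholder pullspecs: worker-a reports rendered-worker-1 but still runs
# the image built for rendered-worker-2, which is the situation in this bug.
images = {
    "rendered-worker-1": "quay.io/example/layering@sha256:aaa",
    "rendered-worker-2": "quay.io/example/layering@sha256:bbb",
}
nodes = [
    Node("worker-a", "rendered-worker-1", "quay.io/example/layering@sha256:bbb"),
    Node("worker-b", "rendered-worker-2", "quay.io/example/layering@sha256:bbb"),
]
print(find_inconsistent_nodes(nodes, images))  # ['worker-a']
```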
      
      
          

      Version-Release number of selected component (if applicable):

      IPI on AWS
      $ oc get clusterversion
      NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
      version   4.16.0-0.nightly-2024-05-31-062415   True        False         3h54m   Cluster version is 4.16.0-0.nightly-2024-05-31-062415
      
      
          

      How reproducible:

      Always
          

      Steps to Reproduce:

          1. Configure OCB in the worker pool
      
      apiVersion: machineconfiguration.openshift.io/v1alpha1
      kind: MachineOSConfig
      metadata:
        name: worker
      spec:
        machineConfigPool:
          name: worker
        buildInputs:
          imageBuilder:
            imageBuilderType: PodImageBuilder
          baseImagePullSecret:
            name: YOUR-SECRET 
          renderedImagePushSecret:
            name: YOUR-SECRET
          renderedImagePushspec: "quay.io/mcoqe/....:latest"  # An image that you can remove from the repository
      
      
          2. Wait for the image to be built and applied
      
      
          3. Create a new MC
      
      apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      metadata:
        labels:
          machineconfiguration.openshift.io/role: worker
        name: test-machine-config-1
      spec:
        config:
          ignition:
            version: 3.1.0
          storage:
            files:
            - contents:
                source: data:text/plain;charset=utf-8;base64,dGVzdA==
              filesystem: root
              mode: 420
              path: /etc/test-file-1.test
      
      
          4. Wait for the machineconfig to be applied
      
      
          5. You now have two MOSB resources. Remove the images referenced by their .status.finalImagePullspec fields from the repository so that they can no longer be pulled.
      
          (It may also be possible to edit the existing MOSC and point the finalImagePullSpec value to an image that does not exist, but I haven't tried it.)
          6. Remove the MC created in step 3 so that OCB tries to reuse the first MOSB.
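To confirm that step 5 really removed the images, one can ask the registry for each pullspec. Under the OCI Distribution Specification, a `HEAD` request to `/v2/<name>/manifests/<reference>` returns 404 when the manifest is not found. The sketch below only builds that URL from a pullspec; `manifest_url` is a hypothetical helper, it assumes the pullspec includes an explicit registry host, and it ignores authentication, which most registries require before answering. In practice `skopeo inspect docker://<pullspec>` performs the same lookup using the configured registry credentials.

```python
def manifest_url(pullspec: str) -> str:
    """Map a fully qualified pullspec to its OCI Distribution manifest URL.

    Assumes the pullspec starts with an explicit registry host, for example
    "quay.io/org/repo:tag" or "quay.io/org/repo@sha256:<digest>".
    """
    if "@" in pullspec:
        # Digest reference: everything after '@' is the manifest reference.
        name, reference = pullspec.split("@", 1)
    else:
        # A tag follows the last ':' only when it comes after the last '/';
        # otherwise that ':' belongs to a registry port ("host:5000/repo").
        slash, colon = pullspec.rfind("/"), pullspec.rfind(":")
        if colon > slash:
            name, reference = pullspec[:colon], pullspec[colon + 1:]
        else:
            name, reference = pullspec, "latest"
    registry, _, repository = name.partition("/")
    if not repository:
        raise ValueError(f"pullspec has no registry host: {pullspec!r}")
    return f"https://{registry}/v2/{repository}/manifests/{reference}"


print(manifest_url("quay.io/mcoqe/layering@sha256:0123abcd"))
# https://quay.io/v2/mcoqe/layering/manifests/sha256:0123abcd
```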
      
      
          

      Actual results:

      No build pod is created.
      
      The machine-os-builder pod is restarted and shows this log:
      
      I0531 10:17:33.396847       1 pod_build_controller.go:259] Adding Pod machine-config-server-p2mx6. Is build pod? false
      I0531 10:17:33.396871       1 pod_build_controller.go:259] Adding Pod machine-os-builder-66fbf48666-lgvj6. Is build pod? false
      I0531 10:17:33.480580       1 pod_build_controller.go:179] Starting MachineOSBuilder-PodBuildController
      I0531 10:20:26.162824       1 helpers.go:93] Shutting down due to: terminated
      I0531 10:20:26.162929       1 helpers.go:96] Context cancelled
      I0531 10:20:26.162975       1 simple_featuregate_reader.go:177] Shutting down feature-gate-detector
      I0531 10:20:26.163115       1 build_controller.go:351] Shutting down MachineOSBuilder-BuildController
      I0531 10:20:26.163235       1 reflector.go:295] Stopping reflector *v1alpha1.MachineOSBuild (0s) from github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:125
      I0531 10:20:26.163266       1 reflector.go:295] Stopping reflector *v1.Pod (0s) from k8s.io/client-go/informers/factory.go:159
      I0531 10:20:26.163283       1 reflector.go:295] Stopping reflector *v1alpha1.MachineOSConfig (0s) from github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:125
      I0531 10:20:26.163329       1 reflector.go:295] Stopping reflector *v1.MachineConfigPool (0s) from github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:125
      I0531 10:20:26.163354       1 reflector.go:295] Stopping reflector *v1.ControllerConfig (0s) from github.com/openshift/client-go/machineconfiguration/informers/externalversions/factory.go:125
      I0531 10:20:26.163376       1 pod_build_controller.go:187] Shutting down MachineOSBuilder-PodBuildController
      I0531 10:20:26.182616       1 start.go:120] Stopped leading. MOB terminating.
      
      
      The nodes are drained and rebooted, but no new image is applied. The MCD logs show entries like the following, where the rpm-ostree rebase is skipped:
      
      
      
      2024-05-31T14:58:49.224099430+00:00 stderr F I0531 14:58:49.224058    2370 update.go:845] Checking Reconcilable for config rendered-worker-cedce993ed207d562707657a7b919a48 to rendered-worker-a8f2a15f227d7c4b677272c3b9c81fa4
      2024-05-31T14:58:49.259456841+00:00 stderr F I0531 14:58:49.259413    2370 update.go:2610] Starting transition from "quay.io/mcoqe/layering@sha256:dabe4305e23b8611944866fc289ebd844e7a681583a08e6d45243571918a184d" to "quay.io/mcoqe/layering@sha256:dabe4305e23b8611944866fc289ebd844e7a681583a08e6d45243571918a184d"
      2024-05-31T14:58:49.260917314+00:00 stderr F I0531 14:58:49.260885    2370 update.go:2610] Update prepared; requesting cordon and drain via annotation to controller
      2024-05-31T15:00:19.282352721+00:00 stderr F I0531 15:00:19.282303    2370 update.go:2610] drain complete
      2024-05-31T15:00:19.283799697+00:00 stderr F I0531 15:00:19.283777    2370 drain.go:114] Successful drain took 90.021720714 seconds
      2024-05-31T15:00:19.283799697+00:00 stderr F I0531 15:00:19.283794    2370 update.go:881] Image pullspecs equal, skipping rpm-ostree rebase
      2024-05-31T15:00:19.283817435+00:00 stderr F I0531 15:00:19.283800    2370 update.go:1824] Updating files
      2024-05-31T15:00:19.283817435+00:00 stderr F I0531 15:00:19.283805    2370 file_writers.go:233] Writing file "/usr/local/bin/nm-clean-initrd-state.sh"
      
      
      No error is reported anywhere.
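The MCD log above shows the rendered config changing while the image pullspec stays the same, so the rebase is skipped after a plain string comparison. Below is a hypothetical, simplified model of that decision (the `plan_rebase` name and its return shape are invented for illustration and are not taken from the MCD source), with one extra check that turns the silent mismatch into a warning.

```python
def plan_rebase(current_config, desired_config, current_image, desired_image):
    """Hypothetical, simplified model of the rebase decision seen in the MCD log.

    Returns (rebase_needed, warning). warning is None unless the rendered
    config changes while the image pullspec does not.
    """
    if current_image != desired_image:
        return True, None
    # Mirrors "Image pullspecs equal, skipping rpm-ostree rebase": only the
    # pullspec strings are compared, so nothing here notices that the image
    # does not belong to the new rendered config, nor whether it still exists.
    if current_config != desired_config:
        return False, (
            f"rendered config changes {current_config} -> {desired_config} "
            f"but image pullspec is unchanged: {current_image}"
        )
    return False, None


# Config names taken from the log; the image string is a placeholder.
image = "quay.io/mcoqe/layering@sha256:..."
needed, warning = plan_rebase(
    "rendered-worker-cedce993ed207d562707657a7b919a48",
    "rendered-worker-a8f2a15f227d7c4b677272c3b9c81fa4",
    image,
    image,
)
print(needed, warning)
```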
      
      
          

      Expected results:

      Since the reused image no longer exists, an error should be reported to let the user know that something is not working as expected.
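The expected behaviour can be sketched as a reuse path that checks the image before using it. This is a minimal sketch, not MCO's implementation: `select_image`, `ReusedImageMissingError` and the `image_exists` callback are hypothetical names, and how the error should be surfaced to the user is left open.

```python
class ReusedImageMissingError(Exception):
    """Raised when the image of a MachineOSBuild chosen for reuse is gone."""


def select_image(rendered_config, builds, image_exists):
    """Decide how to obtain an image for rendered_config.

    builds maps a rendered MachineConfig name to the final image pullspec of
    an existing MachineOSBuild; image_exists is a callable that checks the
    registry. Returns ("build", None) when there is no prior build, or
    ("reuse", pullspec) when the prior image is still available.
    """
    pullspec = builds.get(rendered_config)
    if pullspec is None:
        return "build", None
    if not image_exists(pullspec):
        # The behaviour this bug asks for: fail loudly instead of silently
        # leaving nodes on whatever image they already run.
        raise ReusedImageMissingError(
            f"image {pullspec} for {rendered_config} no longer exists"
        )
    return "reuse", pullspec


# Simulate step 6 after step 5 deleted the first MOSB's image (placeholder pullspec).
builds = {"rendered-worker-1": "quay.io/example/layering@sha256:aaa"}
deleted = {"quay.io/example/layering@sha256:aaa"}
try:
    select_image("rendered-worker-1", builds, lambda p: p not in deleted)
except ReusedImageMissingError as err:
    print(f"error: {err}")
```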
      
          

      Additional info:

          

              Team MCO (team-mco)
              Sergio Regidor de la Rosa (sregidor@redhat.com)