OpenShift Bugs / OCPBUGS-18989

OCB builds fail if we update the imageBuilderType while an openshift-image-builder pod is building


    • Moderate
    • No
    • MCO Sprint 249
    • 1
    • False
    • The MachineOSBuilder now restarts when the imageBuilderType is updated. This discards the current build if one is in progress. A new build is then started with the new type, or an error is reported if the type is invalid.
    • Bug Fix
    • In Progress

      Description of problem:

      
      In a cluster with a pool using OCB functionality, if we update the imageBuilderType value while an openshift-image-builder pod is building an image, the build fails.
      
      
      It can fail in 2 ways:
      
      1. The running pod that is building the image is removed, and the build fails reporting "Error (BuildPodDeleted)".
      2. The machine-os-builder pod is restarted but the build pod is not removed. Then the build is never removed.
      
      
      

      Version-Release number of selected component (if applicable):

      $ oc get clusterversion
      NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
      version   4.14.0-0.nightly-2023-09-12-195514   True        False         154m    Cluster version is 4.14.0-0.nightly-2023-09-12-195514
      
      

      How reproducible:

      
      

      Steps to Reproduce:

      1. Create the needed resources to make OCB functionality work (on-cluster-build-config configmap, the secrets and the imageSpec)
      
      We reproduced it using imageBuilderType=""
      
      oc patch cm/on-cluster-build-config -n openshift-machine-config-operator -p '{"data":{"imageBuilderType": ""}}'
      
      
      2. Create an infra pool and label it so that it can use OCB functionality
      
      apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfigPool
      metadata:
        name: infra
      spec:
        machineConfigSelector:
          matchExpressions:
            - {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,infra]}
        nodeSelector:
          matchLabels:
            node-role.kubernetes.io/infra: ""
      
      
      oc label mcp/infra machineconfiguration.openshift.io/layering-enabled=
      
      
      
      3. Wait until the triggered build has finished.
      
      4. Create a new MC to trigger a new build. This one, for example:
      
      
      apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      metadata:
        labels:
          machineconfiguration.openshift.io/role: worker
        name: test-machine-config
      spec:
        config:
          ignition:
            version: 3.1.0
          storage:
            files:
            - contents:
                source: data:text/plain;charset=utf-8;base64,dGVzdA==
              filesystem: root
              mode: 420
              path: /etc/test-file.test
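
For reference, the `source` value in this MachineConfig is a data URL whose payload is the base64 encoding of the string "test", and `mode: 420` is the decimal form of octal 0644. A minimal sketch of how such values could be produced follows; the helper names `data_url` and `octal_mode` are hypothetical and are not part of `oc` or the MCO.

```shell
# Hypothetical helpers for illustration only (not part of oc or the MCO).

# Print a text/plain data URL carrying the base64 encoding of the first argument.
# tr -d '\n' strips the line wrapping that some base64 implementations add.
data_url() {
  printf 'data:text/plain;charset=utf-8;base64,%s' \
    "$(printf '%s' "$1" | base64 | tr -d '\n')"
}

# Print a decimal file mode in octal notation.
octal_mode() {
  printf '%o' "$1"
}
```

With these definitions, `data_url test` prints `data:text/plain;charset=utf-8;base64,dGVzdA==` and `octal_mode 420` prints `644`.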
      
      
      
      5. Just after a new build pod is created, configure the on-cluster-build-config configmap to use the "custom-pod-builder" imageBuilderType
      
      oc patch cm/on-cluster-build-config -n openshift-machine-config-operator -p '{"data":{"imageBuilderType": "custom-pod-builder"}}'
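
Step 5 is timing-sensitive, so it may help to script it. The following is a minimal sketch, not a verified reproducer: it assumes the build pod name matches `build-rendered-<pool>-<hash>-build`, as seen in the output in this ticket, and the function names are hypothetical.

```shell
NS=openshift-machine-config-operator

# Pure filter: reads `oc get pods -o name` output on stdin and prints the
# first build pod name, if any. The name pattern is an assumption taken
# from the pod names observed in this ticket.
first_build_pod() {
  grep -E -m1 '^pod/build-rendered-.*-build$' || true
}

# Builds the JSON patch body for the on-cluster-build-config configmap.
patch_payload() {
  printf '{"data":{"imageBuilderType": "%s"}}' "$1"
}

# Polls once per second until a build pod exists, then switches the
# imageBuilderType to the value given as the first argument.
switch_type_during_build() {
  until oc get pods -n "$NS" -o name | first_build_pod | grep -q .; do
    sleep 1
  done
  oc patch cm/on-cluster-build-config -n "$NS" -p "$(patch_payload "$1")"
}
```

Running `switch_type_during_build custom-pod-builder` right after creating the MachineConfig from step 4 would issue the same patch as the manual command, as soon as a build pod appears.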
      
      
      
      

      Actual results:

      
      We have observed 2 behaviors after step 5:
      
      
      1. The machine-os-builder pod is restarted and the build is never removed.
      
      build.build.openshift.io/build-rendered-infra-b2473d404d9ddfa1536d2fb32b54d855   Docker   Dockerfile   Running   10 seconds ago
      NAME                                                              READY   STATUS              RESTARTS   AGE
      pod/build-rendered-infra-b2473d404d9ddfa1536d2fb32b54d855-build   1/1     Running             0          12s
      pod/machine-config-controller-5bdd7b66c5-dl4hh                    2/2     Running             0          90m
      pod/machine-config-daemon-5wbw4                                   2/2     Running             0          90m
      pod/machine-config-daemon-fqr8x                                   2/2     Running             0          90m
      pod/machine-config-daemon-g77zd                                   2/2     Running             0          83m
      pod/machine-config-daemon-qzmvv                                   2/2     Running             0          83m
      pod/machine-config-daemon-w8mnz                                   2/2     Running             0          90m
      pod/machine-config-operator-7dd564556d-mqc5w                      2/2     Running             0          92m
      pod/machine-config-server-28lnp                                   1/1     Running             0          89m
      pod/machine-config-server-5csjz                                   1/1     Running             0          89m
      pod/machine-config-server-fv4vk                                   1/1     Running             0          89m
      pod/machine-os-builder-6cfbd8d5d-2f7kd                            0/1     Terminating         0          3m26s
      pod/machine-os-builder-6cfbd8d5d-h2ltd                            0/1     ContainerCreating   0          1s
      
      NAME                                                                             TYPE     FROM         STATUS    STARTED          DURATION
      build.build.openshift.io/build-rendered-infra-b2473d404d9ddfa1536d2fb32b54d855   Docker   Dockerfile   Running   12 seconds ago
      
      
      
      
      2. The build pod is removed and the build fails with Error (BuildPodDeleted):
      
      NAME                                                                             TYPE     FROM         STATUS    STARTED          DURATION
      build.build.openshift.io/build-rendered-infra-b2473d404d9ddfa1536d2fb32b54d855   Docker   Dockerfile   Running   10 seconds ago
      NAME                                                              READY   STATUS        RESTARTS   AGE
      pod/build-rendered-infra-b2473d404d9ddfa1536d2fb32b54d855-build   1/1     Terminating   0          12s
      pod/machine-config-controller-5bdd7b66c5-dl4hh                    2/2     Running       0          159m
      pod/machine-config-daemon-5wbw4                                   2/2     Running       0          159m
      pod/machine-config-daemon-fqr8x                                   2/2     Running       0          159m
      pod/machine-config-daemon-g77zd                                   2/2     Running       8          152m
      pod/machine-config-daemon-qzmvv                                   2/2     Running       16         152m
      pod/machine-config-daemon-w8mnz                                   2/2     Running       0          159m
      pod/machine-config-operator-7dd564556d-mqc5w                      2/2     Running       0          161m
      pod/machine-config-server-28lnp                                   1/1     Running       0          159m
      pod/machine-config-server-5csjz                                   1/1     Running       0          159m
      pod/machine-config-server-fv4vk                                   1/1     Running       0          159m
      pod/machine-os-builder-6cfbd8d5d-g62b6                            1/1     Running       0          2m11s
      
      NAME                                                                             TYPE     FROM         STATUS    STARTED          DURATION
      build.build.openshift.io/build-rendered-infra-b2473d404d9ddfa1536d2fb32b54d855   Docker   Dockerfile   Running   12 seconds ago
      
      .....
      
      
      
      NAME                                                                             TYPE     FROM         STATUS                    STARTED          DURATION
      build.build.openshift.io/build-rendered-infra-b2473d404d9ddfa1536d2fb32b54d855   Docker   Dockerfile   Error (BuildPodDeleted)   17 seconds ago   13s
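
The two observed behaviors can be told apart from the build status and from whether the machine-os-builder pod was restarted. A rough sketch of that decision follows; the function name and the outcome labels are hypothetical, chosen only for illustration.

```shell
# classify_outcome PHASE REASON MOB_RESTARTED
#   PHASE          build phase, e.g. Running or Error
#   REASON         build failure reason, e.g. BuildPodDeleted (may be empty)
#   MOB_RESTARTED  "yes" if the machine-os-builder pod was restarted
# Outcome labels are hypothetical names for the two behaviors in this ticket.
classify_outcome() {
  co_phase="$1"; co_reason="$2"; co_restarted="$3"
  if [ "$co_phase" = "Error" ] && [ "$co_reason" = "BuildPodDeleted" ]; then
    echo "build-pod-deleted"   # behavior 2: build pod removed, build fails
  elif [ "$co_phase" = "Running" ] && [ "$co_restarted" = "yes" ]; then
    echo "orphaned-build"      # behavior 1: builder restarted, build left behind
  else
    echo "other"
  fi
}
```

The phase and reason could be read from the build object, for example with `oc get build <name> -n openshift-machine-config-operator -o jsonpath='{.status.phase} {.status.reason}'`, assuming the build exposes those status fields as displayed in the STATUS column above.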
      
      
      
      
      

      Expected results:

      Updating the imageBuilderType while a build is running should not leave the OCB functionality in a broken state.
      
      
      

      Additional info:

      
      Must-gather files are provided in the first comment in this ticket.
      

            cdoern@redhat.com Charles Doern
            sregidor@redhat.com Sergio Regidor de la Rosa
            Sergio Regidor de la Rosa Sergio Regidor de la Rosa