OCPBUGS-19007

OCB builds fail when several MCPs are building at the same time


      Description of problem:

      
      When several MachineConfigPools in a cluster have the on-cluster-build (OCB) functionality enabled and are building images at the same time, some of those builds fail with status "Error (BuildPodDeleted)".
      
      

      Version-Release number of selected component (if applicable):

      $ oc get clusterversion
      NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
      version   4.14.0-0.nightly-2023-09-12-195514   True        False         6h21m   Cluster version is 4.14.0-0.nightly-2023-09-12-195514
      
      

      How reproducible:

      Always
      

      Steps to Reproduce:

      1. Create the configuration resources needed by the OCB functionality.
      
      To reproduce this issue, we use an on-cluster-build-config ConfigMap with an empty imageBuilderType:
      
      oc patch cm/on-cluster-build-config -n openshift-machine-config-operator -p '{"data":{"imageBuilderType": ""}}'
      
      2. Create 5 custom pools
      
      
      for n in {1..5}
      do
      echo $n
      
      cat << EOF | oc create -f -
      apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfigPool
      metadata:
        name: infra$n
      spec:
        machineConfigSelector:
          matchExpressions:
            - {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,infra$n]}
        nodeSelector:
          matchLabels:
            node-role.kubernetes.io/infra$n: ""
      
      EOF
      done
      
      
      3. Label the pools to enable the OCB functionality
      
      for n in {1..5}
      do
        echo $n
        oc label mcp/infra$n machineconfiguration.openshift.io/layering-enabled=
      done
      
      4. Wait for the builds to finish.
      
      All five builds should complete successfully.
      
      5. Create a MachineConfig (MC) to trigger another build, for example:
      
      cat << EOF | oc create -f -
      apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      metadata:
        labels:
          machineconfiguration.openshift.io/role: worker
        name: test-machine-config
      spec:
        config:
          ignition:
            version: 3.1.0
          storage:
            files:
            - contents:
                source: data:text/plain;charset=utf-8;base64,dGVzdA==
              filesystem: root
              mode: 420
              path: /etc/my-test-file.test
      EOF
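
Step 4 depends on every build from step 3 having finished before the MachineConfig in step 5 is created. The following is a minimal sketch of such a wait, not part of the original reproduction. It assumes the Build objects live in the openshift-machine-config-operator namespace (the same namespace used for the ConfigMap in step 1) and that a logged-in `oc` client is available. The function names `builds_all_complete` and `wait_for_builds`, the timeout, and the polling interval are hypothetical.

```shell
#!/usr/bin/env bash
# Assumption: the OCB Build objects are created in this namespace.
NAMESPACE="${NAMESPACE:-openshift-machine-config-operator}"

# Reads "NAME PHASE" lines on stdin. Succeeds only if at least one build
# is listed and every listed build is in phase Complete.
builds_all_complete() {
  awk 'NF >= 2 { total++; if ($2 == "Complete") ok++ }
       END { exit !(total > 0 && ok == total) }'
}

# Polls the Build objects until all are Complete or the timeout expires.
# Arguments (both optional): timeout in seconds, polling interval in seconds.
wait_for_builds() {
  local timeout="${1:-1800}" interval="${2:-15}" waited=0
  while [ "$waited" -lt "$timeout" ]; do
    if oc get builds -n "$NAMESPACE" --no-headers \
         -o custom-columns=NAME:.metadata.name,PHASE:.status.phase \
         | builds_all_complete; then
      return 0
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  return 1
}
```

Calling `wait_for_builds 1800 15` between steps 3 and 5 would block until the initial builds are done. A build that ends in a phase other than Complete keeps the check failing, so the function returns non-zero once the timeout is reached.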
      
      
      
      
      
      

      Actual results:

      
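      The new builds are triggered, but some of the build pods are terminated before they can finish, and the corresponding builds fail with "Error (BuildPodDeleted)". In the output below, the builds for infra2 and infra3 are in that state and no build pod remains for them.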
      
      NAME                                                               READY   STATUS    RESTARTS   AGE
      pod/build-rendered-infra1-fc68772b20de56ea566bb8f81a53e3d1-build   1/1     Running   0          25s
      pod/build-rendered-infra4-fc68772b20de56ea566bb8f81a53e3d1-build   1/1     Running   0          22s
      pod/build-rendered-infra5-fc68772b20de56ea566bb8f81a53e3d1-build   1/1     Running   0          20s
      pod/machine-config-controller-5bdd7b66c5-dl4hh                     2/2     Running   0          6h48m
      pod/machine-config-daemon-5wbw4                                    2/2     Running   0          6h48m
      pod/machine-config-daemon-fqr8x                                    2/2     Running   0          6h48m
      pod/machine-config-daemon-g77zd                                    2/2     Running   12         6h41m
      pod/machine-config-daemon-qzmvv                                    2/2     Running   20         6h41m
      pod/machine-config-daemon-w8mnz                                    2/2     Running   0          6h48m
      pod/machine-config-operator-7dd564556d-mqc5w                       2/2     Running   0          6h50m
      pod/machine-config-server-28lnp                                    1/1     Running   0          6h47m
      pod/machine-config-server-5csjz                                    1/1     Running   0          6h47m
      pod/machine-config-server-fv4vk                                    1/1     Running   0          6h47m
      pod/machine-os-builder-6cfbd8d5d-pbdz5                             1/1     Running   0          4m19s
      
      NAME                                                                              TYPE     FROM         STATUS                    STARTED          DURATION
      build.build.openshift.io/build-rendered-infra1-fc68772b20de56ea566bb8f81a53e3d1   Docker   Dockerfile   Running                   25 seconds ago
      build.build.openshift.io/build-rendered-infra2-fc68772b20de56ea566bb8f81a53e3d1   Docker   Dockerfile   Error (BuildPodDeleted)   25 seconds ago   12s
      build.build.openshift.io/build-rendered-infra3-fc68772b20de56ea566bb8f81a53e3d1   Docker   Dockerfile   Error (BuildPodDeleted)   23 seconds ago   13s
      build.build.openshift.io/build-rendered-infra4-fc68772b20de56ea566bb8f81a53e3d1   Docker   Dockerfile   Running                   22 seconds ago
      build.build.openshift.io/build-rendered-infra5-fc68772b20de56ea566bb8f81a53e3d1   Docker   Dockerfile   Running                   20 seconds ago
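
To pick out the affected builds from a listing like the one above, the STATUS column can be filtered for the "Error (<reason>)" pattern. This is a small sketch only: `failed_builds` is a hypothetical helper name, and it relies on the table layout shown in this ticket rather than on any stable output contract of `oc`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `oc get builds` table output on stdin and prints
# "<build name> <reason>" for every build whose STATUS is "Error (<reason>)".
failed_builds() {
  awk 'match($0, /Error \([A-Za-z]+\)/) {
         # "Error (" is 7 characters long and ")" is 1, so the reason is
         # the matched text without those 8 characters.
         reason = substr($0, RSTART + 7, RLENGTH - 8)
         print $1, reason
       }'
}
```

A possible invocation, assuming the builds are in the openshift-machine-config-operator namespace, is `oc get builds -n openshift-machine-config-operator | failed_builds`. For the listing above it would report the infra2 and infra3 builds with reason BuildPodDeleted.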
      
      
      
      
      

      Expected results:

      The builds should not fail.
      
      
      

      Additional info:

      A link to the must-gather file is in the first comment of this Jira ticket.
      

            rh-ee-iqian Ines Qian
            sregidor@redhat.com Sergio Regidor de la Rosa
            Sergio Regidor de la Rosa Sergio Regidor de la Rosa