  1. OpenShift Bugs
  2. OCPBUGS-43324

In OCB, when a MOSB fails and we fix the problem, the new MOSB cannot be applied


    • Moderate
    • None
    • False

      Description of problem:

      When OCL is enabled in an MCP and a MachineOSBuild fails, fixing the problem that caused the failure is not enough: the new build succeeds, but its image is never deployed to the nodes.
          

      Version-Release number of selected component (if applicable):

      $ oc get clusterversion
      NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
      version   4.17.1    True        False         135m    Cluster version is 4.17.1
      
          

      How reproducible:

      Always
          

      Steps to Reproduce:

          1. Create a MOSC resource for the worker pool
          
          To reproduce this issue we used the MCO QE quay repository, after adding its credentials to the pull-secret. Any other repository can be used instead.
          
      oc create -f - << EOF
      apiVersion: machineconfiguration.openshift.io/v1alpha1
      kind: MachineOSConfig
      metadata:
        name: worker
      spec:
        machineConfigPool:
          name: worker
        buildOutputs:
          currentImagePullSecret:
            name: $(oc get -n openshift-machine-config-operator sa default -ojsonpath='{.secrets[0].name}')
        buildInputs:
          imageBuilder:
            imageBuilderType: PodImageBuilder
          baseImagePullSecret:
            name: $(oc get secret -n openshift-config pull-secret -o json | jq "del(.metadata.namespace, .metadata.creationTimestamp, .metadata.resourceVersion, .metadata.uid, .metadata.name)" | jq '.metadata.name="pull-copy"' | oc -n openshift-machine-config-operator create -f - &> /dev/null; echo -n "pull-copy")
          renderedImagePushSecret:
            name: $(oc get -n openshift-machine-config-operator sa builder -ojsonpath='{.secrets[0].name}')
          renderedImagePushspec: "image-registry.openshift-image-registry.svc:5000/openshift-machine-config-operator/ocb-image:latest"
      EOF
      
          
          
          
          2. Wait until the image is built and deployed on the worker nodes
          
          3. Once the image is correctly applied, edit the MOSC resource to add a custom container file that relies on RHEL enablement. Since no RHEL enablement secret is configured, OCL should fail when it builds the image.
          
          This is the section that we added to configure the failing container file:
          
          containerFile:
          - content: |-
              FROM configs AS final
          
              RUN rm -rf /etc/rhsm-host && \
                rpm-ostree install buildah && \
                ln -s /run/secrets/rhsm /etc/rhsm-host && \
                ostree container commit
                
                
          4. Create a new MC (any MC) to trigger a build, and wait until the build fails.
          5. Edit the MOSC resource to use a custom container file that does not need RHEL enablement.
          
          This is the section that we used to fix the custom container file:
          
            containerFile:
            - containerfileArch: noarch
              content: |-
                # Pull the centos base image and enable the EPEL repository.
                FROM quay.io/centos/centos:stream9 AS centos
                RUN dnf install -y epel-release
      
                # Build the final OS image for this MachineConfigPool.
                FROM configs AS final
      
                # Copy the EPEL configs into the final image.
                COPY --from=centos /etc/yum.repos.d /etc/yum.repos.d
                COPY --from=centos /etc/pki/rpm-gpg/RPM-GPG-KEY-* /etc/pki/rpm-gpg/
      
                # Install cowsay and ripgrep from the EPEL repository into the final image,
                # along with a custom cow file.
                RUN sed -i 's/\$stream/9-stream/g' /etc/yum.repos.d/centos*.repo && \
                    rpm-ostree install cowsay ripgrep
      
          
          6. Create another MC (any MC) to trigger a new build.
          7. The new build succeeds and the image is pushed.
          8. The MCP is never updated with the new image.
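
The MC-creation steps above accept any MachineConfig. As a minimal sketch, the helper below prints a small MachineConfig manifest for the worker pool that only writes a marker file. The function name `make_trigger_mc`, the MachineConfig name, the file path, and the file contents are all hypothetical and were not part of the original reproduction; any other valid MC should trigger a build in the same way.

```shell
# Print a minimal MachineConfig manifest for the worker pool.
# $1: MachineConfig name (hypothetical; any unique name works)
# $2: absolute path of the marker file to create on the nodes (hypothetical)
make_trigger_mc() {
  name="$1"
  path="$2"
  cat <<EOF
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: ${name}
  labels:
    machineconfiguration.openshift.io/role: worker
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
        - path: ${path}
          mode: 420
          overwrite: true
          contents:
            source: data:,ocl-trigger
EOF
}
```

One possible use is `make_trigger_mc ocl-trigger-1 /etc/ocl-trigger-1 | oc create -f -`, with a different name and path for each MC so that every call produces a new rendered config.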
          

      Actual results:

      
      Once we fix the failed build, OCL does not apply the new build to the MCP.
      
      The worker MCP is stuck in this status:
      $ oc get mcp
      NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
      master   rendered-master-4c3a27c89989374052f5f68e8cc2ce3e   True      False      False      3              3                   3                     0                      171m
      worker   rendered-worker-c5ca92b560e6ffec2918864d09c9385e   False     True       False      2              0                   0                     0                      171m
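
To spot this state without reading the table by eye, the following sketch filters `oc get mcp` output for pools that are not updated. The helper name `stuck_pools` is hypothetical. It assumes the default column headers shown above and locates columns by header name rather than by position.

```shell
# Read `oc get mcp` table output on stdin and print the name of every pool
# that reports UPDATED=False or has fewer updated machines than machines
# in total. Columns are looked up by header name.
stuck_pools() {
  awk '
    NR == 1 {
      # Map each header name to its column index.
      for (i = 1; i <= NF; i++) col[$i] = i
      next
    }
    NF > 0 && ($(col["UPDATED"]) == "False" ||
               $(col["UPDATEDMACHINECOUNT"]) + 0 < $(col["MACHINECOUNT"]) + 0) {
      print $(col["NAME"])
    }
  '
}
```

It can be used as `oc get mcp | stuck_pools`. A pool in the middle of a normal rollout is listed too, so the output only points to this bug when the pool stays listed long after the new build has been pushed.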
      
      
          

      Expected results:

      If a build fails, we fix the problem, and a new build is created, the new build should be applied to the nodes without problems.
          

      Additional info:

      
          

              team-mco Team MCO
              sregidor@redhat.com Sergio Regidor de la Rosa
              Sergio Regidor de la Rosa Sergio Regidor de la Rosa