OCPBUGS-18955

OCB. MCO is not degraded when we update the OCB configmap with an invalid imageBuilderType


    • Moderate
    • No
    • MCO Sprint 247, MCO Sprint 248, MCO Sprint 249, MCO Sprint 250, MCO Sprint 251
    • 5
    • False


      * Previously, when a `MachineConfigPool` had the `OnClusterBuild` functionality enabled and the `configmap` was updated with an invalid `imageBuilderType`, the machine-config `ClusterOperator` was not degraded. With this release, the Machine Config Operator (MCO) `ClusterOperator` status now validates the `OnClusterBuild` inputs each time it syncs, ensuring that if those are invalid, the `ClusterOperator` is degraded. (link:https://issues.redhat.com/browse/OCPBUGS-18955[*OCPBUGS-18955*])
    • Bug Fix
    • Done

      Description of problem:

      When an MCP has the on-cluster-build functionality enabled and a valid imageBuilderType is configured in the on-cluster-build configmap, later updating this configmap with an invalid imageBuilderType does not degrade the machine-config ClusterOperator.

      Version-Release number of selected component (if applicable):

      $ oc get clusterversion
      NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
      version   4.14.0-0.nightly-2023-09-12-195514   True        False         3h56m   Cluster version is 4.14.0-0.nightly-2023-09-12-195514

      How reproducible:


      Steps to Reproduce:

      1. Create a valid OCB configmap, and 2 valid secrets. Like this:
      apiVersion: v1
      data:
        baseImagePullSecretName: mco-global-pull-secret
        finalImagePullspec: quay.io/mcoqe/layering
        finalImagePushSecretName: mco-test-push-secret
        imageBuilderType: ""
      kind: ConfigMap
      metadata:
        creationTimestamp: "2023-09-13T15:10:37Z"
        name: on-cluster-build-config
        namespace: openshift-machine-config-operator
        resourceVersion: "131053"
        uid: 1e0c66de-7a9a-4787-ab98-ce987a846f66
      3. Label the "worker" MCP in order to enable the OCB functionality in it.
      $ oc label mcp/worker machineconfiguration.openshift.io/layering-enabled=
      4. Wait for the machine-os-builder pod to be created, and for the build to finish. Just wait for the pods; do not wait for the MCPs to be updated. As soon as the build pod has finished the build, go to step 5.
      5. Patch the on-cluster-build configmap to use an invalid imageBuilderType
      $ oc patch cm/on-cluster-build-config -n openshift-machine-config-operator -p '{"data":{"imageBuilderType": "fake"}}'
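      For anyone reproducing this, a small pre-flight check can make it explicit which values count as valid before patching. The sketch below is hypothetical and not part of the MCO: the function name is invented for illustration, the empty string is accepted because step 1 uses it in a valid configmap, and the two named builder types are an assumption about what the MCO accepted at the time of this report.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not provided by the MCO or by oc).
# Returns 0 when the given imageBuilderType looks valid, 1 otherwise.
# ASSUMPTION: the accepted names are "openshift-image-builder" and
# "custom-pod-builder"; the empty string is valid per step 1 of this report.
is_valid_image_builder_type() {
  case "$1" in
    ""|openshift-image-builder|custom-pod-builder) return 0 ;;
    *) return 1 ;;
  esac
}
```

      With this helper, `is_valid_image_builder_type "fake"` returns a non-zero status, which is the case step 5 exercises on purpose.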

      Actual results:

      The machine-os-builder pod crashes
      $ oc get pods
      NAME                                         READY   STATUS             RESTARTS        AGE
      machine-config-controller-5bdd7b66c5-6l7sz   2/2     Running            2 (45m ago)     63m
      machine-config-daemon-5ttqh                  2/2     Running            0               63m
      machine-config-daemon-l95rj                  2/2     Running            0               63m
      machine-config-daemon-swtc6                  2/2     Running            2               57m
      machine-config-daemon-vq594                  2/2     Running            2               57m
      machine-config-daemon-zrf4f                  2/2     Running            0               63m
      machine-config-operator-7dd564556d-9smk4     2/2     Running            2 (45m ago)     65m
      machine-config-server-9sxjv                  1/1     Running            0               62m
      machine-config-server-m5sdl                  1/1     Running            0               62m
      machine-config-server-zb2hr                  1/1     Running            0               62m
      machine-os-builder-6cfbd8d5d-t6g8w           0/1     CrashLoopBackOff   6 (3m11s ago)   9m16s
      But the machine-config ClusterOperator is not degraded
      $ oc get co machine-config
      NAME             VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
      machine-config   4.14.0-0.nightly-2023-09-12-195514   True        False         False      63m     

      Expected results:

      The machine-config ClusterOperator should become degraded when an invalid imageBuilderType is configured.
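      The expected result can be checked without reading the table output by querying the Degraded condition directly. This is a minimal sketch: the function name is made up for illustration, and it assumes `oc` is logged in to the cluster under test.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds (exit 0) when the named ClusterOperator
# reports Degraded=True, and fails otherwise.
co_is_degraded() {
  local status
  # Extract only the status of the condition whose type is "Degraded".
  status="$(oc get clusteroperator "$1" \
    -o jsonpath='{.status.conditions[?(@.type=="Degraded")].status}')"
  [ "$status" = "True" ]
}
```

      After step 5, `co_is_degraded machine-config` would be expected to succeed once the fix is in place; in the failing scenario described here it keeps returning a non-zero status.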

      Additional info:

      If we configure an invalid imageBuilderType directly (not by patching/editing the configmap), then the machine-config CO is degraded, but when we edit the configmap it is not.
      A link to the must-gather file is provided in the first comment in this issue.
      PS: If we wait for the MCPs to be updated in step 4, the machine-os-builder pod is not restarted with the new "fake" imageBuilderType, but the machine-config CO is not degraded either, and it should be. Does that make sense?

            cdoern@redhat.com Charles Doern
            sregidor@redhat.com Sergio Regidor de la Rosa
            Sergio Regidor de la Rosa
            Shauna Diaz
