  1. Machine Config Operator
  2. MCO-1834

Impact Invalid architecture value found in annotation during 4.19 update


    • Type: Spike
    • Resolution: Done
    • Priority: Critical
    • Sprint: MCO Sprint 275

      Impact statement for the OCPBUGS-60119 series:

      Which 4.y.z to 4.y'.z' updates increase vulnerability?

      • Updates from 4.18 to 4.19.z with z < 9 when boot image updates are enabled. Boot image updates can be enabled only on clusters running on AWS or GCP, and they are enabled by default in 4.19.
      • The fix is shipped in 4.19.9
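
The version window above can be expressed as a small shell check. This is only a sketch: target_is_affected is a hypothetical helper name, and it looks at the target version string alone. Whether a cluster is actually exposed also depends on the platform, on boot image updates being enabled, and on the annotation contents described in this statement.

```shell
# Hypothetical helper: succeeds when the target version is 4.19.z with z < 9.
# It does not check platform, boot image settings, or annotations.
target_is_affected() {
  case "$1" in
    4.19.*) ;;          # only 4.19.z targets are in scope
    *) return 1 ;;
  esac
  z=${1#4.19.}          # drop the "4.19." prefix
  z=${z%%[!0-9]*}       # keep only the leading digits of the patch number
  [ -n "$z" ] && [ "$z" -lt 9 ]
}
```

For example, target_is_affected 4.19.8 succeeds, while target_is_affected 4.19.9 fails because the fix ships in that release.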

      Which types of clusters?

      • GCP or AWS clusters that have machinesets with multiple labels embedded in their capacity.cluster-autoscaler.kubernetes.io/labels annotation. Check whether your cluster is vulnerable with:
        $ oc get machinesets.machine.openshift.io -n openshift-machine-api -o yaml | grep "capacity.cluster-autoscaler.kubernetes.io/labels"
        

        The output should normally look like this (one line per machineset):

              capacity.cluster-autoscaler.kubernetes.io/labels: kubernetes.io/arch=amd64
              capacity.cluster-autoscaler.kubernetes.io/labels: kubernetes.io/arch=amd64
              capacity.cluster-autoscaler.kubernetes.io/labels: kubernetes.io/arch=amd64 

        If the annotation contains a comma-separated list of labels instead of a single architecture label (see below), your cluster is affected:

        capacity.cluster-autoscaler.kubernetes.io/labels: kubernetes.io/arch=amd64,topology.ebs.csi.aws.com/zone=eu-central-1a        
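
The visual check above can also be automated with a small filter. The following is a minimal sketch, not part of the original statement: flag_multi_label is a hypothetical helper that reads the YAML (or the grep output) on stdin and prints only annotation values that hold more than one comma-separated label.

```shell
# Hypothetical helper: print annotation values that contain more than one label.
flag_multi_label() {
  awk -F': *' '/capacity\.cluster-autoscaler\.kubernetes\.io\/labels:/ {
    v = $2
    sub(/[ \t]+$/, "", v)          # ignore trailing whitespace
    n = split(v, labels, ",")      # labels are comma-separated
    if (n > 1) print "AFFECTED: " v
  }'
}

# Usage against a live cluster (requires oc and cluster access):
#   oc get machinesets.machine.openshift.io -n openshift-machine-api -o yaml | flag_multi_label
```

No output suggests that every machineset annotation carries only the architecture label.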

      What is the impact? Is it serious enough to warrant removing update recommendations?

      • The machine-config cluster operator becomes degraded while trying to perform boot image updates, and the degraded operator blocks the cluster update.
        The output of oc get co should show machine-config as degraded.
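
To read just the Degraded condition rather than scanning the full oc get co table, a jsonpath query along these lines may help. This is a sketch: mco_degraded_status is a hypothetical wrapper name, and the OC variable is an assumption added here only so that the oc binary can be swapped out.

```shell
# Hypothetical wrapper: print the status of the Degraded condition
# ("True" or "False") for the machine-config cluster operator.
# OC may be set to an alternative oc binary; it defaults to "oc".
mco_degraded_status() {
  "${OC:-oc}" get clusteroperator machine-config \
    -o jsonpath='{.status.conditions[?(@.type=="Degraded")].status}'
}
```

A result of True while the update is stalled is consistent with the impact described above, although other problems can also degrade the operator.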

      How involved is remediation?

      • The admin can explicitly disable boot image updates by applying the following manifest:
        apiVersion: operator.openshift.io/v1
        kind: MachineConfiguration
        metadata:
          name: cluster
          namespace: openshift-machine-config-operator
        spec:
          managedBootImages: 
            machineManagers:
            - apiGroup: machine.openshift.io 
              resource: machinesets 
              selection:
                mode: None   

        This unblocks the update process. After the update, once you have confirmed that the extra labels have been removed from the annotation, you can revert the above change on the MachineConfiguration object to re-enable the boot image controller. At that point the annotation contains only the arch label, so it should no longer degrade the MCO.
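
One way to apply the remediation manifest without saving it to a file first is to pipe it to oc apply. The sketch below assumes the manifest exactly as given in this statement. disable_boot_image_updates is a hypothetical function name, and the OC variable is an assumption that lets a different oc binary be substituted.

```shell
# Hypothetical helper: apply the manifest that sets the machinesets
# selection mode to None, which disables boot image updates.
disable_boot_image_updates() {
  "${OC:-oc}" apply -f - <<'EOF'
apiVersion: operator.openshift.io/v1
kind: MachineConfiguration
metadata:
  name: cluster
  namespace: openshift-machine-config-operator
spec:
  managedBootImages:
    machineManagers:
    - apiGroup: machine.openshift.io
      resource: machinesets
      selection:
        mode: None
EOF
}
```

Because the revert depends on how the MachineConfiguration object looked before the change, it is worth saving a copy of the object (for example with oc get -o yaml) before running this.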

      Is this a regression?

      • Yes, from 4.18.z to 4.19.z.

       

              djoshy David Joshy
              trking W. Trevor King