OpenShift Bugs / OCPBUGS-29819

MCO controller crashes when it tries to update the coreos-bootimage and the machineset has an invalid architecture


      Description of problem:

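      In an IPI cluster on GCP, suppose one MachineSet is annotated with an invalid architecture through the kubernetes.io/arch label in its capacity.cluster-autoscaler.kubernetes.io/labels annotation. If the coreos-bootimage of any MachineSet is then updated, the MCO controller pod does not fail gracefully. It panics with a nil pointer dereference.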
      
          

      Version-Release number of selected component (if applicable):

      $ oc get clusterversion
      NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
      version   4.16.0-0.nightly-2024-02-17-094036   True        False         71m     Cluster version is 4.16.0-0.nightly-2024-02-17-094036
      
          

      How reproducible:

      Always
          

      Steps to Reproduce:

          1. Enable the TechPreviewNoUpgrade feature set:
      $ oc patch featuregate cluster --type=merge -p '{"spec":{"featureSet": "TechPreviewNoUpgrade"}}'
      
          2. Wait for all MachineConfigPools (MCPs) to finish updating.
      
          3. Edit a MachineSet and set an invalid architecture in the kubernetes.io/arch label of its capacity.cluster-autoscaler.kubernetes.io/labels annotation:
      
      apiVersion: machine.openshift.io/v1beta1
      kind: MachineSet
      metadata:
        annotations:
          capacity.cluster-autoscaler.kubernetes.io/labels: kubernetes.io/arch=amd64-FAKE-INVALID   # <--- EDIT THIS
          machine.openshift.io/GPU: "0"
          machine.openshift.io/memoryMb: "16384"
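      The controller log further down reports "with arch x86_64" for MachineSets whose label is amd64, but passes amd64-FAKE-INVALID through unchanged. This suggests the architecture is read from this annotation and then mapped to a stream architecture name. The Go sketch below shows how that lookup could be done defensively. It assumes the annotation holds a comma-separated list of key=value labels. The helper name and the mapping table are hypothetical, since only amd64 -> x86_64 is confirmed by the log. This is not the MCO's actual code. For simplicity, the sketch also treats a missing annotation as an error. How the real controller handles that case is not shown in this report.

```go
package main

import (
	"fmt"
	"strings"
)

// archAnnotation is the MachineSet annotation edited in step 3.
const archAnnotation = "capacity.cluster-autoscaler.kubernetes.io/labels"

// streamArchFor maps Kubernetes architecture names to RHCOS stream
// architecture names. Hypothetical table: only amd64 -> x86_64 is
// confirmed by the controller log in this report.
var streamArchFor = map[string]string{
	"amd64":   "x86_64",
	"arm64":   "aarch64",
	"ppc64le": "ppc64le",
	"s390x":   "s390x",
}

// archFromAnnotations (hypothetical helper) reads kubernetes.io/arch from
// the comma-separated key=value list in archAnnotation and maps it to a
// stream architecture. Unknown values produce an error instead of being
// passed through to the platform-specific reconcile step.
func archFromAnnotations(annotations map[string]string) (string, error) {
	raw, ok := annotations[archAnnotation]
	if !ok {
		return "", fmt.Errorf("annotation %q not set", archAnnotation)
	}
	for _, pair := range strings.Split(raw, ",") {
		key, value, found := strings.Cut(strings.TrimSpace(pair), "=")
		if !found || key != "kubernetes.io/arch" {
			continue
		}
		arch, known := streamArchFor[value]
		if !known {
			return "", fmt.Errorf("unsupported architecture %q in annotation %q", value, archAnnotation)
		}
		return arch, nil
	}
	return "", fmt.Errorf("kubernetes.io/arch not found in annotation %q", archAnnotation)
}

func main() {
	for _, labels := range []string{
		"kubernetes.io/arch=amd64",
		"kubernetes.io/arch=amd64-FAKE-INVALID",
	} {
		arch, err := archFromAnnotations(map[string]string{archAnnotation: labels})
		fmt.Printf("%s -> arch=%q err=%v\n", labels, arch, err)
	}
}
```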
      
          4. Patch any MachineSet with a new boot image. In this example, the first MachineSet in openshift-machine-api is given a fake image:
      
      $ oc -n openshift-machine-api patch machineset.machine $(oc -n openshift-machine-api get machineset.machine -ojsonpath='{.items[0].metadata.name}') --type json -p '[{"op": "add", "path": "/spec/template/spec/providerSpec/value/disks/0/image", "value": "fake-image"}]'
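      The patch in step 4 is a single RFC 6902 JSON Patch operation. Under RFC 6902, an "add" on an object member that already exists replaces that member's value, so the same payload works whether or not disks/0/image is already set. The Go sketch below builds that payload programmatically, for example for a test harness. The type and function names are hypothetical. The printed JSON is what would be passed to `oc patch ... --type json -p`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// jsonPatchOp is one RFC 6902 JSON Patch operation (hypothetical type).
type jsonPatchOp struct {
	Op    string `json:"op"`
	Path  string `json:"path"`
	Value string `json:"value"`
}

// bootImagePatch builds the patch used in step 4: it sets the image of the
// first disk in the providerSpec of a MachineSet's machine template.
func bootImagePatch(image string) ([]byte, error) {
	ops := []jsonPatchOp{{
		Op:    "add",
		Path:  "/spec/template/spec/providerSpec/value/disks/0/image",
		Value: image,
	}}
	return json.Marshal(ops)
}

func main() {
	patch, err := bootImagePatch("fake-image")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(patch))
}
```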
      
      
          

      Actual results:

      
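      The MCO controller panics with a nil pointer dereference in reconcileGCP while reconciling the MachineSet that has the invalid architecture (sergidor-1-v4ccj-worker-f):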
      
      I0222 09:05:50.862882       1 template_controller.go:132] Re-syncing ControllerConfig due to secret pull-secret change
      I0222 09:12:29.550488       1 machine_set_boot_image_controller.go:254] MachineSet sergidor-1-v4ccj-worker-a updated, reconciling all machinesets
      I0222 09:12:29.550919       1 machine_set_boot_image_controller.go:547] Reconciling machineset sergidor-1-v4ccj-worker-a on GCP, with arch x86_64
      I0222 09:12:29.552171       1 machine_set_boot_image_controller.go:572] New target boot image: projects/rhcos-cloud/global/images/rhcos-416-94-202402130130-0-gcp-x86-64
      I0222 09:12:29.552323       1 machine_set_boot_image_controller.go:547] Reconciling machineset sergidor-1-v4ccj-worker-b on GCP, with arch x86_64
      I0222 09:12:29.552341       1 machine_set_boot_image_controller.go:573] Current image: fake-image
      I0222 09:12:29.553694       1 machine_set_boot_image_controller.go:413] Patching machineset sergidor-1-v4ccj-worker-a
      I0222 09:12:29.553893       1 machine_set_boot_image_controller.go:416] No patching required for machineset sergidor-1-v4ccj-worker-b
      I0222 09:12:29.553920       1 machine_set_boot_image_controller.go:547] Reconciling machineset sergidor-1-v4ccj-worker-c on GCP, with arch x86_64
      I0222 09:12:29.555104       1 machine_set_boot_image_controller.go:416] No patching required for machineset sergidor-1-v4ccj-worker-c
      I0222 09:12:29.555164       1 machine_set_boot_image_controller.go:547] Reconciling machineset sergidor-1-v4ccj-worker-f on GCP, with arch amd64-FAKE-INVALID
      E0222 09:12:29.556282       1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
      goroutine 356 [running]:
      k8s.io/apimachinery/pkg/util/runtime.logPanic({0x34dadc0?, 0x5522aa0})
      	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:75 +0x85
      k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc001b4a640?})
      	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:49 +0x6b
      panic({0x34dadc0?, 0x5522aa0?})
      	/usr/lib/golang/src/runtime/panic.go:914 +0x21f
      github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image.reconcileGCP(0xc000f44a00, 0xc000b8c4c3?, {0xc000b8c4c3, 0x12})
      	/go/src/github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image/machine_set_boot_image_controller.go:564 +0x1cd
      github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image.checkMachineSet(0x0?, 0x38f77bf?, 0x7?, {0xc000b8c4c3?, 0x132a3f8?})
      	/go/src/github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image/machine_set_boot_image_controller.go:479 +0x85
      github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image.(*Controller).syncMachineSet(0xc000344000, {0xc001bc5800, 0x2f})
      	/go/src/github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image/machine_set_boot_image_controller.go:406 +0x60c
      github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image.(*Controller).processNextWorkItem(0xc000344000)
      	/go/src/github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image/machine_set_boot_image_controller.go:194 +0xcf
      github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image.(*Controller).worker(...)
      	/go/src/github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image/machine_set_boot_image_controller.go:183
      k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?)
      	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:226 +0x33
      k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x0?, {0x3d4d460, 0xc000b2b710}, 0x1, 0xc0006b21e0)
      	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:227 +0xaf
      k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0?, 0x3b9aca00, 0x0, 0x0?, 0x0?)
      	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:204 +0x7f
      k8s.io/apimachinery/pkg/util/wait.Until(0x0?, 0x0?, 0x0?)
      	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:161 +0x1e
      created by github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image.(*Controller).Run in goroutine 339
      	/go/src/github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image/machine_set_boot_image_controller.go:174 +0x205
      panic: runtime error: invalid memory address or nil pointer dereference [recovered]
      	panic: runtime error: invalid memory address or nil pointer dereference
      [signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x315836d]
      
      goroutine 356 [running]:
      k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc001b4a640?})
      	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:56 +0xcd
      panic({0x34dadc0?, 0x5522aa0?})
      	/usr/lib/golang/src/runtime/panic.go:914 +0x21f
      github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image.reconcileGCP(0xc000f44a00, 0xc000b8c4c3?, {0xc000b8c4c3, 0x12})
      	/go/src/github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image/machine_set_boot_image_controller.go:564 +0x1cd
      github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image.checkMachineSet(0x0?, 0x38f77bf?, 0x7?, {0xc000b8c4c3?, 0x132a3f8?})
      	/go/src/github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image/machine_set_boot_image_controller.go:479 +0x85
      github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image.(*Controller).syncMachineSet(0xc000344000, {0xc001bc5800, 0x2f})
      	/go/src/github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image/machine_set_boot_image_controller.go:406 +0x60c
      github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image.(*Controller).processNextWorkItem(0xc000344000)
      	/go/src/github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image/machine_set_boot_image_controller.go:194 +0xcf
      github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image.(*Controller).worker(...)
      	/go/src/github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image/machine_set_boot_image_controller.go:183
      k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?)
      	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:226 +0x33
      k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x0?, {0x3d4d460, 0xc000b2b710}, 0x1, 0xc0006b21e0)
      	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:227 +0xaf
      k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0?, 0x3b9aca00, 0x0, 0x0?, 0x0?)
      	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:204 +0x7f
      k8s.io/apimachinery/pkg/util/wait.Until(0x0?, 0x0?, 0x0?)
      	/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:161 +0x1e
      created by github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image.(*Controller).Run in goroutine 339
      	/go/src/github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image/machine_set_boot_image_controller.go:174 +0x205
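      The trace points at reconcileGCP (machine_set_boot_image_controller.go:564), and the SIGSEGV reports addr=0x10. This report does not confirm the cause, but here is one plausible mechanism. Suppose per-architecture stream metadata is stored as a map of value structs with a nullable GCP image pointer. Looking up an unknown architecture then returns the zero value, and reading a field through its nil pointer panics. On a 64-bit platform, a read at offset 0x10 of a nil struct pointer matches a field that follows one string field. That fits this shape but does not prove it. The Go sketch below uses hypothetical, simplified local types (not the real stream metadata library) to show that failure mode next to a checked lookup. The target format projects/<project>/global/images/<name> is taken from the "New target boot image" log line.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for per-architecture stream metadata:
// a map of value structs whose GCP image is a nullable pointer.
type gcpImage struct {
	Release string // 16 bytes on 64-bit, so Project sits at offset 0x10
	Project string
	Name    string
}

type archEntry struct {
	Gcp *gcpImage
}

type streamMeta struct {
	Architectures map[string]archEntry
}

// unsafeTarget shows the suspected failure mode: an unknown key yields the
// zero archEntry, whose Gcp pointer is nil, so reading img.Project panics.
func unsafeTarget(s streamMeta, arch string) string {
	img := s.Architectures[arch].Gcp
	return fmt.Sprintf("projects/%s/global/images/%s", img.Project, img.Name)
}

// safeTarget checks both the map lookup and the pointer before using them.
func safeTarget(s streamMeta, arch string) (string, error) {
	entry, ok := s.Architectures[arch]
	if !ok {
		return "", fmt.Errorf("architecture %q not found in stream metadata", arch)
	}
	if entry.Gcp == nil {
		return "", fmt.Errorf("no GCP image for architecture %q", arch)
	}
	return fmt.Sprintf("projects/%s/global/images/%s", entry.Gcp.Project, entry.Gcp.Name), nil
}

// panics reports whether f panicked, recovering so the caller keeps running.
func panics(f func()) (didPanic bool) {
	defer func() {
		if recover() != nil {
			didPanic = true
		}
	}()
	f()
	return false
}

func main() {
	s := streamMeta{Architectures: map[string]archEntry{
		"x86_64": {Gcp: &gcpImage{Project: "rhcos-cloud", Name: "rhcos-416-94-202402130130-0-gcp-x86-64"}},
	}}
	fmt.Println(safeTarget(s, "x86_64"))
	fmt.Println(safeTarget(s, "amd64-FAKE-INVALID"))
	fmt.Println("unsafe lookup panics:", panics(func() { unsafeTarget(s, "amd64-FAKE-INVALID") }))
}
```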
      
      
      
      
      A new controller pod is then created. It waits 5 minutes to become the leader and then panics again.
      
      
      
          

      Expected results:

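      The MCO controller should fail in a controlled way instead of panicking, for example by reporting an error for the MachineSet with the invalid architecture.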
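      One way to make the failure controlled is to treat a bad MachineSet as a per-object error rather than a crash. The Go sketch below assumes that approach and uses hypothetical types and names, not the MCO's actual reconcile loop. Every MachineSet is reconciled, failures are collected with errors.Join (Go 1.20+), and the joined error is returned to the caller. In a workqueue-based controller, the caller would typically log that error and requeue rather than panic.

```go
package main

import (
	"errors"
	"fmt"
)

// machineSet is a hypothetical, minimal view of a MachineSet.
type machineSet struct {
	Name string
	Arch string
}

// reconcileOne stands in for per-platform reconciliation; here it only
// rejects architectures that have no known boot image target.
func reconcileOne(ms machineSet, targets map[string]string) error {
	if _, ok := targets[ms.Arch]; !ok {
		return fmt.Errorf("machineset %s: unsupported architecture %q", ms.Name, ms.Arch)
	}
	return nil
}

// reconcileAll keeps going after a failure, so one bad MachineSet does not
// block the others, and returns every failure joined into a single error
// (nil if there were none) for the caller to log and requeue.
func reconcileAll(sets []machineSet, targets map[string]string) ([]string, error) {
	var reconciled []string
	var errs []error
	for _, ms := range sets {
		if err := reconcileOne(ms, targets); err != nil {
			errs = append(errs, err)
			continue
		}
		reconciled = append(reconciled, ms.Name)
	}
	return reconciled, errors.Join(errs...)
}

func main() {
	targets := map[string]string{"x86_64": "projects/rhcos-cloud/global/images/example"}
	sets := []machineSet{
		{Name: "worker-a", Arch: "x86_64"},
		{Name: "worker-f", Arch: "amd64-FAKE-INVALID"},
		{Name: "worker-b", Arch: "x86_64"},
	}
	done, err := reconcileAll(sets, targets)
	fmt.Println("reconciled:", done)
	fmt.Println("error:", err)
}
```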
          

      Additional info:

      
          

              djoshy David Joshy
              sregidor@redhat.com Sergio Regidor de la Rosa