Bug
Resolution: Done-Errata
4.16
Moderate
MCO Sprint 250
No Doc Update
Release Note Not Required
In Progress
Description of problem:
In an IPI cluster on GCP, when a machineset is labeled with an invalid architecture and the coreos-bootimage is updated in any machineset, the MCO controller pod panics instead of failing in a controlled way.
Version-Release number of selected component (if applicable):
$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.16.0-0.nightly-2024-02-17-094036   True        False         71m     Cluster version is 4.16.0-0.nightly-2024-02-17-094036
How reproducible:
Always
Steps to Reproduce:
1. Enable the TechPreview

   oc patch featuregate cluster --type=merge -p '{"spec":{"featureSet": "TechPreviewNoUpgrade"}}'

2. Wait for all MCP to be updated

3. Edit a machineset and use an invalid architecture in its labels

   apiVersion: machine.openshift.io/v1beta1
   kind: MachineSet
   metadata:
     annotations:
       capacity.cluster-autoscaler.kubernetes.io/labels: kubernetes.io/arch=amd64-FAKE-INVALID   <--- EDIT THIS
       machine.openshift.io/GPU: "0"
       machine.openshift.io/memoryMb: "16384"

4. Patch any machineset with a new boot image

   $ oc -n openshift-machine-api patch machineset.machine $(oc -n openshift-machine-api get machineset.machine -ojsonpath='{.items[0].metadata.name}') --type json -p '[{"op": "add", "path": "/spec/template/spec/providerSpec/value/disks/0/image", "value": "fake-image"}]'
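Step 3 is possible because the architecture is read from a free-form annotation value. The following is a minimal sketch of how such a value could be parsed and validated before it is used. The helper name archFromLabels and the set of accepted architectures are assumptions made for illustration; this is not the MCO's actual code.

```go
package main

import (
	"fmt"
	"strings"
)

// knownArchs maps kubernetes.io/arch values to RHCOS-style architecture
// names (the log in this report shows amd64 reported as x86_64).
// The exact set accepted by the MCO is an assumption here.
var knownArchs = map[string]string{
	"amd64":   "x86_64",
	"arm64":   "aarch64",
	"ppc64le": "ppc64le",
	"s390x":   "s390x",
}

// archFromLabels is a hypothetical helper. It extracts kubernetes.io/arch
// from a comma-separated "key=value" annotation and rejects unknown values
// instead of passing them on to later lookups.
func archFromLabels(labels string) (string, error) {
	for _, kv := range strings.Split(labels, ",") {
		key, value, found := strings.Cut(strings.TrimSpace(kv), "=")
		if !found || key != "kubernetes.io/arch" {
			continue
		}
		arch, ok := knownArchs[value]
		if !ok {
			return "", fmt.Errorf("unsupported architecture %q", value)
		}
		return arch, nil
	}
	return "", fmt.Errorf("no kubernetes.io/arch label found")
}

func main() {
	for _, labels := range []string{
		"kubernetes.io/arch=amd64",
		"kubernetes.io/arch=amd64-FAKE-INVALID",
	} {
		arch, err := archFromLabels(labels)
		fmt.Printf("%q -> arch=%q err=%v\n", labels, arch, err)
	}
}
```

With a check of this kind, the invalid label from step 3 would surface as an ordinary error for that one machineset rather than reaching the boot image lookup.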
Actual results:
The MCO controller panics

I0222 09:05:50.862882 1 template_controller.go:132] Re-syncing ControllerConfig due to secret pull-secret change
I0222 09:12:29.550488 1 machine_set_boot_image_controller.go:254] MachineSet sergidor-1-v4ccj-worker-a updated, reconciling all machinesets
I0222 09:12:29.550919 1 machine_set_boot_image_controller.go:547] Reconciling machineset sergidor-1-v4ccj-worker-a on GCP, with arch x86_64
I0222 09:12:29.552171 1 machine_set_boot_image_controller.go:572] New target boot image: projects/rhcos-cloud/global/images/rhcos-416-94-202402130130-0-gcp-x86-64
I0222 09:12:29.552323 1 machine_set_boot_image_controller.go:547] Reconciling machineset sergidor-1-v4ccj-worker-b on GCP, with arch x86_64
I0222 09:12:29.552341 1 machine_set_boot_image_controller.go:573] Current image: fake-image
I0222 09:12:29.553694 1 machine_set_boot_image_controller.go:413] Patching machineset sergidor-1-v4ccj-worker-a
I0222 09:12:29.553893 1 machine_set_boot_image_controller.go:416] No patching required for machineset sergidor-1-v4ccj-worker-b
I0222 09:12:29.553920 1 machine_set_boot_image_controller.go:547] Reconciling machineset sergidor-1-v4ccj-worker-c on GCP, with arch x86_64
I0222 09:12:29.555104 1 machine_set_boot_image_controller.go:416] No patching required for machineset sergidor-1-v4ccj-worker-c
I0222 09:12:29.555164 1 machine_set_boot_image_controller.go:547] Reconciling machineset sergidor-1-v4ccj-worker-f on GCP, with arch amd64-FAKE-INVALID
E0222 09:12:29.556282 1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 356 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x34dadc0?, 0x5522aa0})
    /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:75 +0x85
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc001b4a640?})
    /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:49 +0x6b
panic({0x34dadc0?, 0x5522aa0?})
    /usr/lib/golang/src/runtime/panic.go:914 +0x21f
github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image.reconcileGCP(0xc000f44a00, 0xc000b8c4c3?, {0xc000b8c4c3, 0x12})
    /go/src/github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image/machine_set_boot_image_controller.go:564 +0x1cd
github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image.checkMachineSet(0x0?, 0x38f77bf?, 0x7?, {0xc000b8c4c3?, 0x132a3f8?})
    /go/src/github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image/machine_set_boot_image_controller.go:479 +0x85
github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image.(*Controller).syncMachineSet(0xc000344000, {0xc001bc5800, 0x2f})
    /go/src/github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image/machine_set_boot_image_controller.go:406 +0x60c
github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image.(*Controller).processNextWorkItem(0xc000344000)
    /go/src/github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image/machine_set_boot_image_controller.go:194 +0xcf
github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image.(*Controller).worker(...)
    /go/src/github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image/machine_set_boot_image_controller.go:183
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?)
    /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:226 +0x33
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x0?, {0x3d4d460, 0xc000b2b710}, 0x1, 0xc0006b21e0)
    /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:227 +0xaf
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0?, 0x3b9aca00, 0x0, 0x0?, 0x0?)
    /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:204 +0x7f
k8s.io/apimachinery/pkg/util/wait.Until(0x0?, 0x0?, 0x0?)
    /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:161 +0x1e
created by github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image.(*Controller).Run in goroutine 339
    /go/src/github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image/machine_set_boot_image_controller.go:174 +0x205
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
    panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x315836d]

goroutine 356 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc001b4a640?})
    /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:56 +0xcd
panic({0x34dadc0?, 0x5522aa0?})
    /usr/lib/golang/src/runtime/panic.go:914 +0x21f
github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image.reconcileGCP(0xc000f44a00, 0xc000b8c4c3?, {0xc000b8c4c3, 0x12})
    /go/src/github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image/machine_set_boot_image_controller.go:564 +0x1cd
github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image.checkMachineSet(0x0?, 0x38f77bf?, 0x7?, {0xc000b8c4c3?, 0x132a3f8?})
    /go/src/github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image/machine_set_boot_image_controller.go:479 +0x85
github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image.(*Controller).syncMachineSet(0xc000344000, {0xc001bc5800, 0x2f})
    /go/src/github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image/machine_set_boot_image_controller.go:406 +0x60c
github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image.(*Controller).processNextWorkItem(0xc000344000)
    /go/src/github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image/machine_set_boot_image_controller.go:194 +0xcf
github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image.(*Controller).worker(...)
    /go/src/github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image/machine_set_boot_image_controller.go:183
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?)
    /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:226 +0x33
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x0?, {0x3d4d460, 0xc000b2b710}, 0x1, 0xc0006b21e0)
    /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:227 +0xaf
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0?, 0x3b9aca00, 0x0, 0x0?, 0x0?)
    /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:204 +0x7f
k8s.io/apimachinery/pkg/util/wait.Until(0x0?, 0x0?, 0x0?)
    /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:161 +0x1e
created by github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image.(*Controller).Run in goroutine 339
    /go/src/github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image/machine_set_boot_image_controller.go:174 +0x205

Then a new controller pod is created. This controller waits 5 minutes to become the leader and panics again.
Expected results:
The MCO controller should fail in a controlled way.
Additional info:
- is related to: MCO-589 Update boot images for GCP (tech preview) (Closed)
- links to: RHEA-2024:0041 OpenShift Container Platform 4.16.z bug fix update