Bug
Resolution: Done-Errata
4.16
Quality / Stability / Reliability
Moderate
MCO Sprint 250
In Progress
Release Note Not Required
No Doc Update
Description of problem:
In an IPI cluster on GCP, when a MachineSet is labeled with an invalid architecture and the boot image of any MachineSet is then updated, the MCO controller pod panics with a nil pointer dereference instead of failing in a controlled way.
Version-Release number of selected component (if applicable):
$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.16.0-0.nightly-2024-02-17-094036   True        False         71m     Cluster version is 4.16.0-0.nightly-2024-02-17-094036
How reproducible:
Always
Steps to Reproduce:
1. Enable the TechPreviewNoUpgrade feature set
oc patch featuregate cluster --type=merge -p '{"spec":{"featureSet": "TechPreviewNoUpgrade"}}'
2. Wait for all MachineConfigPools (MCPs) to finish updating
3. Edit a MachineSet and set an invalid architecture in its capacity.cluster-autoscaler.kubernetes.io/labels annotation
apiVersion: machine.openshift.io/v1beta1
kind: MachineSet
metadata:
  annotations:
    capacity.cluster-autoscaler.kubernetes.io/labels: kubernetes.io/arch=amd64-FAKE-INVALID  # <--- EDIT THIS
    machine.openshift.io/GPU: "0"
    machine.openshift.io/memoryMb: "16384"
4. Patch any machineset with a new boot image
$ oc -n openshift-machine-api patch machineset.machine $(oc -n openshift-machine-api get machineset.machine -ojsonpath='{.items[0].metadata.name}') --type json -p '[{"op": "add", "path": "/spec/template/spec/providerSpec/value/disks/0/image", "value": "fake-image"}]'
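The annotation edited in step 3 carries the architecture as a kubernetes.io/arch=<value> entry. A minimal Go sketch of how such a value could be parsed and validated before use is shown here. This is not the MCO's actual code: the function name archFromAnnotations and the knownArchs table are hypothetical, and only the amd64 to x86_64 mapping is taken from the log output in this report.

```go
package main

import (
	"fmt"
	"strings"
)

// archAnnotationKey is the MachineSet annotation edited in step 3.
const archAnnotationKey = "capacity.cluster-autoscaler.kubernetes.io/labels"

// knownArchs maps Kubernetes arch label values to RHCOS arch names.
// Assumption: amd64 -> x86_64 matches the controller log in this report;
// the arm64 entry is illustrative only.
var knownArchs = map[string]string{
	"amd64": "x86_64",
	"arm64": "aarch64",
}

// archFromAnnotations (hypothetical helper) extracts the architecture from the
// comma-separated label list and returns an error for unknown values instead
// of passing them on to later lookups.
func archFromAnnotations(annotations map[string]string) (string, error) {
	labels, ok := annotations[archAnnotationKey]
	if !ok {
		return "", fmt.Errorf("annotation %q not found", archAnnotationKey)
	}
	for _, label := range strings.Split(labels, ",") {
		key, value, found := strings.Cut(strings.TrimSpace(label), "=")
		if !found || key != "kubernetes.io/arch" {
			continue
		}
		arch, known := knownArchs[value]
		if !known {
			return "", fmt.Errorf("unsupported architecture %q", value)
		}
		return arch, nil
	}
	return "", fmt.Errorf("no kubernetes.io/arch entry in annotation %q", archAnnotationKey)
}

func main() {
	for _, value := range []string{
		"kubernetes.io/arch=amd64",
		"kubernetes.io/arch=amd64-FAKE-INVALID",
	} {
		arch, err := archFromAnnotations(map[string]string{archAnnotationKey: value})
		if err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("arch:", arch)
	}
}
```

With a check like this, the invalid value from step 3 would surface as an ordinary error for that one MachineSet rather than reaching the boot image lookup.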
Actual results:
The MCO controller panics
I0222 09:05:50.862882 1 template_controller.go:132] Re-syncing ControllerConfig due to secret pull-secret change
I0222 09:12:29.550488 1 machine_set_boot_image_controller.go:254] MachineSet sergidor-1-v4ccj-worker-a updated, reconciling all machinesets
I0222 09:12:29.550919 1 machine_set_boot_image_controller.go:547] Reconciling machineset sergidor-1-v4ccj-worker-a on GCP, with arch x86_64
I0222 09:12:29.552171 1 machine_set_boot_image_controller.go:572] New target boot image: projects/rhcos-cloud/global/images/rhcos-416-94-202402130130-0-gcp-x86-64
I0222 09:12:29.552323 1 machine_set_boot_image_controller.go:547] Reconciling machineset sergidor-1-v4ccj-worker-b on GCP, with arch x86_64
I0222 09:12:29.552341 1 machine_set_boot_image_controller.go:573] Current image: fake-image
I0222 09:12:29.553694 1 machine_set_boot_image_controller.go:413] Patching machineset sergidor-1-v4ccj-worker-a
I0222 09:12:29.553893 1 machine_set_boot_image_controller.go:416] No patching required for machineset sergidor-1-v4ccj-worker-b
I0222 09:12:29.553920 1 machine_set_boot_image_controller.go:547] Reconciling machineset sergidor-1-v4ccj-worker-c on GCP, with arch x86_64
I0222 09:12:29.555104 1 machine_set_boot_image_controller.go:416] No patching required for machineset sergidor-1-v4ccj-worker-c
I0222 09:12:29.555164 1 machine_set_boot_image_controller.go:547] Reconciling machineset sergidor-1-v4ccj-worker-f on GCP, with arch amd64-FAKE-INVALID
E0222 09:12:29.556282 1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 356 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x34dadc0?, 0x5522aa0})
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:75 +0x85
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc001b4a640?})
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:49 +0x6b
panic({0x34dadc0?, 0x5522aa0?})
/usr/lib/golang/src/runtime/panic.go:914 +0x21f
github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image.reconcileGCP(0xc000f44a00, 0xc000b8c4c3?, {0xc000b8c4c3, 0x12})
/go/src/github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image/machine_set_boot_image_controller.go:564 +0x1cd
github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image.checkMachineSet(0x0?, 0x38f77bf?, 0x7?, {0xc000b8c4c3?, 0x132a3f8?})
/go/src/github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image/machine_set_boot_image_controller.go:479 +0x85
github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image.(*Controller).syncMachineSet(0xc000344000, {0xc001bc5800, 0x2f})
/go/src/github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image/machine_set_boot_image_controller.go:406 +0x60c
github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image.(*Controller).processNextWorkItem(0xc000344000)
/go/src/github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image/machine_set_boot_image_controller.go:194 +0xcf
github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image.(*Controller).worker(...)
/go/src/github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image/machine_set_boot_image_controller.go:183
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?)
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:226 +0x33
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x0?, {0x3d4d460, 0xc000b2b710}, 0x1, 0xc0006b21e0)
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:227 +0xaf
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0?, 0x3b9aca00, 0x0, 0x0?, 0x0?)
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:204 +0x7f
k8s.io/apimachinery/pkg/util/wait.Until(0x0?, 0x0?, 0x0?)
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:161 +0x1e
created by github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image.(*Controller).Run in goroutine 339
/go/src/github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image/machine_set_boot_image_controller.go:174 +0x205
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x315836d]
goroutine 356 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc001b4a640?})
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:56 +0xcd
panic({0x34dadc0?, 0x5522aa0?})
/usr/lib/golang/src/runtime/panic.go:914 +0x21f
github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image.reconcileGCP(0xc000f44a00, 0xc000b8c4c3?, {0xc000b8c4c3, 0x12})
/go/src/github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image/machine_set_boot_image_controller.go:564 +0x1cd
github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image.checkMachineSet(0x0?, 0x38f77bf?, 0x7?, {0xc000b8c4c3?, 0x132a3f8?})
/go/src/github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image/machine_set_boot_image_controller.go:479 +0x85
github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image.(*Controller).syncMachineSet(0xc000344000, {0xc001bc5800, 0x2f})
/go/src/github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image/machine_set_boot_image_controller.go:406 +0x60c
github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image.(*Controller).processNextWorkItem(0xc000344000)
/go/src/github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image/machine_set_boot_image_controller.go:194 +0xcf
github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image.(*Controller).worker(...)
/go/src/github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image/machine_set_boot_image_controller.go:183
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?)
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:226 +0x33
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0x0?, {0x3d4d460, 0xc000b2b710}, 0x1, 0xc0006b21e0)
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:227 +0xaf
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0?, 0x3b9aca00, 0x0, 0x0?, 0x0?)
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:204 +0x7f
k8s.io/apimachinery/pkg/util/wait.Until(0x0?, 0x0?, 0x0?)
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:161 +0x1e
created by github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image.(*Controller).Run in goroutine 339
/go/src/github.com/openshift/machine-config-operator/pkg/controller/machine-set-boot-image/machine_set_boot_image_controller.go:174 +0x205
A new controller pod is then created. It waits 5 minutes to acquire leadership and then panics again.
Expected results:
The MCO controller should fail in a controlled way.
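As an illustration of what "fail in a controlled way" could look like, the Go sketch below guards every step of a boot image lookup and returns an error instead of dereferencing a nil pointer. The source does not show the code at machine_set_boot_image_controller.go:564, so the cause assumed here (a per-architecture lookup that yields nil for an unknown architecture) is a guess, and the types stream, archImages, gcpImage and the function targetGCPImage are hypothetical stand-ins, not the MCO's real types.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical, simplified stand-ins for boot image stream metadata.
type gcpImage struct {
	Project string
	Name    string
}

type archImages struct {
	GCP *gcpImage
}

type stream struct {
	Architectures map[string]*archImages
}

// errNoBootImage lets callers detect this failure with errors.Is.
var errNoBootImage = errors.New("no boot image available")

// targetGCPImage (hypothetical) checks each pointer before using it, so an
// unknown architecture produces an error rather than a panic.
func targetGCPImage(s *stream, arch string) (string, error) {
	if s == nil {
		return "", fmt.Errorf("%w: stream metadata is nil", errNoBootImage)
	}
	images, ok := s.Architectures[arch]
	if !ok || images == nil {
		return "", fmt.Errorf("%w: unknown architecture %q", errNoBootImage, arch)
	}
	if images.GCP == nil {
		return "", fmt.Errorf("%w: architecture %q has no GCP image", errNoBootImage, arch)
	}
	return fmt.Sprintf("projects/%s/global/images/%s", images.GCP.Project, images.GCP.Name), nil
}

func main() {
	// Sample data uses the image name that appears in the controller log.
	s := &stream{Architectures: map[string]*archImages{
		"x86_64": {GCP: &gcpImage{
			Project: "rhcos-cloud",
			Name:    "rhcos-416-94-202402130130-0-gcp-x86-64",
		}},
	}}
	for _, arch := range []string{"x86_64", "amd64-FAKE-INVALID"} {
		image, err := targetGCPImage(s, arch)
		if err != nil {
			// A controller could log this and skip the MachineSet.
			fmt.Println("skipping machineset:", err)
			continue
		}
		fmt.Println("New target boot image:", image)
	}
}
```

Returning a wrapped sentinel error keeps the decision with the caller, which can log the problem, skip the affected MachineSet, and continue reconciling the others.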
Additional info:
- is related to: MCO-589 Update boot images for GCP (tech preview) (Closed)
- links to: RHEA-2024:0041 OpenShift Container Platform 4.16.z bug fix update