- Bug
- Resolution: Done-Errata
- Major
- 4.15.z
- None
Description of problem:
The control-plane-machine-set operator pod is stuck in the CrashLoopBackOff state with "panic: runtime error: invalid memory address or nil pointer dereference" while extracting the failureDomain from the ControlPlaneMachineSet. The error trace follows for reference.
~~~
2024-04-04T09:32:23.594257072Z I0404 09:32:23.594176 1 controller.go:146] "msg"="Finished reconciling control plane machine set" "controller"="controlplanemachinesetgenerator" "name"="cluster" "namespace"="openshift-machine-api" "reconcileID"="c282f3e3-9f9d-40df-a24e-417ba2ea4106"
2024-04-04T09:32:23.594257072Z I0404 09:32:23.594221 1 controller.go:125] "msg"="Reconciling control plane machine set" "controller"="controlplanemachinesetgenerator" "name"="cluster" "namespace"="openshift-machine-api" "reconcileID"="7f03c05f-2717-49e0-95f8-3e8b2ce2fc55"
2024-04-04T09:32:23.594274974Z I0404 09:32:23.594257 1 controller.go:146] "msg"="Finished reconciling control plane machine set" "controller"="controlplanemachinesetgenerator" "name"="cluster" "namespace"="openshift-machine-api" "reconcileID"="7f03c05f-2717-49e0-95f8-3e8b2ce2fc55"
2024-04-04T09:32:23.597509741Z I0404 09:32:23.597426 1 watch_filters.go:179] reconcile triggered by infrastructure change
2024-04-04T09:32:23.606311553Z I0404 09:32:23.606243 1 controller.go:220] "msg"="Starting workers" "controller"="controlplanemachineset" "worker count"=1
2024-04-04T09:32:23.606360950Z I0404 09:32:23.606340 1 controller.go:169] "msg"="Reconciling control plane machine set" "controller"="controlplanemachineset" "name"="cluster" "namespace"="openshift-machine-api" "reconcileID"="5dac54f4-57ab-419b-b258-79136ca8b400"
2024-04-04T09:32:23.609322467Z I0404 09:32:23.609217 1 panic.go:884] "msg"="Finished reconciling control plane machine set" "controller"="controlplanemachineset" "name"="cluster" "namespace"="openshift-machine-api" "reconcileID"="5dac54f4-57ab-419b-b258-79136ca8b400"
2024-04-04T09:32:23.609322467Z I0404 09:32:23.609271 1 controller.go:115] "msg"="Observed a panic in reconciler: runtime error: invalid memory address or nil pointer dereference" "controller"="controlplanemachineset" "reconcileID"="5dac54f4-57ab-419b-b258-79136ca8b400"
2024-04-04T09:32:23.612540681Z panic: runtime error: invalid memory address or nil pointer dereference [recovered]
2024-04-04T09:32:23.612540681Z panic: runtime error: invalid memory address or nil pointer dereference
2024-04-04T09:32:23.612540681Z [signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x1a5911c]
2024-04-04T09:32:23.612540681Z
2024-04-04T09:32:23.612540681Z goroutine 255 [running]:
2024-04-04T09:32:23.612540681Z sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
2024-04-04T09:32:23.612571624Z /go/src/github.com/openshift/cluster-control-plane-machine-set-operator/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:116 +0x1fa
2024-04-04T09:32:23.612571624Z panic({0x1c8ac60, 0x31c6ea0})
2024-04-04T09:32:23.612571624Z /usr/lib/golang/src/runtime/panic.go:884 +0x213
2024-04-04T09:32:23.612571624Z github.com/openshift/cluster-control-plane-machine-set-operator/pkg/machineproviders/providers/openshift/machine/v1beta1/providerconfig.VSphereProviderConfig.ExtractFailureDomain(...)
2024-04-04T09:32:23.612571624Z /go/src/github.com/openshift/cluster-control-plane-machine-set-operator/pkg/machineproviders/providers/openshift/machine/v1beta1/providerconfig/vsphere.go:120
2024-04-04T09:32:23.612571624Z github.com/openshift/cluster-control-plane-machine-set-operator/pkg/machineproviders/providers/openshift/machine/v1beta1/providerconfig.providerConfig.ExtractFailureDomain({{0x1f2a71a, 0x7}, {{{{...}, {...}}, {{...}, {...}, {...}, {...}, {...}, {...}, ...}, ...}}, ...})
2024-04-04T09:32:23.612588145Z /go/src/github.com/openshift/cluster-control-plane-machine-set-operator/pkg/machineproviders/providers/openshift/machine/v1beta1/providerconfig/providerconfig.go:212 +0x23c
~~~
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
The control-plane-machine-set operator is stuck in the CrashLoopBackOff state during the cluster upgrade.
Expected results:
The control-plane-machine-set operator should upgrade without any errors.
Additional info:
This happens during the upgrade of a vSphere IPI cluster from OCP version 4.14.z to 4.15.6 and may affect other z-stream releases. According to the official docs [1], providing the failure domain for the vSphere platform is a Technology Preview feature.
[1] https://docs.openshift.com/container-platform/4.15/machine_management/control_plane_machine_management/cpmso-configuration.html#cpmso-yaml-failure-domain-vsphere_cpmso-configuration
- is cloned by
- OCPBUGS-32414 control-plane-machine-set operator pod stuck into crashloopbackoff state with the nil pointer dereference runtime error
- Closed
- is depended on by
- OCPBUGS-32414 control-plane-machine-set operator pod stuck into crashloopbackoff state with the nil pointer dereference runtime error
- Closed
- relates to
- HIVE-2627 The hive-controllers crashing with invalid memory address
- Review
- links to
- RHEA-2024:0041 OpenShift Container Platform 4.16.z bug fix update