Bug
Resolution: Done-Errata
Critical
None
premerge, 4.16
Moderate
None
MCO Sprint 254, MCO Sprint 255
2
Approved
False
Bug Fix
Done
Description of problem:
When we create a new pool and enable OCB in it, then remove the pool and its MachineOSConfig resource, and finally create another new pool and enable OCB again, the controller pod panics.
Version-Release number of selected component (if applicable):
pre-merge https://github.com/openshift/machine-config-operator/pull/4327
How reproducible:
Always
Steps to Reproduce:
1. Create a new infra MCP

    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfigPool
    metadata:
      name: infra
    spec:
      machineConfigSelector:
        matchExpressions:
          - {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,infra]}
      nodeSelector:
        matchLabels:
          node-role.kubernetes.io/infra: ""

2. Create a MachineOSConfig for infra pool

    oc create -f - << EOF
    apiVersion: machineconfiguration.openshift.io/v1alpha1
    kind: MachineOSConfig
    metadata:
      name: infra
    spec:
      machineConfigPool:
        name: infra
      buildInputs:
        imageBuilder:
          imageBuilderType: PodImageBuilder
        baseImagePullSecret:
          name: $(oc get secret -n openshift-config pull-secret -o json | jq "del(.metadata.namespace, .metadata.creationTimestamp, .metadata.resourceVersion, .metadata.uid, .metadata.name)" | jq '.metadata.name="pull-copy"' | oc -n openshift-machine-config-operator create -f - &> /dev/null; echo -n "pull-copy")
        renderedImagePushSecret:
          name: $(oc get -n openshift-machine-config-operator sa builder -ojsonpath='{.secrets[0].name}')
        renderedImagePushspec: "image-registry.openshift-image-registry.svc:5000/openshift-machine-config-operator/ocb-image:latest"
    EOF

3. When the build is finished, remove the MachineOSConfig and the pool

    oc delete machineosconfig infra
    oc delete mcp infra

4. Create a new infra1 pool

    apiVersion: machineconfiguration.openshift.io/v1
    kind: MachineConfigPool
    metadata:
      name: infra1
    spec:
      machineConfigSelector:
        matchExpressions:
          - {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,infra1]}
      nodeSelector:
        matchLabels:
          node-role.kubernetes.io/infra1: ""

5. Create a new machineosconfig for infra1 pool

    oc create -f - << EOF
    apiVersion: machineconfiguration.openshift.io/v1alpha1
    kind: MachineOSConfig
    metadata:
      name: infra1
    spec:
      machineConfigPool:
        name: infra1
      buildInputs:
        imageBuilder:
          imageBuilderType: PodImageBuilder
        baseImagePullSecret:
          name: $(oc get secret -n openshift-config pull-secret -o json | jq "del(.metadata.namespace, .metadata.creationTimestamp, .metadata.resourceVersion, .metadata.uid, .metadata.name)" | jq '.metadata.name="pull-copy"' | oc -n openshift-machine-config-operator create -f - &> /dev/null; echo -n "pull-copy")
        renderedImagePushSecret:
          name: $(oc get -n openshift-machine-config-operator sa builder -ojsonpath='{.secrets[0].name}')
        renderedImagePushspec: "image-registry.openshift-image-registry.svc:5000/openshift-machine-config-operator/ocb-image:latest"
        containerFile:
        - containerfileArch: noarch
          content: |-
            RUN echo 'test image' > /etc/test-image.file
    EOF
Actual results:
The MCO controller pod panics (in updateMachineOSBuild):

E0430 11:21:03.779078 1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 265 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x3547bc0?, 0x53ebb20})
    /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:75 +0x85
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc00035e000?})
    /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:49 +0x6b
panic({0x3547bc0?, 0x53ebb20?})
    /usr/lib/golang/src/runtime/panic.go:914 +0x21f
github.com/openshift/api/machineconfiguration/v1.(*MachineConfigPool).GetNamespace(0x53f6200?)
    <autogenerated>:1 +0x9
k8s.io/client-go/tools/cache.MetaObjectToName({0x3e2a8f8, 0x0})
    /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/store.go:131 +0x25
k8s.io/client-go/tools/cache.ObjectToName({0x3902740?, 0x0?})
    /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/store.go:126 +0x74
k8s.io/client-go/tools/cache.MetaNamespaceKeyFunc({0x3902740?, 0x0?})
    /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/store.go:112 +0x3e
k8s.io/client-go/tools/cache.DeletionHandlingMetaNamespaceKeyFunc({0x3902740?, 0x0?})
    /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/controller.go:336 +0x3b
github.com/openshift/machine-config-operator/pkg/controller/node.(*Controller).enqueueAfter(0xc0007097a0, 0x0, 0x0?)
    /go/src/github.com/openshift/machine-config-operator/pkg/controller/node/node_controller.go:761 +0x33
github.com/openshift/machine-config-operator/pkg/controller/node.(*Controller).enqueueDefault(...)
    /go/src/github.com/openshift/machine-config-operator/pkg/controller/node/node_controller.go:772
github.com/openshift/machine-config-operator/pkg/controller/node.(*Controller).updateMachineOSBuild(0xc0007097a0, {0xc001c37800?, 0xc000029678?}, {0x3904000?, 0xc0028361a0})
    /go/src/github.com/openshift/machine-config-operator/pkg/controller/node/node_controller.go:395 +0xd1
k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnUpdate(...)
    /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/controller.go:246
k8s.io/client-go/tools/cache.(*processorListener).run.func1()
    /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/shared_informer.go:970 +0xea
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?)
    /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:226 +0x33
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc0005e5738?, {0x3de6020, 0xc0008fe780}, 0x1, 0xc0000ac720)
    /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:227 +0xaf
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x6974616761706f72?, 0x3b9aca00, 0x0, 0x69?, 0xc0005e5788?)
    /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:204 +0x7f
k8s.io/apimachinery/pkg/util/wait.Until(...)
    /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:161
k8s.io/client-go/tools/cache.(*processorListener).run(0xc000b97c20)
    /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/shared_informer.go:966 +0x69
k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1()
    /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:72 +0x4f
created by k8s.io/apimachinery/pkg/util/wait.(*Group).Start in goroutine 248
    /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:70 +0x73
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
    panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x40 pc=0x210a6e9]

When the controller pod is restarted, it panics again, but in a different function (addMachineOSBuild):

E0430 11:26:54.753689 1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 97 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x3547bc0?, 0x53ebb20})
    /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:75 +0x85
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0x15555555aa?})
    /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:49 +0x6b
panic({0x3547bc0?, 0x53ebb20?})
    /usr/lib/golang/src/runtime/panic.go:914 +0x21f
github.com/openshift/api/machineconfiguration/v1.(*MachineConfigPool).GetNamespace(0x53f6200?)
    <autogenerated>:1 +0x9
k8s.io/client-go/tools/cache.MetaObjectToName({0x3e2a8f8, 0x0})
    /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/store.go:131 +0x25
k8s.io/client-go/tools/cache.ObjectToName({0x3902740?, 0x0?})
    /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/store.go:126 +0x74
k8s.io/client-go/tools/cache.MetaNamespaceKeyFunc({0x3902740?, 0x0?})
    /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/store.go:112 +0x3e
k8s.io/client-go/tools/cache.DeletionHandlingMetaNamespaceKeyFunc({0x3902740?, 0x0?})
    /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/controller.go:336 +0x3b
github.com/openshift/machine-config-operator/pkg/controller/node.(*Controller).enqueueAfter(0xc000899560, 0x0, 0x0?)
    /go/src/github.com/openshift/machine-config-operator/pkg/controller/node/node_controller.go:761 +0x33
github.com/openshift/machine-config-operator/pkg/controller/node.(*Controller).enqueueDefault(...)
    /go/src/github.com/openshift/machine-config-operator/pkg/controller/node/node_controller.go:772
github.com/openshift/machine-config-operator/pkg/controller/node.(*Controller).addMachineOSBuild(0xc000899560, {0x3904000?, 0xc0006a8b60})
    /go/src/github.com/openshift/machine-config-operator/pkg/controller/node/node_controller.go:386 +0xc5
k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnAdd(...)
    /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/controller.go:239
k8s.io/client-go/tools/cache.(*processorListener).run.func1()
    /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/shared_informer.go:972 +0x13e
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?)
    /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:226 +0x33
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc00066bf38?, {0x3de6020, 0xc0008f8b40}, 0x1, 0xc000c2ea20)
    /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:227 +0xaf
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0?, 0x3b9aca00, 0x0, 0x0?, 0xc00066bf88?)
    /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:204 +0x7f
k8s.io/apimachinery/pkg/util/wait.Until(...)
    /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:161
k8s.io/client-go/tools/cache.(*processorListener).run(0xc000ba6240)
    /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/shared_informer.go:966 +0x69
k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1()
    /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:72 +0x4f
created by k8s.io/apimachinery/pkg/util/wait.(*Group).Start in goroutine 43
    /go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:70 +0x73
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
    panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x40 pc=0x210a6e9]
Expected results:
The controller should not panic. A MachineOSBuild that refers to a pool that no longer exists should be handled as a regular error.
Additional info:
To recover from this panic, we need to manually delete the MachineOSBuild resources related to the pool that no longer exists.
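The selection behind that manual cleanup can be sketched as a simple filter: given the builds and the pools that currently exist, pick the builds whose pool is gone. This is a hedged illustration over plain data. Build, its PoolName field, the build names and orphanedBuilds are hypothetical and do not reflect the real MachineOSBuild schema or any MCO helper.

```go
package main

import (
	"fmt"
	"sort"
)

// Build is a hypothetical stand-in for a MachineOSBuild; PoolName is the
// name of the pool the build was created for.
type Build struct {
	Name     string
	PoolName string
}

// orphanedBuilds returns the sorted names of builds whose pool is not in
// existingPools.
func orphanedBuilds(builds []Build, existingPools []string) []string {
	exists := make(map[string]struct{}, len(existingPools))
	for _, p := range existingPools {
		exists[p] = struct{}{}
	}
	var orphans []string
	for _, b := range builds {
		if _, ok := exists[b.PoolName]; !ok {
			orphans = append(orphans, b.Name)
		}
	}
	sort.Strings(orphans)
	return orphans
}

func main() {
	// Hypothetical build names; the infra pool has been deleted.
	builds := []Build{
		{Name: "infra-build", PoolName: "infra"},
		{Name: "infra1-build", PoolName: "infra1"},
	}
	pools := []string{"master", "worker", "infra1"}
	for _, name := range orphanedBuilds(builds, pools) {
		fmt.Println("delete candidate:", name)
	}
}
```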
- blocks: OCPBUGS-35299 Panic when we remove an OCL infra MCP and we try to create new ones with different names (Closed)
- is cloned by: OCPBUGS-35299 Panic when we remove an OCL infra MCP and we try to create new ones with different names (Closed)
- relates to: MCO-665 On-Cluster Layering Tech Preview (Closed)
- links to: RHEA-2024:3718 OpenShift Container Platform 4.17.z bug fix update