Bug
Resolution: Done-Errata
Critical
premerge, 4.16
Quality / Stability / Reliability
False
Moderate
Approved
MCO Sprint 254, MCO Sprint 255
2
Done
Bug Fix
Description of problem:
If we create a new pool, enable OCB in it, remove the pool and its MachineOSConfig resource, and then create another new pool and enable OCB again, the controller pod panics.
Version-Release number of selected component (if applicable):
pre-merge https://github.com/openshift/machine-config-operator/pull/4327
How reproducible:
Always
Steps to Reproduce:
1. Create a new infra MCP
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: infra
spec:
  machineConfigSelector:
    matchExpressions:
      - {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,infra]}
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/infra: ""
2. Create a MachineOSConfig for the infra pool
oc create -f - << EOF
apiVersion: machineconfiguration.openshift.io/v1alpha1
kind: MachineOSConfig
metadata:
  name: infra
spec:
  machineConfigPool:
    name: infra
  buildInputs:
    imageBuilder:
      imageBuilderType: PodImageBuilder
    baseImagePullSecret:
      name: $(oc get secret -n openshift-config pull-secret -o json | jq "del(.metadata.namespace, .metadata.creationTimestamp, .metadata.resourceVersion, .metadata.uid, .metadata.name)" | jq '.metadata.name="pull-copy"' | oc -n openshift-machine-config-operator create -f - &> /dev/null; echo -n "pull-copy")
    renderedImagePushSecret:
      name: $(oc get -n openshift-machine-config-operator sa builder -ojsonpath='{.secrets[0].name}')
    renderedImagePushspec: "image-registry.openshift-image-registry.svc:5000/openshift-machine-config-operator/ocb-image:latest"
EOF
3. When the build is finished, remove the MachineOSConfig and the pool
oc delete machineosconfig infra
oc delete mcp infra
4. Create a new infra1 pool
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: infra1
spec:
  machineConfigSelector:
    matchExpressions:
      - {key: machineconfiguration.openshift.io/role, operator: In, values: [worker,infra1]}
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/infra1: ""
5. Create a new MachineOSConfig for the infra1 pool
oc create -f - << EOF
apiVersion: machineconfiguration.openshift.io/v1alpha1
kind: MachineOSConfig
metadata:
  name: infra1
spec:
  machineConfigPool:
    name: infra1
  buildInputs:
    imageBuilder:
      imageBuilderType: PodImageBuilder
    baseImagePullSecret:
      name: $(oc get secret -n openshift-config pull-secret -o json | jq "del(.metadata.namespace, .metadata.creationTimestamp, .metadata.resourceVersion, .metadata.uid, .metadata.name)" | jq '.metadata.name="pull-copy"' | oc -n openshift-machine-config-operator create -f - &> /dev/null; echo -n "pull-copy")
    renderedImagePushSecret:
      name: $(oc get -n openshift-machine-config-operator sa builder -ojsonpath='{.secrets[0].name}')
    renderedImagePushspec: "image-registry.openshift-image-registry.svc:5000/openshift-machine-config-operator/ocb-image:latest"
    containerFile:
      - containerfileArch: noarch
        content: |-
          RUN echo 'test image' > /etc/test-image.file
EOF
Actual results:
The MCO controller pod panics (in updateMachineOSBuild):
E0430 11:21:03.779078 1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 265 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x3547bc0?, 0x53ebb20})
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:75 +0x85
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc00035e000?})
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:49 +0x6b
panic({0x3547bc0?, 0x53ebb20?})
/usr/lib/golang/src/runtime/panic.go:914 +0x21f
github.com/openshift/api/machineconfiguration/v1.(*MachineConfigPool).GetNamespace(0x53f6200?)
<autogenerated>:1 +0x9
k8s.io/client-go/tools/cache.MetaObjectToName({0x3e2a8f8, 0x0})
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/store.go:131 +0x25
k8s.io/client-go/tools/cache.ObjectToName({0x3902740?, 0x0?})
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/store.go:126 +0x74
k8s.io/client-go/tools/cache.MetaNamespaceKeyFunc({0x3902740?, 0x0?})
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/store.go:112 +0x3e
k8s.io/client-go/tools/cache.DeletionHandlingMetaNamespaceKeyFunc({0x3902740?, 0x0?})
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/controller.go:336 +0x3b
github.com/openshift/machine-config-operator/pkg/controller/node.(*Controller).enqueueAfter(0xc0007097a0, 0x0, 0x0?)
/go/src/github.com/openshift/machine-config-operator/pkg/controller/node/node_controller.go:761 +0x33
github.com/openshift/machine-config-operator/pkg/controller/node.(*Controller).enqueueDefault(...)
/go/src/github.com/openshift/machine-config-operator/pkg/controller/node/node_controller.go:772
github.com/openshift/machine-config-operator/pkg/controller/node.(*Controller).updateMachineOSBuild(0xc0007097a0, {0xc001c37800?, 0xc000029678?}, {0x3904000?, 0xc0028361a0})
/go/src/github.com/openshift/machine-config-operator/pkg/controller/node/node_controller.go:395 +0xd1
k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnUpdate(...)
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/controller.go:246
k8s.io/client-go/tools/cache.(*processorListener).run.func1()
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/shared_informer.go:970 +0xea
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?)
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:226 +0x33
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc0005e5738?, {0x3de6020, 0xc0008fe780}, 0x1, 0xc0000ac720)
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:227 +0xaf
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x6974616761706f72?, 0x3b9aca00, 0x0, 0x69?, 0xc0005e5788?)
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:204 +0x7f
k8s.io/apimachinery/pkg/util/wait.Until(...)
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:161
k8s.io/client-go/tools/cache.(*processorListener).run(0xc000b97c20)
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/shared_informer.go:966 +0x69
k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1()
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:72 +0x4f
created by k8s.io/apimachinery/pkg/util/wait.(*Group).Start in goroutine 248
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:70 +0x73
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x40 pc=0x210a6e9]
When the controller pod is restarted, it panics again, but in a different function (addMachineOSBuild):
E0430 11:26:54.753689 1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 97 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x3547bc0?, 0x53ebb20})
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:75 +0x85
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0x15555555aa?})
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:49 +0x6b
panic({0x3547bc0?, 0x53ebb20?})
/usr/lib/golang/src/runtime/panic.go:914 +0x21f
github.com/openshift/api/machineconfiguration/v1.(*MachineConfigPool).GetNamespace(0x53f6200?)
<autogenerated>:1 +0x9
k8s.io/client-go/tools/cache.MetaObjectToName({0x3e2a8f8, 0x0})
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/store.go:131 +0x25
k8s.io/client-go/tools/cache.ObjectToName({0x3902740?, 0x0?})
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/store.go:126 +0x74
k8s.io/client-go/tools/cache.MetaNamespaceKeyFunc({0x3902740?, 0x0?})
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/store.go:112 +0x3e
k8s.io/client-go/tools/cache.DeletionHandlingMetaNamespaceKeyFunc({0x3902740?, 0x0?})
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/controller.go:336 +0x3b
github.com/openshift/machine-config-operator/pkg/controller/node.(*Controller).enqueueAfter(0xc000899560, 0x0, 0x0?)
/go/src/github.com/openshift/machine-config-operator/pkg/controller/node/node_controller.go:761 +0x33
github.com/openshift/machine-config-operator/pkg/controller/node.(*Controller).enqueueDefault(...)
/go/src/github.com/openshift/machine-config-operator/pkg/controller/node/node_controller.go:772
github.com/openshift/machine-config-operator/pkg/controller/node.(*Controller).addMachineOSBuild(0xc000899560, {0x3904000?, 0xc0006a8b60})
/go/src/github.com/openshift/machine-config-operator/pkg/controller/node/node_controller.go:386 +0xc5
k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnAdd(...)
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/controller.go:239
k8s.io/client-go/tools/cache.(*processorListener).run.func1()
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/shared_informer.go:972 +0x13e
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?)
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:226 +0x33
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc00066bf38?, {0x3de6020, 0xc0008f8b40}, 0x1, 0xc000c2ea20)
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:227 +0xaf
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x0?, 0x3b9aca00, 0x0, 0x0?, 0xc00066bf88?)
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:204 +0x7f
k8s.io/apimachinery/pkg/util/wait.Until(...)
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/backoff.go:161
k8s.io/client-go/tools/cache.(*processorListener).run(0xc000ba6240)
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/client-go/tools/cache/shared_informer.go:966 +0x69
k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1()
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:72 +0x4f
created by k8s.io/apimachinery/pkg/util/wait.(*Group).Start in goroutine 43
/go/src/github.com/openshift/machine-config-operator/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:70 +0x73
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x40 pc=0x210a6e9]
Expected results:
No panic should happen. Errors should be handled gracefully.
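Both traces show enqueueAfter being called with a nil pool pointer (the second argument is 0x0), and the key function then calling GetNamespace on it. The following is only a minimal, self-contained sketch of that failure mode and of one possible guard. The types, keyFunc, safeKey and the error message are simplified stand-ins invented for illustration; this is not the actual MCO code and not necessarily how the fix was implemented.

```go
package main

import (
	"errors"
	"fmt"
)

// ObjectMeta and MachineConfigPool are simplified stand-ins for the real API
// types; only the fields needed for the illustration are kept.
type ObjectMeta struct {
	Namespace string
	Name      string
}

// GetNamespace and GetName mirror the accessor style of the real metadata type.
func (m *ObjectMeta) GetNamespace() string { return m.Namespace }
func (m *ObjectMeta) GetName() string      { return m.Name }

type MachineConfigPool struct {
	Kind string
	ObjectMeta
}

// namer is a hypothetical interface that the key function works against.
type namer interface {
	GetNamespace() string
	GetName() string
}

// keyFunc imitates a namespace/name key function. Its nil check only catches
// an untyped nil: a nil *MachineConfigPool wrapped in the interface passes it.
func keyFunc(obj namer) (string, error) {
	if obj == nil {
		return "", errors.New("nil object")
	}
	if ns := obj.GetNamespace(); ns != "" {
		return ns + "/" + obj.GetName(), nil
	}
	return obj.GetName(), nil
}

// safeKey is the guarded variant: it rejects a nil pool before the pointer is
// wrapped in an interface, so the caller gets an error instead of a panic.
func safeKey(pool *MachineConfigPool) (string, error) {
	if pool == nil {
		return "", errors.New("machine config pool not found, skipping enqueue")
	}
	return keyFunc(pool)
}

// panics reports whether calling f panics.
func panics(f func()) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	f()
	return false
}

func main() {
	// Assumption: a lookup for a deleted pool leaves a nil pointer behind.
	var missing *MachineConfigPool

	fmt.Println("unguarded call panics:", panics(func() { _, _ = keyFunc(missing) }))

	_, err := safeKey(missing)
	fmt.Println("guarded call error:", err)

	key, _ := safeKey(&MachineConfigPool{ObjectMeta: ObjectMeta{Name: "infra1"}})
	fmt.Println("key for existing pool:", key)
}
```

The point of the sketch is that the nil check has to happen on the concrete pointer. Once a nil *MachineConfigPool is stored in an interface value, the interface itself is no longer nil, which matches the {0x3e2a8f8, 0x0} argument seen in MetaObjectToName in the traces.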
Additional info:
To recover from this panic, we need to manually delete the MachineOSBuild resources that are related to the pool that no longer exists.
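The manual recovery amounts to finding the builds whose pool is gone. A rough sketch of that selection logic follows, assuming each build can be reduced to its own name plus the name of the pool it was created for. The Build type, the orphanedBuilds helper and the build names are hypothetical, since this report does not show how a MachineOSBuild references its pool.

```go
package main

import (
	"fmt"
	"sort"
)

// Build is a hypothetical, reduced view of a MachineOSBuild: its own name and
// the name of the pool it was created for.
type Build struct {
	Name string
	Pool string
}

// orphanedBuilds returns the names of builds whose pool is not listed in
// existingPools, sorted for stable output.
func orphanedBuilds(builds []Build, existingPools []string) []string {
	known := make(map[string]struct{}, len(existingPools))
	for _, p := range existingPools {
		known[p] = struct{}{}
	}
	var orphans []string
	for _, b := range builds {
		if _, ok := known[b.Pool]; !ok {
			orphans = append(orphans, b.Name)
		}
	}
	sort.Strings(orphans)
	return orphans
}

func main() {
	// Illustrative data only: the build names are made up.
	builds := []Build{
		{Name: "infra-build", Pool: "infra"},   // pool deleted in step 3
		{Name: "infra1-build", Pool: "infra1"}, // pool created in step 4
	}
	pools := []string{"master", "worker", "infra1"}

	for _, name := range orphanedBuilds(builds, pools) {
		fmt.Println("candidate for deletion:", name)
	}
}
```

With the sample data, only the build tied to the deleted infra pool is reported, which is the resource that would have to be removed by hand.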
- blocks: OCPBUGS-35299 Panic when we remove an OCL infra MCP and we try to create new ones with different names (Closed)
- is cloned by: OCPBUGS-35299 Panic when we remove an OCL infra MCP and we try to create new ones with different names (Closed)
- relates to: MCO-665 On-Cluster Layering Tech Preview (Closed)
- links to: RHEA-2024:3718 OpenShift Container Platform 4.17.z bug fix update