-
Bug
-
Resolution: Done-Errata
-
Blocker
-
rhos-18.0 Feature Release 1 (Nov 2024)
-
None
-
3
-
False
-
-
False
-
?
-
?
-
openstack-operator-container-1.0.4-6
-
?
-
?
-
Yes
-
-
-
Approved
-
Important
A failing deployment job causes a nil pointer dereference:
2024-10-30T14:32:11.001Z INFO Observed a panic in reconciler: runtime error: invalid memory address or nil pointer dereference {"controller": "openstackdataplanedeployment", "controllerGroup": "dataplane.openstack.org", "controllerKind": "OpenStackDataPlaneDeployment", "OpenStackDataPlaneDeployment": {"name":"edpm-deployment","namespace":"openstack"}, "namespace": "openstack", "name": "edpm-deployment", "reconcileID": "242891d0-88cc-4d4a-a5af-0ef99b779d3f"}
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
    panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x58 pc=0x1ac64ca]
goroutine 903 [running]:
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile.func1()
    /opt/app-root/src/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.6/pkg/internal/controller/controller.go:116 +0x1e5
panic({0x1d3d280?, 0x3464d30?})
    /usr/lib/golang/src/runtime/panic.go:914 +0x21f
github.com/openstack-k8s-operators/openstack-operator/pkg/dataplane.(*Deployer).ConditionalDeploy(_, {_, _}, {_, _}, {_, _}, {_, _}, {0xc0042b0690, ...}, ...)
    /remote-source/pkg/dataplane/deployment.go:237 +0xe2a
github.com/openstack-k8s-operators/openstack-operator/pkg/dataplane.(*Deployer).Deploy(0xc00108c420, {0xc003b21100, 0x10, 0x10})
    /remote-source/pkg/dataplane/deployment.go:136 +0x6e5
github.com/openstack-k8s-operators/openstack-operator/controllers/dataplane.(*OpenStackDataPlaneDeploymentReconciler).Reconcile(0xc000619ad0, {0x23dbe58?, 0xc0031404e0}, {{{0xc004582ae3?, 0x5?}, {0xc004582b20?, 0xc00108dd08?}}})
    /remote-source/controllers/dataplane/openstackdataplanedeployment_controller.go:333 +0x1cea
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0x23df248?, {0x23dbe58?, 0xc0031404e0?}, {{{0xc004582ae3?, 0xb?}, {0xc004582b20?, 0x0?}}})
    /opt/app-root/src/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.6/pkg/internal/controller/controller.go:119 +0xb7
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc0006263c0, {0x23dbe90, 0xc000130500}, {0x1e205c0?, 0xc002a08a00?})
    /opt/app-root/src/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.6/pkg/internal/controller/controller.go:316 +0x3cc
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc0006263c0, {0x23dbe90, 0xc000130500})
    /opt/app-root/src/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.6/pkg/internal/controller/controller.go:266 +0x1af
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
    /opt/app-root/src/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.6/pkg/internal/controller/controller.go:227 +0x79
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2 in goroutine 243
    /opt/app-root/src/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.17.6/pkg/internal/controller/controller.go:223 +0x565
Job status at the time is:
status:
  failed: 2
  ready: 0
  startTime: "2024-10-30T14:32:01Z"
  terminating: 0
  uncountedTerminatedPods: {}
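The issue occurs at the location shown in the stack trace (/remote-source/pkg/dataplane/deployment.go:237, in ConditionalDeploy), because ansibleCondition is nil if the job status has no conditions or if the condition type is not batchv1.JobFailed.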
The operator eventually recovers once the job backoff limit is reached and the expected condition is set on the job.
While the operator is down, its CRs cannot be altered, e.g.:
oc edit -n openstack openstackcontrolplane.core.openstack.org openstack-galera-network-isolation
error: openstackcontrolplanes.core.openstack.org "openstack-galera-network-isolation" could not be patched: Internal error occurred: failed calling webhook "mopenstackcontrolplane.kb.io": failed to call webhook: Post "https://openstack-operator-controller-manager-service.openstack-operators.svc:443/mutate-core-openstack-org-v1beta1-openstackcontrolplane?timeout=10s": no endpoints available for service "openstack-operator-controller-manager-service"
You can run `oc replace -f /tmp/oc-edit-4063109714.yaml` to try this update again.
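A minimal Go sketch of how the nil ansibleCondition case described above could be guarded against. This is not the operator's actual code or the shipped fix. The types are local stand-ins that mirror only the relevant fields of batchv1.JobStatus and batchv1.JobCondition, so the example compiles without the Kubernetes modules. findCondition and failureMessage are hypothetical helper names.

```go
package main

import "fmt"

// Local stand-ins for the k8s.io/api/batch/v1 types (assumption: only the
// fields needed for this illustration are reproduced).
type JobConditionType string

const (
	JobComplete JobConditionType = "Complete"
	JobFailed   JobConditionType = "Failed"
)

type JobCondition struct {
	Type    JobConditionType
	Reason  string
	Message string
}

type JobStatus struct {
	Failed     int32
	Conditions []JobCondition
}

// findCondition (hypothetical helper) returns a pointer to the first
// condition of the requested type, or nil if the job has no such condition.
func findCondition(status JobStatus, t JobConditionType) *JobCondition {
	for i := range status.Conditions {
		if status.Conditions[i].Type == t {
			return &status.Conditions[i]
		}
	}
	return nil
}

// failureMessage (hypothetical helper) builds an error message without
// dereferencing a nil condition. A job that is still retrying has failed
// pods but no Failed condition until the backoff limit is reached.
func failureMessage(status JobStatus) string {
	cond := findCondition(status, JobFailed)
	if cond == nil {
		return fmt.Sprintf("job has %d failed pod(s) but no Failed condition yet", status.Failed)
	}
	return fmt.Sprintf("job failed: %s: %s", cond.Reason, cond.Message)
}

func main() {
	// Mirrors the job status from this report: failed pods, no conditions.
	retrying := JobStatus{Failed: 2}
	fmt.Println(failureMessage(retrying))

	// After the backoff limit is reached, the Failed condition is present.
	exhausted := JobStatus{
		Failed: 7,
		Conditions: []JobCondition{{
			Type:    JobFailed,
			Reason:  "BackoffLimitExceeded",
			Message: "Job has reached the specified backoff limit",
		}},
	}
	fmt.Println(failureMessage(exhausted))
}
```

The point of the sketch is the explicit nil check before any field access, so that a job in the intermediate state shown above yields an error message instead of a panic.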
- links to
-
RHSA-2024:140345 RHOSO OpenStack Podified operator containers security update