  1. Red Hat Advanced Cluster Management
  2. ACM-8731

SIGSEGV error code in config-policy-controller pod on spoke deployed from hub with OCP 4.14.3, ACM 2.8.4

    • 2
    • False
    • None
    • False
    • GRC Sprint 2023-21
    • Important
    • -
    • No

      Description of problem:

      Deploying an SNO spoke with OCP 4.14.3 and the Telco DU profile fails when the hub cluster runs OCP 4.14.z and ACM 2.8.4.

      The config-policy-controller pod on the spoke is in CrashLoopBackOff status.

      Seen in multiple test environments.

      The pod issue does not occur if ACM 2.9 is on the hub instead

      Version-Release number of selected component (if applicable):

      OCP (hub and spoke) 4.14.3
      ACM 2.8.4-xx

      How reproducible:

      Always

      Steps to Reproduce:

      1. Install hub with OCP 4.14.3 and ACM 2.8.4
      2. Deploy spoke with OCP 4.14.3
      3. Check deployment status and check config-policy-controller pod and logs on spoke.

      Actual results:

      The spoke does not finish deployment (policy violations), and the config-policy-controller pod on the spoke is in CrashLoopBackOff status.

      Expected results:

      spoke deploys successfully and config-policy-controller pod on spoke runs properly

      Additional info:

      [kni@registry.ran-vcl01 dgonyier]$ oc get pods -A  | grep -vE 'NAME|Comp|Runn'
      open-cluster-management-agent-addon                config-policy-controller-86dcccd679-z2xh5                     1/2     CrashLoopBackOff   41 (2m43s ago)   3h13m
      [kni@registry.ran-vcl01 dgonyier]$ oc logs -n open-cluster-management-agent-addon                config-policy-controller-86dcccd679-z2xh5     
      2023-11-21T21:52:24.349Z	info	setup	app/main.go:61	Using	{"OperatorVersion": "0.0.1", "GoVersion": "go1.20.10 X:strictfipsruntime", "GOOS": "linux", "GOARCH": "amd64"}
      2023-11-21T21:52:26.892Z	info	controller-runtime.metrics	metrics/listener.go:44	Metrics server is starting to listen	{"addr": "localhost:8383"}
      2023-11-21T21:52:26.894Z	info	setup	app/main.go:337	The managed cluster supports dry run API requests
      2023-11-21T21:52:26.895Z	info	setup	app/main.go:379	Periodically processing Configuration Policies	{"frequency": 10}
      2023-11-21T21:52:26.895Z	info	setup	app/main.go:400	Starting lease controller to report status
      2023-11-21T21:52:26.895Z	info	configuration-policy-controller	controllers/configurationpolicy_controller.go:171	Waiting for leader election before periodically evaluating configuration policies
      2023-11-21T21:52:26.897Z	info	setup	app/main.go:420	Starting managers
      2023-11-21T21:52:26.897Z	info	manager/internal.go:369	Starting server	{"path": "/metrics", "kind": "metrics", "addr": "127.0.0.1:8383"}
      2023-11-21T21:52:26.897Z	info	manager/internal.go:369	Starting server	{"kind": "health probe", "addr": "[::]:8081"}
      2023-11-21T21:52:26.897Z	info	controller/controller.go:186	Starting EventSource	{"controller": "NamespaceSelector", "controllerGroup": "", "controllerKind": "Namespace", "source": "kind source: *v1.Namespace"}
      2023-11-21T21:52:26.897Z	info	controller/controller.go:186	Starting EventSource	{"controller": "NamespaceSelector", "controllerGroup": "", "controllerKind": "Namespace", "source": "kind source: *v1.Namespace"}
      2023-11-21T21:52:26.897Z	info	controller/controller.go:194	Starting Controller	{"controller": "NamespaceSelector", "controllerGroup": "", "controllerKind": "Namespace"}
      2023-11-21T21:52:26.897Z	info	controller/controller.go:186	Starting EventSource	{"controller": "configuration-policy-controller", "controllerGroup": "policy.open-cluster-management.io", "controllerKind": "ConfigurationPolicy", "source": "kind source: *v1.ConfigurationPolicy"}
      2023-11-21T21:52:26.897Z	info	controller/controller.go:194	Starting Controller	{"controller": "configuration-policy-controller", "controllerGroup": "policy.open-cluster-management.io", "controllerKind": "ConfigurationPolicy"}
      2023-11-21T21:52:26.999Z	info	controller/controller.go:228	Starting workers	{"controller": "configuration-policy-controller", "controllerGroup": "policy.open-cluster-management.io", "controllerKind": "ConfigurationPolicy", "worker count": 1}
      2023-11-21T21:52:26.999Z	info	controller/controller.go:228	Starting workers	{"controller": "NamespaceSelector", "controllerGroup": "", "controllerKind": "Namespace", "worker count": 1}
      2023-11-21T21:52:31.220Z	info	configuration-policy-controller	controllers/configurationpolicy_controller.go:2618	Ignoring an update to the object status	{"policy": "cnfde4-group-du-sno-validator-du-policy-config-g6954", "name": "master", "namespace": "", "resource": "machineconfigpools", "key": "status"}
      2023-11-21T21:52:31.230Z	info	configuration-policy-controller	controllers/configurationpolicy_controller.go:1523	The object template does not specify a name. Setting the remediation action to inform.	{"policy": "cnfde4-group-du-sno-validator-du-policy-config-g6954", "index": 1, "objectNamespace": "openshift-sriov-network-operator", "oldRemediationAction": "enforce"}
      panic: runtime error: invalid memory address or nil pointer dereference
      [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x2380072]
      
      goroutine 974 [running]:
      k8s.io/apimachinery/pkg/apis/meta/v1/unstructured.(*Unstructured).GetUID(...)
      	/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery@v0.27.1/pkg/apis/meta/v1/unstructured/unstructured.go:270
      open-cluster-management.io/config-policy-controller/controllers.(*ConfigurationPolicyReconciler).alreadyEvaluated(0x2b0c4e8?, 0xc0003f7880?, 0x0)
      	/remote-source/app/controllers/configurationpolicy_controller.go:2781 +0xb2
      open-cluster-management.io/config-policy-controller/controllers.(*ConfigurationPolicyReconciler).handleSingleObj(0xdb26b0?, {0xc00017fe00, {{0xc000927dc0, 0x19}, {0xc000907b52, 0x2}, {0xc000cd0f00, 0x16}}, 0x0, {0xc0006922a0, ...}, ...}, ...)
      	/remote-source/app/controllers/configurationpolicy_controller.go:1748 +0x765
      open-cluster-management.io/config-policy-controller/controllers.(*ConfigurationPolicyReconciler).handleObjects(0xc00019fb00, 0xc0002d9ea0, {0xc00012d280, 0x20}, {{0xc000d38378, 0x15}, {0x0, 0x0}, {0xc00012d280, 0x20}, ...}, ...)
      	/remote-source/app/controllers/configurationpolicy_controller.go:1555 +0xf25
      open-cluster-management.io/config-policy-controller/controllers.(*ConfigurationPolicyReconciler).handleObjectTemplates(_, {{{0x23d53a2, 0x13}, {0xc000d07bc0, 0x24}}, {{0xc0000566c0, 0x34}, {0x0, 0x0}, {0xc00092cd56, ...}, ...}, ...})
      	/remote-source/app/controllers/configurationpolicy_controller.go:1159 +0x2c11
      open-cluster-management.io/config-policy-controller/controllers.(*ConfigurationPolicyReconciler).handlePolicyWorker(0x1d12960?, 0xc0007695a0?, 0xc000c66fb8?)
      	/remote-source/app/controllers/configurationpolicy_controller.go:301 +0x1e5
      created by open-cluster-management.io/config-policy-controller/controllers.(*ConfigurationPolicyReconciler).PeriodicallyExecConfigPolicies
      	/remote-source/app/controllers/configurationpolicy_controller.go:243 +0x76c
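
The trace above shows `alreadyEvaluated` being entered with `0x0` as its last argument and the fault occurring inside the inlined `GetUID` accessor. That is consistent with a nil object pointer being passed in, though the report does not confirm the root cause. The following Go sketch reproduces that failure pattern in isolation and shows one possible guard. It is not the controller's code or the upstream fix: the `object` type is a hypothetical stand-in for `unstructured.Unstructured`, and `alreadyEvaluatedUnsafe`, `alreadyEvaluatedSafe`, and `panics` are illustrative names.

```go
package main

import "fmt"

// object is a hypothetical stand-in for unstructured.Unstructured: a thin
// wrapper around a generic map, with accessors defined on the pointer type.
type object struct {
	Object map[string]interface{}
}

// GetUID reads metadata.uid. It dereferences the receiver to reach the map,
// so calling it on a nil *object panics with a nil pointer dereference.
func (o *object) GetUID() string {
	meta, _ := o.Object["metadata"].(map[string]interface{})
	uid, _ := meta["uid"].(string)
	return uid
}

// alreadyEvaluatedUnsafe assumes current is never nil (illustrative only).
func alreadyEvaluatedUnsafe(seen map[string]bool, current *object) bool {
	return seen[current.GetUID()]
}

// alreadyEvaluatedSafe treats a missing object as "not evaluated yet"
// instead of dereferencing it.
func alreadyEvaluatedSafe(seen map[string]bool, current *object) bool {
	if current == nil {
		return false
	}
	return seen[current.GetUID()]
}

// panics reports whether calling f triggers a runtime panic.
func panics(f func()) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	f()
	return false
}

func main() {
	seen := map[string]bool{"abc-123": true}
	existing := &object{Object: map[string]interface{}{
		"metadata": map[string]interface{}{"uid": "abc-123"},
	}}

	fmt.Println(alreadyEvaluatedSafe(seen, existing))                  // object known
	fmt.Println(alreadyEvaluatedSafe(seen, nil))                       // nil handled
	fmt.Println(panics(func() { alreadyEvaluatedUnsafe(seen, nil) })) // nil crashes
}
```

Whether returning `false` for a nil object is the right behaviour depends on how the caller uses the result. The sketch only illustrates that the nil check has to happen before any accessor is called on the pointer.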
      
      
      [kni@registry.ran-vcl01 dgonyier]$ source acm-operator-bundle 
      "registry-proxy.engineering.redhat.com/rh-osbs/rhacm2-acm-operator-bundle:v2.8.4-8"
      
      [kni@registry.ran-vcl01 dgonyier]$ source talm-bundle-current 
      https://access.redhat.com/containers/#/registry.access.redhat.com/openshift4/topology-aware-lifecycle-manager-operator-bundle-container-rhel8/images/v4.14.1-6
      
      

              mprahl Matthew Prahl
              rhn-support-dgonyier Dwaine Gonyier
              Derek Ho Derek Ho