Bug
Resolution: Duplicate
Undefined
4.16.z
Quality / Stability / Reliability
False
Important
x86_64
Description of problem:
The numaresourceoperator pods are in CrashLoopBackOff state with the following error message:
Defaulted container "resource-topology-exporter" out of: resource-topology-exporter, shared-pool-container
I0925 07:28:32.448592 1 main.go:66] starting resource-topology-exporter 4.16.6 9692c3190f9868ef04d188a42ad8d0bac954e756 go1.22.12 (Red Hat 1.22.12-2.el9_5) X:strictfipsruntime
I0925 07:28:32.448868 1 main.go:307] using Topology Manager scope "container" from "default" (conf=) policy "single-numa-node" from "conf" (conf=single-numa-node)
I0925 07:28:32.449382 1 client.go:43] creating a podresources client for endpoint "unix:///host-podresources/kubelet.sock"
I0925 07:28:32.449402 1 client.go:104] endpoint "unix:///host-podresources/kubelet.sock" -> protocol="unix" path="/host-podresources/kubelet.sock"
I0925 07:28:32.449790 1 client.go:48] created a podresources client for endpoint "unix:///host-podresources/kubelet.sock"
I0925 07:28:32.449812 1 setup.go:90] metrics endpoint disabled
I0925 07:28:32.449822 1 podexclude.go:99] > POD excludes:
I0925 07:28:32.449835 1 resourcetopologyexporter.go:127] using given Topology Manager policy "single-numa-node" scope "container"
I0925 07:28:32.449887 1 notification.go:123] added interval every 10s
I0925 07:28:32.449912 1 resourcemonitor.go:153] resource monitor for "ocp-node-8.spoke00.core.telcoqe.eng.rdu2.dc.redhat.com" starting
I0925 07:28:32.615362 1 resourcemonitor.go:175] tracking node resources
F0925 07:28:32.615893 1 main.go:118] failed to execute: failed to initialize ResourceMonitor: error while updating node allocatable: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial unix /host-podresources/kubelet.sock: connect: permission denied"
Version-Release number of selected component (if applicable):
OCP: 4.16.48
numaresources-operator.v4.16.6
How reproducible:
First occurrence so far
Steps to Reproduce:
1. Deploy a bare-metal dual-stack cluster
2. Install and configure NROP
3. Check the status of the numaresourceoperator pod(s)
Actual results:
Pods are in CrashLoopBackOff state
Expected results:
Pods are running without any errors
Additional info:
Depends on: OCPBUGS-61789 RTE pods stuck in CrashLoopBackOff due to selinux context changes (MODIFIED)