Bug
Resolution: Unresolved
Critical
None
4.17.z, 4.16.z
None
Description of problem:
In OCP 4.17.z and 4.16.z, the pod resources socket is now expected to have the SELinux context kubelet_var_lib_t instead of container_var_lib_t. As a result, the RTE (resource-topology-exporter) pods that are deployed when installing the NROP operator are denied access to the socket and get stuck in CrashLoopBackOff.
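To confirm which label the socket actually carries on an affected node, a minimal Go sketch can read the socket's "security.selinux" extended attribute and extract the type field from the context. It assumes it runs directly on the node (for example from an "oc debug node/<node>" shell) and that the socket sits at the kubelet default host path /var/lib/kubelet/pod-resources/kubelet.sock; the helper names socketLabel and selinuxType are hypothetical and not part of NROP or RTE. syscall.Getxattr is Linux-only.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"strings"
	"syscall"
)

// selinuxType extracts the type field from an SELinux context of the form
// user:role:type[:level], e.g. "system_u:object_r:kubelet_var_lib_t:s0".
func selinuxType(ctx string) (string, error) {
	// xattr values are usually NUL-terminated; drop the terminator and whitespace.
	ctx = strings.TrimSpace(strings.TrimRight(ctx, "\x00"))
	// Split into at most 4 parts: the MLS level (e.g. "s0:c1,c2") may contain colons.
	parts := strings.SplitN(ctx, ":", 4)
	if len(parts) < 3 || parts[2] == "" {
		return "", fmt.Errorf("malformed SELinux context %q", ctx)
	}
	return parts[2], nil
}

// socketLabel reads the raw SELinux context of path (Linux only).
func socketLabel(path string) (string, error) {
	buf := make([]byte, 256) // SELinux contexts are far shorter than this
	n, err := syscall.Getxattr(path, "security.selinux", buf)
	if err != nil {
		return "", err
	}
	return string(bytes.TrimRight(buf[:n], "\x00")), nil
}

func main() {
	// Assumption: kubelet default host path of the pod resources socket.
	path := "/var/lib/kubelet/pod-resources/kubelet.sock"
	if len(os.Args) > 1 {
		path = os.Args[1]
	}
	label, err := socketLabel(path)
	if err != nil {
		fmt.Fprintf(os.Stderr, "reading SELinux label of %s: %v\n", path, err)
		os.Exit(1)
	}
	typ, err := selinuxType(label)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("%s: %s (type %s)\n", path, label, typ)
	switch typ {
	case "kubelet_var_lib_t":
		fmt.Println("new label: a policy that only grants access to container_var_lib_t will be denied")
	case "container_var_lib_t":
		fmt.Println("previous label, as described in this report")
	}
}
```

Comparing the output on a 4.17.z/4.16.z node with an older node shows whether the label change described above is present.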
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
1. Deploy the NROP operator.
2. Apply the NROP CR.
3. Check the RTE pods: instead of coming up, they get stuck in CrashLoopBackOff.
Actual results:
NAME                                                READY   STATUS             RESTARTS       AGE
numaresources-controller-manager-6c74699cf7-7hkxq   1/1     Running            0              33m
numaresourcesoperator-worker-f5dr6                  1/2     CrashLoopBackOff   9 (29s ago)    21m
numaresourcesoperator-worker-rc2gr                  1/2     Error              9 (5m6s ago)   21m
secondary-scheduler-65557fc7cd-cx7gl                1/1     Running            0              20m
Expected results:
The RTE pods (the numaresourcesoperator-worker pods) should be Running with 2/2 containers ready, one pod per worker node (two workers in this example).
Additional info:
[root@ocp-edge41 ~]# oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.17.0-0.nightly-2025-09-15-035144   True        False         116m    Cluster version is 4.17.0-0.nightly-2025-09-15-035144
[root@ocp-edge41 ~]# oc get no,mcp
NAME                                                            STATUS   ROLES                  AGE    VERSION
node/ocp4171598887-ctlplane-0.libvirt.lab.eng.tlv2.redhat.com   Ready    control-plane,master   145m   v1.30.14
node/ocp4171598887-ctlplane-1.libvirt.lab.eng.tlv2.redhat.com   Ready    control-plane,master   146m   v1.30.14
node/ocp4171598887-ctlplane-2.libvirt.lab.eng.tlv2.redhat.com   Ready    control-plane,master   146m   v1.30.14
node/ocp4171598887-worker-0.libvirt.lab.eng.tlv2.redhat.com     Ready    worker                 128m   v1.30.14
node/ocp4171598887-worker-1.libvirt.lab.eng.tlv2.redhat.com     Ready    worker                 128m   v1.30.14

NAME                                                         CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
machineconfigpool.machineconfiguration.openshift.io/master   rendered-master-4d364f16d856959b60b95cf92eaf905c   True      False      False      3              3                   3                     0                      143m
machineconfigpool.machineconfiguration.openshift.io/worker   rendered-worker-f5a9a81979766c79c554a745f2cfb72a   True      False      False      2              2                   2                     0                      143m
[root@ocp-edge41 ~]# oc get pods
NAME                                                READY   STATUS             RESTARTS       AGE
numaresources-controller-manager-6c74699cf7-7hkxq   1/1     Running            0              80m
numaresourcesoperator-worker-f5dr6                  1/2     CrashLoopBackOff   18 (63s ago)   68m
numaresourcesoperator-worker-rc2gr                  1/2     CrashLoopBackOff   18 (36s ago)   68m
secondary-scheduler-65557fc7cd-cx7gl                1/1     Running            0              67m
[root@ocp-edge41 ~]# oc logs pod/numaresourcesoperator-worker-f5dr6
Defaulted container "resource-topology-exporter" out of: resource-topology-exporter, shared-pool-container
I0916 14:19:12.878899 1 main.go:66] starting resource-topology-exporter 0.0.1-dev1 63405e44f go1.22.12 (Red Hat 1.22.12-3.el9_5) X:strictfipsruntime
I0916 14:19:12.879138 1 main.go:307] using Topology Manager scope "container" from "conf" (conf=container) policy "single-numa-node" from "conf" (conf=single-numa-node)
I0916 14:19:12.879684 1 client.go:43] creating a podresources client for endpoint "unix:///host-podresources/kubelet.sock"
I0916 14:19:12.879696 1 client.go:104] endpoint "unix:///host-podresources/kubelet.sock" -> protocol="unix" path="/host-podresources/kubelet.sock"
I0916 14:19:12.879978 1 client.go:48] created a podresources client for endpoint "unix:///host-podresources/kubelet.sock"
I0916 14:19:12.879989 1 setup.go:90] metrics endpoint disabled
I0916 14:19:12.879993 1 podexclude.go:99] > POD excludes:
I0916 14:19:12.879999 1 resourcetopologyexporter.go:127] using given Topology Manager policy "single-numa-node" scope "container"
I0916 14:19:12.880035 1 notification.go:123] added interval every 10s
I0916 14:19:12.880055 1 resourcemonitor.go:153] resource monitor for "ocp4171598887-worker-1.libvirt.lab.eng.tlv2.redhat.com" starting
I0916 14:19:12.896159 1 resourcemonitor.go:175] tracking node resources
F0916 14:19:12.896586 1 main.go:118] failed to execute: failed to initialize ResourceMonitor: error while updating node allocatable: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial unix /host-podresources/kubelet.sock: connect: permission denied"
is depended on by:
  OCPBUGS-62225 numaresourcesoperator pods fail to start - Error while dialing: dial unix /host-podresources/kubelet.sock: connect: permission denied (Closed)
is duplicated by:
  RUN-3308 Corrective Measure for OCPBUGS-45639: RTE pods fail to start due to selinux issues (To Do)
relates to:
  OCPBUGS-45983 RTE pods fail to start due to selinux issues (Closed)