Bug
Resolution: Not a Bug
Normal
4.13.z
Quality / Stability / Reliability
Description of problem:
RTE pods fail to start because connecting to kubelet.sock fails with a permission error.
Version-Release number of selected component (if applicable):
4.13.22
How reproducible:
Every time
Steps to Reproduce:
1. Deploy NROP.
2. Deploy the RTE pods.
Actual results:
[root@helix36 ~]# oc get pods
NAME                                              READY   STATUS             RESTARTS         AGE
numaresources-controller-manager-9cfc4f89-r4xgz   1/1     Running            0                6h42m
numaresourcesoperator-worker-cnf-wchfz            1/2     CrashLoopBackOff   82 (3m35s ago)   6h38m
Expected results:
RTE pods should start
Additional info:
[root@helix36 ~]# oc logs pods/numaresourcesoperator-worker-cnf-wchfz
Defaulted container "resource-topology-exporter" out of: resource-topology-exporter, shared-pool-container
I0317 15:13:43.650231 1 main.go:65] starting resource-topology-exporter 4.13.8 3f8156393a74dd88b1c380a01d8d76f361fc4187 go1.19.13
I0317 15:13:43.650554 1 main.go:294] using Topology Manager scope "container" from "conf" (conf=container) policy "single-numa-node" from "conf" (conf=single-numa-node)
I0317 15:13:43.651258 1 client.go:43] creating a podresources client for endpoint "unix:///host-podresources/kubelet.sock"
I0317 15:13:43.651279 1 client.go:104] endpoint "unix:///host-podresources/kubelet.sock" -> protocol="unix" path="/host-podresources/kubelet.sock"
I0317 15:13:43.651693 1 client.go:48] created a podresources client for endpoint "unix:///host-podresources/kubelet.sock"
I0317 15:13:43.651744 1 prometheus.go:113] prometheus endpoint disabled
I0317 15:13:43.651756 1 podexclude.go:87] > POD excludes:
I0317 15:13:43.651783 1 resourcetopologyexporter.go:127] using given Topology Manager policy "single-numa-node" scope "container"
I0317 15:13:43.651886 1 notification.go:129] added interval every 10s
I0317 15:13:43.651938 1 resourcemonitor.go:153] resource monitor for "ocp4132201838-worker-0.libvirt.lab.eng.tlv2.redhat.com" starting
I0317 15:13:43.682265 1 resourcemonitor.go:175] tracking node resources
F0317 15:13:43.683140 1 main.go:112] failed to execute: failed to initialize ResourceMonitor: error while updating node allocatable: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial unix /host-podresources/kubelet.sock: connect: permission denied"
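The log shows that creating the podresources client succeeds but the first real dial of /host-podresources/kubelet.sock fails with "connect: permission denied" (EACCES). To check that connection independently of RTE, the following minimal sketch attempts the same unix-socket connect and classifies the result. It assumes a Python 3 interpreter is available wherever it is run (the RTE image may not ship one), and `probe_unix_socket` is a hypothetical helper, not part of RTE or NROP.

```python
import socket


def probe_unix_socket(path: str, timeout: float = 2.0) -> str:
    """Attempt a stream connect to a unix socket and classify the outcome.

    Returns "ok", "permission denied", "not found", "refused",
    or "error: <reason>" for anything else.
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(path)
        return "ok"
    except PermissionError:
        # EACCES: the same failure RTE reports ("connect: permission denied").
        # Connecting requires write permission on the socket file and search
        # permission on its parent directories; a security policy such as
        # SELinux can also deny it.
        return "permission denied"
    except FileNotFoundError:
        # ENOENT: the socket path does not exist (e.g. the host path is not mounted).
        return "not found"
    except ConnectionRefusedError:
        # ECONNREFUSED: the socket file exists but nothing is listening on it.
        return "refused"
    except OSError as exc:
        return f"error: {exc.strerror or exc}"
    finally:
        sock.close()


if __name__ == "__main__":
    # Path copied from the RTE log (the kubelet podresources socket as mounted in the pod).
    print(probe_unix_socket("/host-podresources/kubelet.sock"))
```

A result of "permission denied" from inside the container points at the container's access to the socket (file mode, ownership, or security context) rather than at the kubelet being down, which would show up as "refused" or "not found" instead.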