-
Bug
-
Resolution: Not a Bug
-
Major
-
None
-
4.18
-
None
-
CNF Compute Sprint 261, CNF Compute Sprint 262
-
2
-
Rejected
-
False
-
-
-
Known Issue
-
In Progress
-
Install, Release Notes
-
Customer Facing
-
-
Description of problem:
When NROP 4.17.z is installed on OCP 4.18, the RTE pods get stuck in CrashLoopBackOff.
Version-Release number of selected component (if applicable):
Reproducible on 4.18
How reproducible:
Every time
Steps to Reproduce:
1. Install any NROP 4.17.z build on a 4.18 cluster; the issue reproduces.
Extra details:
[root@helix36 ~]# oc get pods -n openshift-numaresources
NAME                                                READY   STATUS             RESTARTS        AGE
numaresources-controller-manager-666bd7f95d-j7596   1/1     Running            0               8m54s
numaresourcesoperator-worker-cnf-z7f4m              1/2     CrashLoopBackOff   5 (2m27s ago)   5m19s
Running oc describe on the affected pod shows:
Events:
  Type     Reason          Age                    From               Message
  ----     ------          ----                   ----               -------
  Normal   Scheduled       48m                    default-scheduler  Successfully assigned openshift-numaresources/numaresourcesoperator-worker-cnf-df5cw to ocp4183523817-worker-0.libvirt.lab.eng.tlv2.redhat.com
  Normal   AddedInterface  48m                    multus             Add eth0 [10.135.1.148/23] from ovn-kubernetes
  Normal   Pulled          48m                    kubelet            Container image "registry.redhat.io/openshift4/numaresources-rhel9-operator@sha256:e0d4722e0501ab8b3aad81e33539af17d65b949bf54579b272ee98fae58b8fbb" already present on machine
  Normal   Created         48m                    kubelet            Created container shared-pool-container
  Normal   Started         48m                    kubelet            Started container shared-pool-container
  Normal   Pulled          46m (x5 over 48m)      kubelet            Container image "registry.redhat.io/openshift4/numaresources-rhel9-operator@sha256:e0d4722e0501ab8b3aad81e33539af17d65b949bf54579b272ee98fae58b8fbb" already present on machine
  Normal   Created         46m (x5 over 48m)      kubelet            Created container resource-topology-exporter
  Normal   Started         46m (x5 over 48m)      kubelet            Started container resource-topology-exporter
  Warning  BackOff         3m17s (x208 over 48m)  kubelet            Back-off restarting failed container resource-topology-exporter in pod numaresourcesoperator-worker-cnf-df5cw_openshift-numaresources(3c2d9745-7514-4d2e-85f7-5dbb8a4e3df5)
Getting the logs of the pod:
[root@helix36 ~]# oc logs pod/numaresourcesoperator-worker-cnf-df5cw
Defaulted container "resource-topology-exporter" out of: resource-topology-exporter, shared-pool-container
I1023 09:23:33.824820 1 main.go:66] starting resource-topology-exporter 4.17.1 44f70579fcd67c1ebbd2aa338cebfc4712283874 go1.22.7 (Red Hat 1.22.7-1.el9_5) X:strictfipsruntime
I1023 09:23:33.825128 1 main.go:307] using Topology Manager scope "container" from "conf" (conf=container) policy "single-numa-node" from "conf" (conf=single-numa-node)
I1023 09:23:33.825566 1 client.go:43] creating a podresources client for endpoint "unix:///host-podresources/kubelet.sock"
I1023 09:23:33.825581 1 client.go:104] endpoint "unix:///host-podresources/kubelet.sock" -> protocol="unix" path="/host-podresources/kubelet.sock"
I1023 09:23:33.825923 1 client.go:48] created a podresources client for endpoint "unix:///host-podresources/kubelet.sock"
I1023 09:23:33.825940 1 setup.go:90] metrics endpoint disabled
I1023 09:23:33.825946 1 podexclude.go:99] > POD excludes:
I1023 09:23:33.825954 1 resourcetopologyexporter.go:127] using given Topology Manager policy "single-numa-node" scope "container"
I1023 09:23:33.825981 1 notification.go:123] added interval every 10s
I1023 09:23:33.825997 1 resourcemonitor.go:153] resource monitor for "ocp4183523817-worker-0.libvirt.lab.eng.tlv2.redhat.com" starting
I1023 09:23:33.847823 1 resourcemonitor.go:175] tracking node resources
F1023 09:23:33.848264 1 main.go:118] failed to execute: failed to initialize ResourceMonitor: error while updating node allocatable: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial unix /host-podresources/kubelet.sock: connect: permission denied"
SELinux denials from audit.log on the affected node:
[root@helix36 ~]# oc debug node/ocp4183523817-worker-0.libvirt.lab.eng.tlv2.redhat.com
Starting pod/ocp4183523817-worker-0libvirtlabengtlv2redhatcom-debug-4dtzj ...
To use host binaries, run `chroot /host`
Pod IP: 192.168.122.79
If you don't see a command prompt, try pressing enter.
sh-5.1# chroot /host
sh-5.1# tail -n 500 /var/log/audit/audit.log | grep -i denied
type=AVC msg=audit(1729673274.837:5117): avc: denied { write } for pid=390531 comm="resource-topolo" name="kubelet.sock" dev="vda4" ino=56629082 scontext=system_u:system_r:rte.process:s0 tcontext=system_u:object_r:kubelet_var_lib_t:s0 tclass=sock_file permissive=0
type=AVC msg=audit(1729673575.850:5150): avc: denied { write } for pid=393240 comm="resource-topolo" name="kubelet.sock" dev="vda4" ino=56629082 scontext=system_u:system_r:rte.process:s0 tcontext=system_u:object_r:kubelet_var_lib_t:s0 tclass=sock_file permissive=0
type=AVC msg=audit(1729673879.850:5177): avc: denied { write } for pid=395969 comm="resource-topolo" name="kubelet.sock" dev="vda4" ino=56629082 scontext=system_u:system_r:rte.process:s0 tcontext=system_u:object_r:kubelet_var_lib_t:s0 tclass=sock_file permissive=0
type=AVC msg=audit(1729674188.863:5218): avc: denied { write } for pid=399077 comm="resource-topolo" name="kubelet.sock" dev="vda4" ino=56629082 scontext=system_u:system_r:rte.process:s0 tcontext=system_u:object_r:kubelet_var_lib_t:s0 tclass=sock_file permissive=0
type=AVC msg=audit(1729674491.841:5251): avc: denied { write } for pid=401753 comm="resource-topolo" name="kubelet.sock" dev="vda4" ino=56629082 scontext=system_u:system_r:rte.process:s0 tcontext=system_u:object_r:kubelet_var_lib_t:s0 tclass=sock_file permissive=0
type=AVC msg=audit(1729674797.836:5276): avc: denied { write } for pid=404337 comm="resource-topolo" name="kubelet.sock" dev="vda4" ino=56629082 scontext=system_u:system_r:rte.process:s0 tcontext=system_u:object_r:kubelet_var_lib_t:s0 tclass=sock_file permissive=0
type=AVC msg=audit(1729675102.850:5317): avc: denied { write } for pid=407229 comm="resource-topolo" name="kubelet.sock" dev="vda4" ino=56629082 scontext=system_u:system_r:rte.process:s0 tcontext=system_u:object_r:kubelet_var_lib_t:s0 tclass=sock_file permissive=0
type=AVC msg=audit(1729675413.825:5344): avc: denied { write } for pid=409881 comm="resource-topolo" name="kubelet.sock" dev="vda4" ino=56629082 scontext=system_u:system_r:rte.process:s0 tcontext=system_u:object_r:kubelet_var_lib_t:s0 tclass=sock_file permissive=0
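The denials above all appear to be the same event repeated on each container restart: the rte.process domain is denied write on the kubelet.sock socket file labeled kubelet_var_lib_t. For triage, a rough Python sketch like the following could collapse such records into unique denials with a count. The function name summarize_avc_denials and the regular expression are illustrative assumptions, written only against the record layout shown in this report; this is not an official audit tool and it may not match other AVC record shapes.

```python
import re
from collections import Counter

# Assumption: the pattern below is modeled only on the AVC records quoted in
# this report (comm, name, scontext, tcontext, tclass all present).
AVC_RE = re.compile(
    r'avc:\s+denied\s+\{\s*(?P<perms>[^}]*?)\s*\}\s+for\s+.*?'
    r'comm="(?P<comm>[^"]*)".*?'
    r'name="(?P<name>[^"]*)".*?'
    r'scontext=(?P<scontext>\S+)\s+'
    r'tcontext=(?P<tcontext>\S+)\s+'
    r'tclass=(?P<tclass>\S+)'
)


def summarize_avc_denials(text):
    """Count identical AVC denials, ignoring pid, timestamp and serial.

    Hypothetical helper: returns a Counter keyed by
    (comm, perms, name, scontext, tcontext, tclass).
    """
    counts = Counter()
    for m in AVC_RE.finditer(text):
        key = (
            m.group("comm"),
            m.group("perms"),
            m.group("name"),
            m.group("scontext"),
            m.group("tcontext"),
            m.group("tclass"),
        )
        counts[key] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage sketch: pipe the grep output from the node into this script.
    for key, n in summarize_avc_denials(sys.stdin.read()).items():
        print(n, *key)
```

Grouping by source and target context rather than by pid keeps the summary stable across restarts, since each crash loop iteration runs under a new pid.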
Actual results:
Expected results:
Additional info: