-
Bug
-
Resolution: Done-Errata
-
Critical
-
None
-
4.16, 4.17
-
Critical
-
Yes
-
8
-
CNF Compute Sprint 263
-
1
-
False
-
-
-
Known Issue
-
In Progress
-
This is due to an update in container-selinux-4-2.231.0-10.rhaos4.17.el9-noarch. Note that this package is part of RHCOS but is maintained by the containers team.
-
Description of problem:
The RTE pods get stuck in CrashLoopBackOff due to an SELinux issue. This happens on the latest 4.17 and 4.16 OCP stable versions. The issue occurs because changes to the SELinux permissions were backported to these versions.
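Since the regression is tied to the backported container-selinux update noted in the release note, a quick way to confirm which build an affected worker is running (a minimal sketch; the node name is the example worker from the output further below and should be substituted):
{code:none}
# Sketch: query the container-selinux RPM on a worker node via a debug pod.
# Node name is illustrative; substitute an affected worker from your cluster.
oc debug node/ocp417218724-worker-0.libvirt.lab.eng.tlv2.redhat.com -- chroot /host rpm -q container-selinux
{code}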
How reproducible:
Every time
Steps to Reproduce:
1. Deploy the NROP operator and apply the CR. The RTE pods, which should come up at this point, instead get stuck in CrashLoopBackOff (see the sketch below).
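A simple way to watch for the failure after applying the CR (assuming the default openshift-numaresources namespace used elsewhere in this report):
{code:none}
# Watch the RTE worker pods; on affected versions they flip to CrashLoopBackOff
# instead of reaching 2/2 Running.
oc get pods -n openshift-numaresources -w
{code}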
Actual results:
{code:none}
NAME                                                READY   STATUS             RESTARTS         AGE
numaresources-controller-manager-5d898b65d-ww649    1/1     Running            0                85m
numaresourcesoperator-worker-58qp5                  1/2     CrashLoopBackOff   19 (3m14s ago)   75m
numaresourcesoperator-worker-tzkpb                  1/2     CrashLoopBackOff   19 (3m57s ago)   75m
{code}
Expected results:
{code:none}
[root@registry ~]# oc get pods -n openshift-numaresources
NAME                                                 READY   STATUS    RESTARTS   AGE
numaresources-controller-manager-85949d9899-pbgxt    1/1     Running   0          33m
numaresourcesoperator-worker-fj64c                   2/2     Running   0          18m
numaresourcesoperator-worker-lbrzr                   2/2     Running   0          18m
{code}
Additional info:
{code:none}
root@ocp-edge42:~# oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.17.7    True        False         18h     Cluster version is 4.17.7

root@ocp-edge42:~# oc get nodes
NAME                                                       STATUS   ROLES                  AGE   VERSION
ocp417218724-ctlplane-0.libvirt.lab.eng.tlv2.redhat.com    Ready    control-plane,master   19h   v1.30.6
ocp417218724-ctlplane-1.libvirt.lab.eng.tlv2.redhat.com    Ready    control-plane,master   19h   v1.30.6
ocp417218724-ctlplane-2.libvirt.lab.eng.tlv2.redhat.com    Ready    control-plane,master   19h   v1.30.6
ocp417218724-worker-0.libvirt.lab.eng.tlv2.redhat.com      Ready    worker,worker-cnf      18h   v1.30.6
ocp417218724-worker-1.libvirt.lab.eng.tlv2.redhat.com      Ready    worker                 18h   v1.30.6

root@ocp-edge42:~# oc get mcp
NAME         CONFIG                                                 UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master       rendered-master-6b9fbc317f8de825eef09af7a774003e       True      False      False      3              3                   3                     0                      19h
worker       rendered-worker-817fa8cae4a0a3db0152ddb0de02f24d       True      False      False      1              1                   1                     0                      19h
worker-cnf   rendered-worker-cnf-95a97a3dbe86b321b74e626f22d5e4c8   True      False      False      1              1                   1                     0                      13h

root@ocp-edge42:~# oc get pods
NAME                                                 READY   STATUS             RESTARTS        AGE
numaresources-controller-manager-86b794588f-6txw9    1/1     Running            0               39m
numaresourcesoperator-worker-cnf-qf6lk               1/2     CrashLoopBackOff   11 (3m7s ago)   34m

root@ocp-edge42:~# oc logs pod/numaresourcesoperator-worker-cnf-qf6lk
Defaulted container "resource-topology-exporter" out of: resource-topology-exporter, shared-pool-container
I1205 10:03:29.618210       1 main.go:66] starting resource-topology-exporter 4.17.1 44f70579fcd67c1ebbd2aa338cebfc4712283874 go1.22.7 (Red Hat 1.22.7-1.el9_5) X:strictfipsruntime
I1205 10:03:29.618553       1 main.go:307] using Topology Manager scope "container" from "conf" (conf=container) policy "single-numa-node" from "conf" (conf=single-numa-node)
I1205 10:03:29.619236       1 client.go:43] creating a podresources client for endpoint "unix:///host-podresources/kubelet.sock"
I1205 10:03:29.619255       1 client.go:104] endpoint "unix:///host-podresources/kubelet.sock" -> protocol="unix" path="/host-podresources/kubelet.sock"
I1205 10:03:29.619719       1 client.go:48] created a podresources client for endpoint "unix:///host-podresources/kubelet.sock"
I1205 10:03:29.619747       1 setup.go:90] metrics endpoint disabled
I1205 10:03:29.619774       1 podexclude.go:99] > POD excludes:
I1205 10:03:29.619791       1 resourcetopologyexporter.go:127] using given Topology Manager policy "single-numa-node" scope "container"
I1205 10:03:29.619844       1 notification.go:123] added interval every 10s
I1205 10:03:29.619875       1 resourcemonitor.go:153] resource monitor for "ocp417218724-worker-0.libvirt.lab.eng.tlv2.redhat.com" starting
I1205 10:03:29.638778       1 resourcemonitor.go:175] tracking node resources
F1205 10:03:29.639205       1 main.go:118] failed to execute: failed to initialize ResourceMonitor: error while updating node allocatable: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial unix /host-podresources/kubelet.sock: connect: permission denied"

root@ocp-edge42:~# oc get csv
NAME                             DISPLAY                  VERSION   REPLACES                         PHASE
numaresources-operator.v4.17.1   numaresources-operator   4.17.1    numaresources-operator.v4.17.0   Succeeded

root@ocp-edge42:~# oc get sub
NAME                                             PACKAGE                  SOURCE             CHANNEL
openshift-numaresources-operator-subscription    numaresources-operator   redhat-operators   4.17
{code}
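The fatal error above is a "permission denied" on the podresources socket, which on the affected versions comes from an SELinux denial rather than from file modes. A minimal, hedged way to confirm this on the affected worker (node name taken from the output above; assumes ausearch/auditd is available on the RHCOS host):
{code:none}
# Sketch only: look for recent AVC denials around the RTE startup failure.
# Substitute the worker that runs the failing RTE pod.
oc debug node/ocp417218724-worker-0.libvirt.lab.eng.tlv2.redhat.com -- chroot /host ausearch -m AVC -ts recent
{code}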
- is blocked by
-
COS-3078 Impact RTE pods fail to start due to selinux issues
- Closed
- is cloned by
-
OCPBUGS-45983 RTE pods fail to start due to selinux issues
- Closed
- links to
-
RHBA-2024:11010 OpenShift Container Platform 4.17.z bug fix update