
RTE pods fail to start due to selinux issues


    • Critical
    • Yes
    • 8
    • CNF Compute Sprint 263
    • 1
    • False
    • If you plan to deploy the NUMA Resources Operator, avoid using OpenShift Container Platform versions 4.17.7 or 4.17.8. (link:https://issues.redhat.com/browse/OCPBUGS-45639[*OCPBUGS-45639*])
    • Known Issue
    • In Progress
    • This is due to an update in container-selinux-4-2.231.0-10.rhaos4.17.el9-noarch. Note this package is part of RHCOS but maintained by the containers team.
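The release-note guidance above can be checked mechanically before installing the operator. The following is a minimal sketch, not an official tool: the function name `is_affected_version` and the variable `AFFECTED_VERSIONS` are made up for illustration, and the list contains only the two releases named in the release note.

```shell
# Releases the release note says to avoid (assumption: no others are listed here).
AFFECTED_VERSIONS="4.17.7 4.17.8"

# is_affected_version VERSION
# Returns 0 when VERSION exactly matches one of the releases above, 1 otherwise.
is_affected_version() {
  for candidate in $AFFECTED_VERSIONS; do
    if [ "$1" = "$candidate" ]; then
      return 0
    fi
  done
  return 1
}
```

On a live cluster, the version string could be taken from `oc get clusterversion version -o jsonpath='{.status.desired.version}'` and passed to the function as its first argument.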

      Description of problem:

      The RTE (resource-topology-exporter) pods get stuck in CrashLoopBackOff because of an SELinux issue.
      This happens on the latest stable OCP 4.17 and 4.16 versions.
      The issue occurs because changes to the SELinux permissions were backported to these versions.
      

      How reproducible:

      Every time

      Steps to Reproduce:

          1. Deploy the NUMA Resources Operator (NROP) and apply the CR. The RTE pods that should then come up get stuck in CrashLoopBackOff instead.

      Actual results:

      {code:none}
      NAME                                               READY   STATUS             RESTARTS         AGE
      numaresources-controller-manager-5d898b65d-ww649   1/1     Running            0                85m
      numaresourcesoperator-worker-58qp5                 1/2     CrashLoopBackOff   19 (3m14s ago)   75m
      numaresourcesoperator-worker-tzkpb                 1/2     CrashLoopBackOff   19 (3m57s ago)   75m
      {code}

      Expected results:

      [root@registry ~]# oc get pods -n openshift-numaresources
      NAME                                                READY   STATUS    RESTARTS   AGE
      numaresources-controller-manager-85949d9899-pbgxt   1/1     Running   0          33m
      numaresourcesoperator-worker-fj64c                  2/2     Running   0          18m
      numaresourcesoperator-worker-lbrzr                  2/2     Running   0          18m
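The difference between the actual and expected results above comes down to the READY column: an affected worker pod reports 1/2, a healthy one 2/2. As a rough sketch, a small filter over the `oc get pods` table can list the pods whose ready count is below their container count. The function name `not_ready_pods` is hypothetical, and the filter assumes the default column order (NAME, READY, STATUS).

```shell
# not_ready_pods: reads `oc get pods` table output on stdin and prints
# "NAME READY STATUS" for each pod whose READY value is not N/N.
not_ready_pods() {
  awk '$1 != "NAME" && $2 ~ /^[0-9]+\/[0-9]+$/ {
    split($2, ready, "/")            # ready[1] = ready containers, ready[2] = total
    if (ready[1] != ready[2]) print $1, $2, $3
  }'
}
```

A possible invocation would be `oc get pods -n openshift-numaresources | not_ready_pods`, which prints nothing when every pod is fully ready.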
      Additional info:
      
      root@ocp-edge42:~# oc get clusterversion
      NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
      version   4.17.7    True        False         18h     Cluster version is 4.17.7
      root@ocp-edge42:~# oc get nodes
      NAME                                                      STATUS   ROLES                  AGE   VERSION
      ocp417218724-ctlplane-0.libvirt.lab.eng.tlv2.redhat.com   Ready    control-plane,master   19h   v1.30.6
      ocp417218724-ctlplane-1.libvirt.lab.eng.tlv2.redhat.com   Ready    control-plane,master   19h   v1.30.6
      ocp417218724-ctlplane-2.libvirt.lab.eng.tlv2.redhat.com   Ready    control-plane,master   19h   v1.30.6
      ocp417218724-worker-0.libvirt.lab.eng.tlv2.redhat.com     Ready    worker,worker-cnf      18h   v1.30.6
      ocp417218724-worker-1.libvirt.lab.eng.tlv2.redhat.com     Ready    worker                 18h   v1.30.6
      root@ocp-edge42:~# oc get mcp
      NAME         CONFIG                                                 UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
      master       rendered-master-6b9fbc317f8de825eef09af7a774003e       True      False      False      3              3                   3                     0                      19h
      worker       rendered-worker-817fa8cae4a0a3db0152ddb0de02f24d       True      False      False      1              1                   1                     0                      19h
      worker-cnf   rendered-worker-cnf-95a97a3dbe86b321b74e626f22d5e4c8   True      False      False      1              1                   1                     0                      13h
      root@ocp-edge42:~# oc get pods
      NAME                                                READY   STATUS             RESTARTS        AGE
      numaresources-controller-manager-86b794588f-6txw9   1/1     Running            0               39m
      numaresourcesoperator-worker-cnf-qf6lk              1/2     CrashLoopBackOff   11 (3m7s ago)   34m
      root@ocp-edge42:~# oc logs pod/numaresourcesoperator-worker-cnf-qf6lk
      Defaulted container "resource-topology-exporter" out of: resource-topology-exporter, shared-pool-container
      I1205 10:03:29.618210       1 main.go:66] starting resource-topology-exporter 4.17.1 44f70579fcd67c1ebbd2aa338cebfc4712283874 go1.22.7 (Red Hat 1.22.7-1.el9_5) X:strictfipsruntime
      I1205 10:03:29.618553       1 main.go:307] using Topology Manager scope "container" from "conf" (conf=container) policy "single-numa-node" from "conf" (conf=single-numa-node)
      I1205 10:03:29.619236       1 client.go:43] creating a podresources client for endpoint "unix:///host-podresources/kubelet.sock"
      I1205 10:03:29.619255       1 client.go:104] endpoint "unix:///host-podresources/kubelet.sock" -> protocol="unix" path="/host-podresources/kubelet.sock"
      I1205 10:03:29.619719       1 client.go:48] created a podresources client for endpoint "unix:///host-podresources/kubelet.sock"
      I1205 10:03:29.619747       1 setup.go:90] metrics endpoint disabled
      I1205 10:03:29.619774       1 podexclude.go:99] > POD excludes:
      I1205 10:03:29.619791       1 resourcetopologyexporter.go:127] using given Topology Manager policy "single-numa-node" scope "container"
      I1205 10:03:29.619844       1 notification.go:123] added interval every 10s
      I1205 10:03:29.619875       1 resourcemonitor.go:153] resource monitor for "ocp417218724-worker-0.libvirt.lab.eng.tlv2.redhat.com" starting
      I1205 10:03:29.638778       1 resourcemonitor.go:175] tracking node resources
      F1205 10:03:29.639205       1 main.go:118] failed to execute: failed to initialize ResourceMonitor: error while updating node allocatable: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial unix /host-podresources/kubelet.sock: connect: permission denied"
      root@ocp-edge42:~# oc get csv
      NAME                             DISPLAY                  VERSION   REPLACES                         PHASE
      numaresources-operator.v4.17.1   numaresources-operator   4.17.1    numaresources-operator.v4.17.0   Succeeded
      root@ocp-edge42:~# oc get sub
      NAME                                            PACKAGE                  SOURCE             CHANNEL
      openshift-numaresources-operator-subscription   numaresources-operator   redhat-operators   4.17
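The signature of this failure in the session above is the fatal (`F`-prefixed) log line reporting `permission denied` when dialing `/host-podresources/kubelet.sock`. The sketch below checks a log stream for that signature. The function name `has_podresources_denial` is hypothetical, and the pattern is based only on the single log line shown in this report, so it may not match other variants of the error.

```shell
# has_podresources_denial: reads container logs on stdin.
# Returns 0 if a fatal (F-prefixed, klog-style) line reports a
# permission-denied error on the kubelet podresources socket.
has_podresources_denial() {
  grep -Eq '^[[:space:]]*F[0-9]{4} .*kubelet\.sock: connect: permission denied'
}
```

It could be fed from the affected container, for example `oc logs pod/numaresourcesoperator-worker-cnf-qf6lk -c resource-topology-exporter | has_podresources_denial && echo "socket access denied"`, using the pod name from the session above.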
      

              rhn-support-jnovy Jindrich Novy
              rh-ee-rshemtov Roy Shemtov
              Roy Shemtov Roy Shemtov
              Mallapadi Niranjan