Type: Bug
Resolution: Done-Errata
Priority: Critical
Fix Version/s: None
Affects Version/s: 4.16, 4.17
Component/s: Containers
Labels:

Severity:
Critical
Regression:
Yes
Story Points:
8
Sprint:
CNF Compute Sprint 263
sprint_count:
1
Blocked:
False
Blocked Reason:

None
Release Note Text:

If you plan to deploy the NUMA Resources Operator, avoid using OpenShift Container Platform versions 4.17.7 or 4.17.8.
====
If you plan to deploy the NUMA Resources Operator, avoid using OpenShift Container Platform versions 4.17.7 or 4.17.8. (link:https://issues.redhat.com/browse/OCPBUGS-45639[*OCPBUGS-45639*])
Release Note Type:
Known Issue
Release Note Status:
Done
Latest Status Summary:
This is caused by an update in the container-selinux-4-2.231.0-10.rhaos4.17.el9-noarch package. Note that this package ships as part of RHCOS but is maintained by the containers team.
RH Private Keywords:
Target Version:

4.17.z

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Description of problem:

The RTE (resource-topology-exporter) pods get stuck in CrashLoopBackOff due to an SELinux issue.
This happens on the latest stable OCP 4.17 and 4.16 versions.
The issue occurs because changes to the SELinux permissions were backported to these versions.
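
As a pre-deployment guard, the versions named in this issue's release note (4.17.7 and 4.17.8) can be checked before installing the operator. The following is a minimal sketch, not an official check: the function name `is_affected_version` and the variable `AFFECTED_VERSIONS` are hypothetical, and the list is an assumption limited to the two versions the release note names. Other 4.16 and 4.17 builds may also be affected.

```shell
#!/bin/sh
# Versions named in this issue's release note as ones to avoid when deploying
# the NUMA Resources Operator. Assumption: the list only covers what the
# release note states; other releases may be affected as well.
AFFECTED_VERSIONS="4.17.7 4.17.8"

# is_affected_version VERSION: succeeds (exit 0) if VERSION is in the list.
is_affected_version() {
    for v in $AFFECTED_VERSIONS; do
        [ "$1" = "$v" ] && return 0
    done
    return 1
}

# On a live cluster the version could be read with:
#   oc get clusterversion version -o jsonpath='{.status.desired.version}'
# A literal value is used here so the sketch runs without a cluster.
cluster_version="4.17.7"
if is_affected_version "$cluster_version"; then
    echo "$cluster_version: affected, avoid deploying the NUMA Resources Operator"
else
    echo "$cluster_version: not in the known-affected list"
fi
```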

How reproducible:

Every time

Steps to Reproduce:

    1. Deploy the NUMA Resources Operator (NROP) and apply the CR. The RTE pods that should come up get stuck in CrashLoopBackOff.

Actual results:
{code:none}
NAME                                               READY   STATUS             RESTARTS         AGE
numaresources-controller-manager-5d898b65d-ww649   1/1     Running            0                85m
numaresourcesoperator-worker-58qp5                 1/2     CrashLoopBackOff   19 (3m14s ago)   75m
numaresourcesoperator-worker-tzkpb                 1/2     CrashLoopBackOff   19 (3m57s ago)   75m
{code}

Expected results:

[root@registry ~]# oc get pods -n openshift-numaresources
NAME                                                READY   STATUS    RESTARTS   AGE
numaresources-controller-manager-85949d9899-pbgxt   1/1     Running   0          33m
numaresourcesoperator-worker-fj64c                  2/2     Running   0          18m
numaresourcesoperator-worker-lbrzr                  2/2     Running   0          18m
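
To tell the failing state from the expected one without reading the table by eye, the pod listing can be filtered on its STATUS column. This is a small sketch under the assumption that the input is the default `oc get pods` table layout shown in this report. The helper name `crashlooping_pods` is hypothetical, and the sample input is copied from the actual results listing.

```shell
#!/bin/sh
# crashlooping_pods: read `oc get pods` table output on stdin and print the
# names of pods whose STATUS column (third field) is CrashLoopBackOff.
crashlooping_pods() {
    awk 'NR > 1 && $3 == "CrashLoopBackOff" { print $1 }'
}

# Sample input copied from the actual results reported in this issue.
crashlooping_pods <<'EOF'
NAME                                               READY   STATUS             RESTARTS         AGE
numaresources-controller-manager-5d898b65d-ww649   1/1     Running            0                85m
numaresourcesoperator-worker-58qp5                 1/2     CrashLoopBackOff   19 (3m14s ago)   75m
numaresourcesoperator-worker-tzkpb                 1/2     CrashLoopBackOff   19 (3m57s ago)   75m
EOF
```

On a live cluster the same helper could be fed from `oc get pods -n openshift-numaresources | crashlooping_pods`. An empty result corresponds to the expected state.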

Additional info:

root@ocp-edge42:~# oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.17.7    True        False         18h     Cluster version is 4.17.7
root@ocp-edge42:~# oc get nodes
NAME                                                      STATUS   ROLES                  AGE   VERSION
ocp417218724-ctlplane-0.libvirt.lab.eng.tlv2.redhat.com   Ready    control-plane,master   19h   v1.30.6
ocp417218724-ctlplane-1.libvirt.lab.eng.tlv2.redhat.com   Ready    control-plane,master   19h   v1.30.6
ocp417218724-ctlplane-2.libvirt.lab.eng.tlv2.redhat.com   Ready    control-plane,master   19h   v1.30.6
ocp417218724-worker-0.libvirt.lab.eng.tlv2.redhat.com     Ready    worker,worker-cnf      18h   v1.30.6
ocp417218724-worker-1.libvirt.lab.eng.tlv2.redhat.com     Ready    worker                 18h   v1.30.6
root@ocp-edge42:~# oc get mcp
NAME         CONFIG                                                 UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master       rendered-master-6b9fbc317f8de825eef09af7a774003e       True      False      False      3              3                   3                     0                      19h
worker       rendered-worker-817fa8cae4a0a3db0152ddb0de02f24d       True      False      False      1              1                   1                     0                      19h
worker-cnf   rendered-worker-cnf-95a97a3dbe86b321b74e626f22d5e4c8   True      False      False      1              1                   1                     0                      13h
root@ocp-edge42:~# oc get pods
NAME                                                READY   STATUS             RESTARTS        AGE
numaresources-controller-manager-86b794588f-6txw9   1/1     Running            0               39m
numaresourcesoperator-worker-cnf-qf6lk              1/2     CrashLoopBackOff   11 (3m7s ago)   34m
root@ocp-edge42:~# oc logs pod/numaresourcesoperator-worker-cnf-qf6lk
Defaulted container "resource-topology-exporter" out of: resource-topology-exporter, shared-pool-container
I1205 10:03:29.618210       1 main.go:66] starting resource-topology-exporter 4.17.1 44f70579fcd67c1ebbd2aa338cebfc4712283874 go1.22.7 (Red Hat 1.22.7-1.el9_5) X:strictfipsruntime
I1205 10:03:29.618553       1 main.go:307] using Topology Manager scope "container" from "conf" (conf=container) policy "single-numa-node" from "conf" (conf=single-numa-node)
I1205 10:03:29.619236       1 client.go:43] creating a podresources client for endpoint "unix:///host-podresources/kubelet.sock"
I1205 10:03:29.619255       1 client.go:104] endpoint "unix:///host-podresources/kubelet.sock" -> protocol="unix" path="/host-podresources/kubelet.sock"
I1205 10:03:29.619719       1 client.go:48] created a podresources client for endpoint "unix:///host-podresources/kubelet.sock"
I1205 10:03:29.619747       1 setup.go:90] metrics endpoint disabled
I1205 10:03:29.619774       1 podexclude.go:99] > POD excludes:
I1205 10:03:29.619791       1 resourcetopologyexporter.go:127] using given Topology Manager policy "single-numa-node" scope "container"
I1205 10:03:29.619844       1 notification.go:123] added interval every 10s
I1205 10:03:29.619875       1 resourcemonitor.go:153] resource monitor for "ocp417218724-worker-0.libvirt.lab.eng.tlv2.redhat.com" starting
I1205 10:03:29.638778       1 resourcemonitor.go:175] tracking node resources
F1205 10:03:29.639205       1 main.go:118] failed to execute: failed to initialize ResourceMonitor: error while updating node allocatable: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial unix /host-podresources/kubelet.sock: connect: permission denied"
root@ocp-edge42:~# oc get csv
NAME                             DISPLAY                  VERSION   REPLACES                         PHASE
numaresources-operator.v4.17.1   numaresources-operator   4.17.1    numaresources-operator.v4.17.0   Succeeded
root@ocp-edge42:~# oc get sub
NAME                                            PACKAGE                  SOURCE             CHANNEL
openshift-numaresources-operator-subscription   numaresources-operator   redhat-operators   4.17
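
The signature of this bug in the RTE logs is a fatal line reporting `permission denied` while dialing the kubelet podresources socket. A minimal sketch for spotting it in captured logs follows. The helper name `socket_permission_denied` is hypothetical, and the match assumes the klog line format seen above (severity letter `F` followed by the month and day). The sample lines are copied from the log in this report.

```shell
#!/bin/sh
# socket_permission_denied: read container logs on stdin; succeed (exit 0) if
# a fatal klog line (prefix "F" plus MMDD) reports "permission denied" while
# dialing a unix socket.
socket_permission_denied() {
    grep -E '^F[0-9]{4} ' | grep -q 'dial unix .*: connect: permission denied'
}

# Sample input: the last two log lines from the RTE container in this report.
if socket_permission_denied <<'EOF'
I1205 10:03:29.638778       1 resourcemonitor.go:175] tracking node resources
F1205 10:03:29.639205       1 main.go:118] failed to execute: failed to initialize ResourceMonitor: error while updating node allocatable: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial unix /host-podresources/kubelet.sock: connect: permission denied"
EOF
then
    echo "RTE cannot reach the kubelet podresources socket: permission denied"
fi
```

On a live cluster the input could come from `oc logs` for the affected pod, as in the session above.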

is blocked by

COS-3078 Impact RTE pods fail to start due to selinux issues

Closed

is cloned by

OCPBUGS-45983 RTE pods fail to start due to selinux issues

Closed

links to

RHBA-2024:11010 OpenShift Container Platform 4.17.z bug fix update

Assignee: Jindrich Novy
Reporter: Roy Shemtov
QA Contact: Roy Shemtov
Contributors: Mallapadi Niranjan
Votes: 0
Watchers: 11
Created: 2024/12/05 12:40 PM
Updated: 2025/03/20 8:23 AM
Resolved: 2024/12/19 12:42 AM
