Type: Bug
Resolution: Unresolved
Priority: Critical
Fix Version/s: None
Affects Version/s: 4.17.z, 4.16.z
Component/s: Containers
Labels:
None

Activity Type:
Quality / Stability / Reliability
Blocked:
False
Blocked Reason:

None
Story Points:
2
Severity:
Critical
Regression:
Yes

Target Backport Versions:

4.17.z, 4.16.z
Target Version:
None
Release Blocker:
None
Sprint:
None

RH Private Keywords:

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

PX Impact Score:

Release Note Status:
None
Release Note Type:
None
Release Note Text:
None

Escape Reason:
None
Escape Impact:
None
Corrective Measures:
None
SDLC stage when should've been found:
None

Description of problem:

OCP 4.17.z and 4.16.z now expect the pod resources socket to have the SELinux context kubelet_var_lib_t instead of container_var_lib_t. As a result, the RTE (resource-topology-exporter) pods that are deployed when installing the NROP operator get stuck in CrashLoopBackOff.
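
To see which SELinux type the socket actually carries on a worker node, something like the following could be used. This is a minimal sketch, not a verified diagnostic: it assumes the kubelet pod resources socket sits at its default host path /var/lib/kubelet/pod-resources/kubelet.sock and that `oc debug` access to the node is available. The helper names `selinux_type` and `socket_context_on_node` are hypothetical.

```shell
#!/bin/sh
# Assumption: default host path of the kubelet pod resources socket.
SOCK=/var/lib/kubelet/pod-resources/kubelet.sock

# Print the SELinux type (third colon-separated field) of a context
# string such as "system_u:object_r:kubelet_var_lib_t:s0".
selinux_type() {
  printf '%s\n' "$1" | cut -d: -f3
}

# Print the socket's SELinux context as seen on the node's host filesystem.
# Needs sufficient cluster privileges; defined here but not invoked.
socket_context_on_node() {
  oc debug "node/$1" -- chroot /host stat -c %C "$SOCK" 2>/dev/null
}
```

Running `selinux_type "$(socket_context_on_node <worker-node>)"` should then print the type, which can be compared against kubelet_var_lib_t and container_var_lib_t.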

Version-Release number of selected component (if applicable):

How reproducible:

Always

Steps to Reproduce:

    1. Deploy the NROP operator.
    2. Apply the NROP CR.
    3. Wait for the RTE pods to come up; they get stuck in CrashLoopBackOff.

Actual results:

numaresources-controller-manager-6c74699cf7-7hkxq   1/1     Running            0              33m
numaresourcesoperator-worker-f5dr6                  1/2     CrashLoopBackOff   9 (29s ago)    21m
numaresourcesoperator-worker-rc2gr                  1/2     Error              9 (5m6s ago)   21m
secondary-scheduler-65557fc7cd-cx7gl                1/1     Running            0              20m

Expected results:

The RTE pods (numaresourcesoperator-worker-*) should be Running and 2/2 Ready, one per worker node (two in this example).
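
The readiness expectation above can be checked mechanically. The following is a small sketch that reads `oc get pods` output on stdin and prints the names of pods whose READY column shows fewer ready containers than total. The function name `not_ready_pods` is hypothetical, and the column layout is assumed to match the listings in this report.

```shell
#!/bin/sh
# Read `oc get pods` output on stdin and print the name of every pod
# whose READY column (ready/total) is not fully ready.
# Lines whose second column is not of the form N/M (e.g. the header)
# are skipped.
not_ready_pods() {
  awk '$2 ~ /^[0-9]+\/[0-9]+$/ {
         split($2, r, "/")
         if (r[1] != r[2]) print $1
       }'
}
```

Usage: `oc get pods | not_ready_pods`. An empty result means every listed pod has all of its containers ready.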

Additional info:

[root@ocp-edge41 ~]# oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.17.0-0.nightly-2025-09-15-035144   True        False         116m    Cluster version is 4.17.0-0.nightly-2025-09-15-035144
[root@ocp-edge41 ~]# oc get no,mcp
NAME                                                            STATUS   ROLES                  AGE    VERSION
node/ocp4171598887-ctlplane-0.libvirt.lab.eng.tlv2.redhat.com   Ready    control-plane,master   145m   v1.30.14
node/ocp4171598887-ctlplane-1.libvirt.lab.eng.tlv2.redhat.com   Ready    control-plane,master   146m   v1.30.14
node/ocp4171598887-ctlplane-2.libvirt.lab.eng.tlv2.redhat.com   Ready    control-plane,master   146m   v1.30.14
node/ocp4171598887-worker-0.libvirt.lab.eng.tlv2.redhat.com     Ready    worker                 128m   v1.30.14
node/ocp4171598887-worker-1.libvirt.lab.eng.tlv2.redhat.com     Ready    worker                 128m   v1.30.14

NAME                                                         CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
machineconfigpool.machineconfiguration.openshift.io/master   rendered-master-4d364f16d856959b60b95cf92eaf905c   True      False      False      3              3                   3                     0                      143m
machineconfigpool.machineconfiguration.openshift.io/worker   rendered-worker-f5a9a81979766c79c554a745f2cfb72a   True      False      False      2              2                   2                     0                      143m
[root@ocp-edge41 ~]# oc get pods
NAME                                                READY   STATUS             RESTARTS       AGE
numaresources-controller-manager-6c74699cf7-7hkxq   1/1     Running            0              80m
numaresourcesoperator-worker-f5dr6                  1/2     CrashLoopBackOff   18 (63s ago)   68m
numaresourcesoperator-worker-rc2gr                  1/2     CrashLoopBackOff   18 (36s ago)   68m
secondary-scheduler-65557fc7cd-cx7gl                1/1     Running            0              67m
[root@ocp-edge41 ~]# oc logs pod/numaresourcesoperator-worker-f5dr6
Defaulted container "resource-topology-exporter" out of: resource-topology-exporter, shared-pool-container
I0916 14:19:12.878899       1 main.go:66] starting resource-topology-exporter 0.0.1-dev1 63405e44f go1.22.12 (Red Hat 1.22.12-3.el9_5) X:strictfipsruntime
I0916 14:19:12.879138       1 main.go:307] using Topology Manager scope "container" from "conf" (conf=container) policy "single-numa-node" from "conf" (conf=single-numa-node)
I0916 14:19:12.879684       1 client.go:43] creating a podresources client for endpoint "unix:///host-podresources/kubelet.sock"
I0916 14:19:12.879696       1 client.go:104] endpoint "unix:///host-podresources/kubelet.sock" -> protocol="unix" path="/host-podresources/kubelet.sock"
I0916 14:19:12.879978       1 client.go:48] created a podresources client for endpoint "unix:///host-podresources/kubelet.sock"
I0916 14:19:12.879989       1 setup.go:90] metrics endpoint disabled
I0916 14:19:12.879993       1 podexclude.go:99] > POD excludes:
I0916 14:19:12.879999       1 resourcetopologyexporter.go:127] using given Topology Manager policy "single-numa-node" scope "container"
I0916 14:19:12.880035       1 notification.go:123] added interval every 10s
I0916 14:19:12.880055       1 resourcemonitor.go:153] resource monitor for "ocp4171598887-worker-1.libvirt.lab.eng.tlv2.redhat.com" starting
I0916 14:19:12.896159       1 resourcemonitor.go:175] tracking node resources
F0916 14:19:12.896586       1 main.go:118] failed to execute: failed to initialize ResourceMonitor: error while updating node allocatable: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial unix /host-podresources/kubelet.sock: connect: permission denied"

is depended on by

OCPBUGS-62225 numaresourcesoperator pods fail to start - Error while dialing: dial unix /host-podresources/kubelet.sock: connect: permission denied

Closed

is duplicated by

RUN-3308 Corrective Measure for OCPBUGS-45639: RTE pods fail to start due to selinux issues

To Do

relates to

OCPBUGS-45983 RTE pods fail to start due to selinux issues

Closed

links to

Previous release notes on same issue from OCPBUGS-45639

Assignee:: Jindrich Novy

Reporter:: Roy Shemtov

Need Info From:: None

Contributors:: None

QA Contact:: Roy Shemtov

Doc Contact:: None

Votes:: 0

Watchers:: 6

Created:: 2025/09/16 2:15 PM

Updated:: 2025/11/24 7:09 AM
