Type: Bug
Resolution: Duplicate
Priority: Undefined
Fix Version/s: None
Affects Version/s: 4.16.z
Component/s: Node / Numa aware Scheduling
Labels:
- telco
- telco-core

Activity Type:
Quality / Stability / Reliability
Blocked:
False
Blocked Reason:

None
Story Points:
None
Severity:
Important
Regression:
None
Architecture:

x86_64

Target Backport Versions:
None
Target Version:
None
Release Blocker:
None
Sprint:
None

RH Private Keywords:

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

PX Impact Score:

Release Note Status:
None
Release Note Type:
None
Release Note Text:
None

Escape Reason:
None
Escape Impact:
None
Corrective Measures:
None
SDLC stage when should've been found:
None

Description of problem:

    numaresourceoperator pods are in CrashLoopBackOff state with the following error message:

Defaulted container "resource-topology-exporter" out of: resource-topology-exporter, shared-pool-container
I0925 07:28:32.448592       1 main.go:66] starting resource-topology-exporter 4.16.6 9692c3190f9868ef04d188a42ad8d0bac954e756 go1.22.12 (Red Hat 1.22.12-2.el9_5) X:strictfipsruntime
I0925 07:28:32.448868       1 main.go:307] using Topology Manager scope "container" from "default" (conf=) policy "single-numa-node" from "conf" (conf=single-numa-node)
I0925 07:28:32.449382       1 client.go:43] creating a podresources client for endpoint "unix:///host-podresources/kubelet.sock"
I0925 07:28:32.449402       1 client.go:104] endpoint "unix:///host-podresources/kubelet.sock" -> protocol="unix" path="/host-podresources/kubelet.sock"
I0925 07:28:32.449790       1 client.go:48] created a podresources client for endpoint "unix:///host-podresources/kubelet.sock"
I0925 07:28:32.449812       1 setup.go:90] metrics endpoint disabled
I0925 07:28:32.449822       1 podexclude.go:99] > POD excludes:
I0925 07:28:32.449835       1 resourcetopologyexporter.go:127] using given Topology Manager policy "single-numa-node" scope "container"
I0925 07:28:32.449887       1 notification.go:123] added interval every 10s
I0925 07:28:32.449912       1 resourcemonitor.go:153] resource monitor for "ocp-node-8.spoke00.core.telcoqe.eng.rdu2.dc.redhat.com" starting
I0925 07:28:32.615362       1 resourcemonitor.go:175] tracking node resources
F0925 07:28:32.615893       1 main.go:118] failed to execute: failed to initialize ResourceMonitor: error while updating node allocatable: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial unix /host-podresources/kubelet.sock: connect: permission denied"
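
When triaging a log like the one above, it can help to isolate the fatal entry mechanically. The following is a minimal sketch, assuming the container output uses the standard klog header (severity letter, date, time, thread id, `file:line]`, message). The names `parse_klog_line` and `fatal_entries` are hypothetical helpers for illustration and are not part of resource-topology-exporter.

```python
import re
from typing import List, NamedTuple, Optional

# klog header: <severity><mmdd> <hh:mm:ss.uuuuuu> <thread id> <file>:<line>] <message>
KLOG_LINE = re.compile(
    r"^(?P<severity>[IWEF])\d{4} "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"\d+ "
    r"(?P<source>[^\s:]+:\d+)\] "
    r"(?P<message>.*)$"
)


class KlogEntry(NamedTuple):
    severity: str  # one of I, W, E, F
    time: str
    source: str    # e.g. "main.go:118"
    message: str


def parse_klog_line(line: str) -> Optional[KlogEntry]:
    """Return the parsed entry, or None for lines without a klog header."""
    match = KLOG_LINE.match(line.strip())
    if match is None:
        return None
    return KlogEntry(
        severity=match.group("severity"),
        time=match.group("time"),
        source=match.group("source"),
        message=match.group("message"),
    )


def fatal_entries(log_text: str) -> List[KlogEntry]:
    """Collect only the fatal (F) entries from a multi-line log."""
    parsed = (parse_klog_line(line) for line in log_text.splitlines())
    return [entry for entry in parsed if entry and entry.severity == "F"]
```

Applied to the log in this report, the only fatal entry is the one emitted at `main.go:118`, whose message ends in `connect: permission denied` for `/host-podresources/kubelet.sock`. Lines without a klog header, such as the `Defaulted container` notice, are skipped.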

Version-Release number of selected component (if applicable):

OCP: 4.16.48
numaresources-operator.v4.16.6

How reproducible:

    Observed once so far

Steps to Reproduce:

    1. Deploy a bare-metal dual-stack cluster
    2. Install and configure NROP (NUMA Resources Operator)
    3. Check the status of the numaresourceoperator pod(s)
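
The status check in step 3 could be scripted roughly as follows. This is a sketch under assumptions: it expects the JSON produced by `oc get pods -n <namespace> -o json` (the namespace is not stated in this report) and relies on the standard Pod status fields `containerStatuses[].state.waiting.reason`. The function name `crashlooping_pods` and the pod names in the usage example are hypothetical.

```python
import json
from typing import Any, Dict, List


def crashlooping_pods(pod_list: Dict[str, Any]) -> List[str]:
    """Return "pod/container" names whose container is waiting in CrashLoopBackOff."""
    found = []
    for pod in pod_list.get("items", []):
        statuses = pod.get("status", {}).get("containerStatuses", [])
        for status in statuses:
            # "waiting" is absent (or null) when the container is running or terminated.
            waiting = status.get("state", {}).get("waiting") or {}
            if waiting.get("reason") == "CrashLoopBackOff":
                found.append(f'{pod["metadata"]["name"]}/{status["name"]}')
    return found


if __name__ == "__main__":
    # Hypothetical pod list for illustration; real input would come from `oc ... -o json`.
    example = json.loads("""
    {"items": [
      {"metadata": {"name": "example-rte-pod"},
       "status": {"containerStatuses": [
         {"name": "resource-topology-exporter",
          "state": {"waiting": {"reason": "CrashLoopBackOff"}}},
         {"name": "shared-pool-container",
          "state": {"running": {}}}
       ]}}
    ]}
    """)
    print(crashlooping_pods(example))
```

Reporting the container name alongside the pod name matters here because the pod has two containers and only `resource-topology-exporter` logs the failure shown in the description.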

Actual results:

    Pods are in CrashLoopBackOff state

Expected results:

    Pods are running without any errors

Additional info:

Depends on: OCPBUGS-61789 RTE pods stuck in CrashLoopBackOff due to selinux context changes (status: MODIFIED)

Assignee: Shereen Haj

Reporter: Yurii Prokulevych

Need Info From: None

Contributors: None

QA Contact: Niranjan Mallapadi Raghavendra Rao

Doc Contact: None

Votes: 0

Watchers: 3

Created: 2025/09/25 8:14 AM

Updated: 2025/09/30 1:56 PM

Resolved: 2025/09/30 1:56 PM
