
Type: Bug
Resolution: Unresolved
Priority: Normal
Fix Version/s: None
Affects Version/s: rhel-9.6
Component/s: container-selinux
Labels:
None

Regression:
None
Severity:
Moderate

AssignedTeam:
rhel-container-tools

Story Points:
3
Blocked:
False
Ready:
False
Blocked Reason:
None
Product Documentation Required:
None
Sprint:
None

Preliminary Testing:
None
Test Coverage:
None

ProdDocsReview-CCS:
Unspecified
ProdDocsReview-Dev:
Unspecified
ProdDocsReview-QE:
Unspecified

Architecture:

x86_64

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Planning:
None

What were you trying to do that didn't work?

We are trying to install the GPU Operator with CDI disabled on a Kubernetes node running RHEL 9.

We are on a RHEL 9.6 system with kernel version "5.14.0-570.12.1.el9_6.x86_64", using Kubernetes and CRI-O v1.33. These failures began after we upgraded to newer kernel versions, so we believe this is a regression.

The last working RHEL 9.6 kernel version was "5.14.0-570.12.1.el9_6.x86_64". Changing the versions of Kubernetes, CRI-O, crun, and container-selinux had no apparent effect, so the kernel version appears to be the most significant factor.
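Because the report hinges on which kernel release is running, a small helper for comparing release strings may be useful when bisecting. This is only a sketch: `parse_kernel_release` and `is_newer` are hypothetical names, and the pattern assumes the `X.Y.Z-build.elN_M.arch` shape of the version quoted above.

```python
import re

# Assumed shape: 5.14.0-570.12.1.el9_6.x86_64
_RELEASE = re.compile(r"(\d+)\.(\d+)\.(\d+)-([\d.]+)\.el(\d+)_(\d+)\.(\w+)")


def parse_kernel_release(release):
    """Split a RHEL kernel release string into comparable parts."""
    m = _RELEASE.fullmatch(release)
    if m is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    major, minor, patch, build, el_major, el_minor, arch = m.groups()
    return (
        (int(major), int(minor), int(patch)),      # upstream version
        tuple(int(p) for p in build.split(".")),   # RHEL build number
        (int(el_major), int(el_minor)),            # RHEL major and minor
        arch,
    )


def is_newer(a, b):
    """True if release a sorts after release b (architecture is ignored)."""
    return parse_kernel_release(a)[:3] > parse_kernel_release(b)[:3]


if __name__ == "__main__":
    import platform

    last_working = "5.14.0-570.12.1.el9_6.x86_64"  # value quoted in this report
    running = platform.release()                   # kernel of the current host
    print(running, "is newer than last working:", is_newer(running, last_working))
```

Running the script on an affected node prints whether its kernel is newer than the last working release; it raises `ValueError` on hosts whose release string does not follow the assumed pattern.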

What is the impact of this issue to you?

The GPU Operator in non-CDI mode is unusable since upgrading the RHEL 9.6 kernels. This will affect all users of the GPU Operator on RHEL 9.

Please provide the package NVR for which the bug is seen:

How reproducible is this bug?:

Steps to reproduce

1. Set up a Kubernetes cluster.
2. Install Helm.
3. Install the GPU Operator with CDI disabled (append "--set cdi.enabled=false" to the helm install command referenced in the link).

Expected results

All of the GPU Operator pods come up with no issues
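One way to check this expectation is to inspect the pod list as JSON and report any container stuck in a waiting state. The sketch below assumes the input is the output of `kubectl get pods -o json` for the operator's namespace; `waiting_containers` is a hypothetical helper, and only the standard Pod status fields (`initContainerStatuses`, `containerStatuses`, `state.waiting.reason`) are read.

```python
import json


def waiting_containers(pod_list_json):
    """Return (pod, container, reason) for every container in a waiting state."""
    problems = []
    for pod in json.loads(pod_list_json).get("items", []):
        status = pod.get("status", {})
        # Init containers and regular containers report status the same way.
        statuses = status.get("initContainerStatuses", []) + status.get(
            "containerStatuses", []
        )
        for cs in statuses:
            waiting = cs.get("state", {}).get("waiting")
            if waiting:
                problems.append(
                    (pod["metadata"]["name"], cs["name"], waiting.get("reason", ""))
                )
    return problems


if __name__ == "__main__":
    import sys

    # Example: kubectl get pods -n <namespace> -o json | python check_pods.py
    for pod, container, reason in waiting_containers(sys.stdin.read()):
        print(f"{pod}/{container}: {reason}")
```

An empty result means no container is waiting; it does not by itself prove every pod is Ready.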

Actual results

The nvidia-operator-validator pod fails to come up and its status is reported as "Init:CreateContainerError". The toolkit-validation container goes into CrashLoopBackOff with the following error:

time="2026-02-11T21:58:57Z" level=info msg="version: 84601875-amd64, commit: 8460187"
toolkit is not ready
time="2026-02-11T21:58:57Z" level=info msg="Error: error validating toolkit installation: exec: \"nvidia-smi\": executable file not found in $PATH"
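The final message has the form that Go's os/exec package produces when a PATH lookup fails, which suggests the validator could not find an executable named nvidia-smi in any PATH directory inside the container. The sketch below illustrates the equivalent lookup; `lookup` is a hypothetical helper, not part of the GPU Operator.

```python
import shutil


def lookup(name, path=None):
    """Search the PATH directories for an executable, as exec lookups do.

    Returns the resolved path, or a message in the same form as the log
    line above when nothing executable is found. `path` defaults to the
    PATH environment variable of the current process.
    """
    found = shutil.which(name, path=path)
    if found is None:
        return f'exec: "{name}": executable file not found in $PATH'
    return found


if __name__ == "__main__":
    print(lookup("nvidia-smi"))
```

If the binary exists in the container image but the lookup still fails, the cause may be a missing mount or a permission denial rather than a wrong PATH, which is where the attached strace output becomes relevant.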

I have also attached the strace logs, which provide more detail on the exact failure.


nvidia-container-cli-strace.log
121 kB
2026/02/11 11:48 PM

Assignee: Peter Hunt

Reporter: Tariq Ibrahim

Developer: Container Runtime Eng Bot

QA Contact: Container Runtime Bugs Bot

Votes: 0

Watchers: 5

Created: 2026/02/11 11:42 PM

Updated: 2026/02/24 12:20 PM

Stale Date: 2027/02/23
