- Bug
- Resolution: Unresolved
- Normal
- None
- rhel-9.6
- None
- None
- Moderate
- rhel-container-tools
- 3
- False
- False
- None
- None
- None
- None
- Unspecified
- Unspecified
- Unspecified
- x86_64
- None
What were you trying to do that didn't work?
We are trying to install the GPU Operator with CDI disabled on a Kubernetes node running RHEL 9.
We are on a RHEL 9.6 system with the kernel version "5.14.0-570.12.1.el9_6.x86_64". We are using Kubernetes and CRI-O version v1.33. We started observing these failures after upgrading to newer kernel versions, so we believe this is a regression.
The last working RHEL 9.6 kernel version was "5.14.0-570.12.1.el9_6.x86_64". We have also observed that changing the versions of Kubernetes, CRI-O, crun and container-selinux did not seem to have any effect. The kernel version seems to be the most significant factor here.
What is the impact of this issue to you?
The GPU Operator in non-CDI mode has been unusable since we upgraded the RHEL 9.6 kernel. This will affect all users of the GPU Operator on RHEL 9.
Please provide the package NVR for which the bug is seen:
How reproducible is this bug?:
Steps to reproduce
- Set up a Kubernetes Cluster
- Install Helm
- Install GPU Operator with CDI disabled
(append "--set cdi.disabled=false" to the helm install command referenced in the link)
Expected results
All of the GPU Operator pods come up with no issues
Actual results
The nvidia-operator-validator pod fails to come up and its status is reported as "Init:CreateContainerError". The toolkit-validation container goes into CrashLoopBackOff with the following error:
time="2026-02-11T21:58:57Z" level=info msg="version: 84601875-amd64, commit: 8460187"
toolkit is not ready
time="2026-02-11T21:58:57Z" level=info msg="Error: error validating toolkit installation: exec: \"nvidia-smi\": executable file not found in $PATH"
I have also attached the strace logs, which provide more details on the exact failure.
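The "executable file not found in $PATH" message in the log above is the kind of error produced when a lookup of "nvidia-smi" across the directories in PATH finds nothing. As a rough diagnostic aid, the sketch below performs an equivalent lookup in Python so it can be run inside the affected container or on the host. It is not part of the GPU Operator or its validator; the function names `lookup` and `report` are hypothetical, and the script only assumes that a Python 3 interpreter is available where it is run.

```python
import os
import shutil


def lookup(name, search_path=None):
    """Return the resolved path of `name`, or None if it is not on the search path.

    When search_path is None, shutil.which falls back to the PATH
    environment variable of the current process.
    """
    return shutil.which(name, path=search_path)


def report(name="nvidia-smi", search_path=None):
    """Build a one-line report stating whether `name` was found and where."""
    resolved = lookup(name, search_path)
    if resolved is None:
        # Show the search path that was actually used, to help spot a missing mount or directory.
        shown = search_path if search_path is not None else os.environ.get("PATH", "")
        return f"{name}: not found in PATH={shown}"
    return f"{name}: found at {resolved}"


if __name__ == "__main__":
    print(report())
```

Running this on a working kernel and on a failing one, and comparing the two reports, may help show whether the binary is missing from the container's filesystem or only from the search path.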