-
Bug
-
Resolution: Unresolved
-
Undefined
-
None
-
4.15.z
-
None
-
Quality / Stability / Reliability
-
False
-
-
None
-
Moderate
Description of problem:
nvidia-device-plugin-validator is failing with the error "Failed to allocate device vector A":
```
$ oc -n nvidia-gpu-operator logs -l app=nvidia-device-plugin-validator -c plugin-validation
Failed to allocate device vector A (error code CUDA driver version is insufficient for CUDA runtime version)!
[Vector addition of 50000 elements]
```
Version-Release number of selected component (if applicable):
OCP 4.15.28
gpu-operator-certified.v24.6.2
nfd.4.15.0-202410010035
How reproducible:
Easily reproducible in the customer's environment.
Steps to Reproduce:
1. Deploy the GPU operator v24.6.2 from OperatorHub
2. Deploy a ClusterPolicy with default settings
3. oc get pods -n nvidia-gpu-operator
4.
Actual results:
$ oc -n nvidia-gpu-operator logs -l app=nvidia-device-plugin-validator -c plugin-validation
Failed to allocate device vector A (error code CUDA driver version is insufficient for CUDA runtime version)!
[Vector addition of 50000 elements]
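For triage, the message in the log is the standard CUDA "insufficient driver" error text, which generally means the driver on the node is older than the CUDA runtime the validator workload was built against. The helper below is only a sketch for flagging that condition in captured validator logs. The function name `classify_validator_log` and its output labels are hypothetical and are not part of the GPU operator or `oc`.

```shell
# Hypothetical helper (not part of oc or the GPU operator):
# reads validator log text on stdin and prints a short classification.
classify_validator_log() {
  # -F matches the message as a fixed string; -q suppresses output.
  if grep -qF "CUDA driver version is insufficient for CUDA runtime version"; then
    echo "driver-runtime-mismatch"
  else
    echo "other"
  fi
}
```

It could be fed from the same command used in this report, for example `oc -n nvidia-gpu-operator logs -l app=nvidia-device-plugin-validator -c plugin-validation | classify_validator_log`.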
Expected results:
nvidia-device-plugin-validator should complete successfully without CUDA errors.
Additional info: