OpenShift Bugs / OCPBUGS-43739

nvidia-device-plugin-validator fails with error "Failed to allocate device vector A"

    • Bug
    • Resolution: Unresolved
    • 4.15.z
    • NVIDIA
    • Quality / Stability / Reliability
    • Moderate

      Description of problem:

      The nvidia-device-plugin-validator pod fails with the error "Failed to allocate device vector A":
      
      ```
      $ oc -n nvidia-gpu-operator logs -l app=nvidia-device-plugin-validator -c plugin-validation
      Failed to allocate device vector A (error code CUDA driver version is insufficient for CUDA runtime version)!
      [Vector addition of 50000 elements]
      ```    

      Version-Release number of selected component (if applicable):

      OCP 4.15.28
      gpu-operator-certified.v24.6.2
      nfd.4.15.0-202410010035

      How reproducible:

          Easily reproducible in the customer's environment.

      Steps to Reproduce:

          1. Deploy the NVIDIA GPU Operator v24.6.2 from OperatorHub.
          2. Create a ClusterPolicy with default settings.
          3. Run: oc get pods -n nvidia-gpu-operator
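
The check in step 3 can be scripted together with the log command quoted in the description. This is a minimal sketch that wraps only the two `oc` commands already shown in this report. The function name `collect_validator_state` and the `NAMESPACE` variable are illustrative, not part of the operator, and a logged-in `oc` session is assumed.

```shell
#!/bin/sh
# Sketch: gather the pod list and validator log used in this report.
# Assumes `oc` is already logged in to the affected cluster.
# NAMESPACE and collect_validator_state are illustrative names.
NAMESPACE="${NAMESPACE:-nvidia-gpu-operator}"

collect_validator_state() {
  # Step 3 of the reproduction: list the GPU operator pods.
  oc get pods -n "$NAMESPACE"
  # Log of the failing validator container, as quoted in the description.
  oc -n "$NAMESPACE" logs -l app=nvidia-device-plugin-validator -c plugin-validation
}
```

Calling `collect_validator_state` after the ClusterPolicy has been created prints the pod list followed by the validator log.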

      Actual results:

      $ oc -n nvidia-gpu-operator logs -l app=nvidia-device-plugin-validator -c plugin-validation
      Failed to allocate device vector A (error code CUDA driver version is insufficient for CUDA runtime version)!
      [Vector addition of 50000 elements]
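
The error code in this log is the message the CUDA runtime reports for an insufficient driver, which the CUDA documentation describes as the installed driver being older than the CUDA runtime library. The sketch below flags that condition in collected log text. The function name `validator_log_status` and the labels it prints are hypothetical; only the two matched message strings come from this report.

```shell
#!/bin/sh
# Sketch: classify validator log text read from stdin.
# The function name and the printed labels are hypothetical.
validator_log_status() {
  log="$(cat)"
  case "$log" in
    # The failure quoted in this report.
    *"CUDA driver version is insufficient for CUDA runtime version"*)
      echo "insufficient-driver" ;;
    # Same allocation failure, but with a different error code.
    *"Failed to allocate device vector"*)
      echo "allocation-failed" ;;
    *)
      echo "no-known-error" ;;
  esac
}
```

For example, the output of the `oc ... logs` command above can be piped into `validator_log_status` to separate this failure from other validator errors.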
      

      Expected results:

          The nvidia-device-plugin-validator pod should complete successfully.

      Additional info:

          

              fdupont@redhat.com Fabien Dupont
              rhn-support-kdsouza Kenneth Dsouza