
Failed to start VM after adding multiple vGPUs in web console


      Description of problem:

      Failed to start VM after adding multiple vGPUs in web console

      Version-Release number of selected component (if applicable):

      CNV4.16 iib build:734430

      How reproducible:

      100%

      Steps to Reproduce:

      1. Create a VM from a VM template and check that the VM is running.
         Log in to the web console: Virtualization -> VirtualMachines -> Create -> From templates -> Red Hat Enterprise Linux 9 VM. Set VirtualMachine name: rhel9-uefi, Disk source: URL(Creates PVC), Image URL: http://dell-per740-36.lab.eng.pek2.redhat.com/libvirt-CI-resources/RHEL-9.4-x86_64-latest-ovmf.qcow2, Disk size: 12 GiB, then click "Quick create VirtualMachine".
      
      2. Add the vGPUs: Configuration -> Hardware devices -> GPU devices: add 4 GPU devices with Device name: nvidia.com/GRID_T4-4Q
      
      3. Restart the VM. The VM stays in the Starting status and reports the error:
      "server error. command SyncVMI failed: "LibvirtError(Code=67, Domain=20,
            Message=''unsupported configuration: Only one vgpu device can have ''ramfb'' enabled'')"'
      
      Check the VM YAML file:
      # oc get vm rhel9-uefi -o yaml > rhel9-uefi.yaml
                gpus:
                - deviceName: nvidia.com/GRID_T4-4Q
                  name: gpus-aquamarine-chinchilla-19
                - deviceName: nvidia.com/GRID_T4-4Q
                  name: gpus-jade-hamster-74
                - deviceName: nvidia.com/GRID_T4-4Q
                  name: gpus-yellow-guanaco-60
                - deviceName: nvidia.com/GRID_T4-4Q
                  name: gpus-white-stork-27
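
The conflict behind the libvirt error can be spotted from the spec alone. The Python sketch below counts the GPU entries that leave ramfb at its default. It assumes, based on the comments on this issue, that ramfb is enabled for a vGPU unless `virtualGPUOptions.display.ramFB.enabled` is explicitly set to false. The helper name `gpus_with_ramfb_enabled` is hypothetical and is not part of KubeVirt or the web console.

```python
def gpus_with_ramfb_enabled(gpus):
    """Return the names of GPU entries that do not explicitly disable ramfb."""
    names = []
    for gpu in gpus:
        ramfb = (
            gpu.get("virtualGPUOptions", {})
            .get("display", {})
            .get("ramFB", {})
        )
        # Assumption: ramfb is on by default unless "enabled" is set to False.
        if ramfb.get("enabled", True):
            names.append(gpu["name"])
    return names


# The four GPU entries from the VM YAML in this report.
reported_gpus = [
    {"deviceName": "nvidia.com/GRID_T4-4Q", "name": "gpus-aquamarine-chinchilla-19"},
    {"deviceName": "nvidia.com/GRID_T4-4Q", "name": "gpus-jade-hamster-74"},
    {"deviceName": "nvidia.com/GRID_T4-4Q", "name": "gpus-yellow-guanaco-60"},
    {"deviceName": "nvidia.com/GRID_T4-4Q", "name": "gpus-white-stork-27"},
]

conflicting = gpus_with_ramfb_enabled(reported_gpus)
print(len(conflicting))  # prints 4
```

More than one name in the result corresponds to the situation libvirt rejects with "Only one vgpu device can have 'ramfb' enabled".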
      
      When multiple vGPUs are used, the YAML file should look like this:
                gpus:
                 - deviceName: nvidia.com/GRID_T4-4Q
                   name: gpus-aquamarine-chinchilla-19
                 - deviceName: nvidia.com/GRID_T4-4Q
                   name: gpus-jade-hamster-74
                   virtualGPUOptions:
                     display:
                       ramFB:
                         enabled: false
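
As a rough illustration of the expected YAML above, the following Python sketch leaves the first GPU entry untouched and sets `virtualGPUOptions.display.ramFB.enabled` to false on every later entry. The field names are taken from the YAML in this report. The function name `keep_ramfb_on_first_only` is hypothetical, and choosing the first entry as the one that keeps ramfb is an assumption; this is not the code from the upstream fix.

```python
import copy


def keep_ramfb_on_first_only(gpus):
    """Return a copy of the GPU list with ramfb disabled on all but the first entry."""
    patched = copy.deepcopy(gpus)  # do not modify the caller's list
    for gpu in patched[1:]:
        display = gpu.setdefault("virtualGPUOptions", {}).setdefault("display", {})
        display.setdefault("ramFB", {})["enabled"] = False
    return patched


# Two of the GPU entries from this report, as they appear before the change.
original_gpus = [
    {"deviceName": "nvidia.com/GRID_T4-4Q", "name": "gpus-aquamarine-chinchilla-19"},
    {"deviceName": "nvidia.com/GRID_T4-4Q", "name": "gpus-jade-hamster-74"},
]

patched_gpus = keep_ramfb_on_first_only(original_gpus)
for gpu in patched_gpus:
    print(gpu)
```

With four vGPUs, as in the reproduction steps, the same function would disable ramfb on the second, third, and fourth entries.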

      Actual results:

      In step 3: Failed to start the VM after adding 4 vGPU devices.

      Expected results:

      In step 3: The VM starts successfully.

      Additional info:

      - VM yaml file: rhel9-uefi.yaml

            [CNV-42218] Failed to start VM after adding multiple vGPUs in web console

            Errata Tool added a comment -

            Since the problem described in this issue should be resolved in a recent advisory, it has been closed.

            For information on the advisory (OpenShift Virtualization 4.17.0 Images), and where to find the updated files, follow the link below.

            If the solution does not work for you, open a new bug report.
            https://access.redhat.com/errata/RHEA-2024:8140


            cnv-qe jira added a comment -

            A build that contains the fix PR(s) for this bug is available, moving status from MODIFIED to ON_QA.
            The first build that contains https://github.com/kubevirt/kubevirt/pull/12053 is CNV v4.17.0.rhel9-60.


            Luboslav Pivarc added a comment -

            kbidarka@redhat.com How far do we need to backport this?

            cnv-qe jira added a comment -

            A build that contains the fix PR(s) for this bug is available, moving status from MODIFIED to ON_QA.
            The first build that contains https://github.com/kubevirt/kubevirt/pull/12053 is CNV v4.99.0.rhel9-563.

            cnv-qe jira added a comment -

            All linked PR(s) of this bug have been merged, moving status from POST to MODIFIED.
            https://github.com/kubevirt/kubevirt/pull/12053 merged at 2024-06-24 18:17:35


            GitLab CEE Bot added a comment -

            CPaaS Service Account mentioned this issue in a merge request of cpaas-midstream / openshift-virtualization / kubevirt on branch cnv-4.99-rhel-9_upstream_3abe1c64f48a71cc93f8d3c2b6314899:

            Updated US source to: 5981256 Merge pull request #12053 from vladikr/fix_vgpu_display

            Vladik Romanovsky added a comment -

            kbidarka@redhat.com

            It's definitely a bug. We should configure ramfb only once.

            Today we just enable ramfb by default for vGPUs.

            Alex Williamson added a comment - edited

            kbidarka@redhat.com Multiple vGPUs is a valid configuration, but multiple ramfbs is not. See Bugzilla 2079760; libvirt disables this configuration.

            Kedar Bidarkar added a comment - vromanso@redhat.com   cc

            Kedar Bidarkar added a comment -

            rhn-support-mtessun says it should be full core profiles.

            alwillia@redhat.com Can you please check the configuration mentioned above in the bug description? Do you think this is a valid configuration?

              sgott@redhat.com Stuart Gott
              chhu@redhat.com Chenli Hu
              Vasiliy Sibirskiy Vasiliy Sibirskiy