[CNV-42218] Failed to start VM after adding multiple vGPUs in web console

Type: Bug
Resolution: Done-Errata
Priority: Major
Fix Version/s: CNV v4.17.0
Affects Version/s: CNV v4.16.0
Component/s: CNV Virtualization
Labels:
- Libvirt_CNV_INT

Story Points:
0.42
Blocked:
False
Blocked Reason:
None
Ready:
False
Component Fix Version(s):
CNV v4.17.0.rhel9-60
Git Pull Request:
https://github.com/kubevirt/kubevirt/pull/12053
[QE] How to address?:
---
[QE] Why QE missed?:
---
Market:

Regression:
No

SFDC Cases Links:
SFDC Cases Open:
SFDC Cases Counter:

Description of problem:

Failed to start VM after adding multiple vGPUs in web console

Version-Release number of selected component (if applicable):

CNV 4.16, IIB build 734430

How reproducible:

100%

Steps to Reproduce:

1. Create a VM from a VM template and check that the VM is running.
   Log in to the web console: Virtualization -> VirtualMachines -> Create -> From templates -> Red Hat Enterprise Linux 9 VM. Set VirtualMachine name: rhel9-uefi, Disk source: URL (creates PVC), Image URL: http://dell-per740-36.lab.eng.pek2.redhat.com/libvirt-CI-resources/RHEL-9.4-x86_64-latest-ovmf.qcow2, Disk size: 12 GiB, then click "Quick create VirtualMachine".

2. Add the vGPUs: Configuration -> Hardware devices -> GPU devices. Add 4 GPU devices, each with Device name: nvidia.com/GRID_T4-4Q

3. Restart the VM. The VM stays in the Starting status and reports the error:
"server error. command SyncVMI failed: "LibvirtError(Code=67, Domain=20,
      Message=''unsupported configuration: Only one vgpu device can have ''ramfb'' enabled'')"'

Check the VM YAML file:
# oc get vm rhel9-uefi -o yaml > rhel9-uefi.yaml
          gpus:
          - deviceName: nvidia.com/GRID_T4-4Q
            name: gpus-aquamarine-chinchilla-19
          - deviceName: nvidia.com/GRID_T4-4Q
            name: gpus-jade-hamster-74
          - deviceName: nvidia.com/GRID_T4-4Q
            name: gpus-yellow-guanaco-60
          - deviceName: nvidia.com/GRID_T4-4Q
            name: gpus-white-stork-27

When multiple vGPUs are used, the YAML file should look like this, with ramfb disabled on the additional device:
          gpus:
           - deviceName: nvidia.com/GRID_T4-4Q
             name: gpus-aquamarine-chinchilla-19
           - deviceName: nvidia.com/GRID_T4-4Q
             name: gpus-jade-hamster-74
             virtualGPUOptions:
               display:
                 ramFB:
                   enabled: false
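
As a rough workaround sketch for the manifest above, the following Python helper leaves the first GPU entry untouched and sets virtualGPUOptions.display.ramFB.enabled to false on every later entry. The function name disable_extra_ramfb is hypothetical and the field names are taken from the YAML in this report; it only transforms a list of dictionaries and has not been run against a cluster.

```python
import copy
import json


def disable_extra_ramfb(gpus):
    """Return a copy of the gpus list where only the first entry may use ramfb.

    Hypothetical helper: every entry after the first gets
    virtualGPUOptions.display.ramFB.enabled set to False.
    """
    patched = copy.deepcopy(gpus)  # do not modify the caller's list
    for gpu in patched[1:]:
        options = gpu.setdefault("virtualGPUOptions", {})
        display = options.setdefault("display", {})
        display["ramFB"] = {"enabled": False}
    return patched


# The four GPU entries from the failing VM in this report.
gpus = [
    {"deviceName": "nvidia.com/GRID_T4-4Q", "name": "gpus-aquamarine-chinchilla-19"},
    {"deviceName": "nvidia.com/GRID_T4-4Q", "name": "gpus-jade-hamster-74"},
    {"deviceName": "nvidia.com/GRID_T4-4Q", "name": "gpus-yellow-guanaco-60"},
    {"deviceName": "nvidia.com/GRID_T4-4Q", "name": "gpus-white-stork-27"},
]

patched_gpus = disable_extra_ramfb(gpus)
print(json.dumps(patched_gpus, indent=2))
```

The printed list can be converted back to YAML and placed under the gpus key of the VM definition by hand; the first device keeps the default display settings.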

Actual results:

In step 3: Failed to start the VM after adding 4 vGPU devices.

Expected results:

In step 3: The VM starts successfully.

Additional info:

- VM yaml file: rhel9-uefi.yaml

Attachments:
- image-2024-10-09-10-40-35-034.png (164 kB, 2024/10/09 2:40 AM)
- rhel9-uefi.yaml (5 kB, 2024/05/27 11:25 AM)

links to

RHEA-2024:133097 OpenShift Virtualization 4.17.0 Images

mentioned on

Merge request - Updated US source to: 5981256 Merge pull request #12053 from vladikr/fix_vgpu_display

Errata Tool added a comment - 2024/10/15 2:11 PM

Since the problem described in this issue should be resolved in a recent advisory, it has been closed.

For information on the advisory (OpenShift Virtualization 4.17.0 Images), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHEA-2024:8140


cnv-qe jira added a comment - 2024/07/04 9:03 AM

A build that contains the fix PR(s) for this bug is available, moving status from MODIFIED to ON_QA.
The first build that contains https://github.com/kubevirt/kubevirt/pull/12053 is CNV v4.17.0.rhel9-60.


Luboslav Pivarc added a comment - 2024/06/26 2:49 PM

kbidarka@redhat.com How far do we need to backport this?


cnv-qe jira added a comment - 2024/06/24 11:02 PM

A build that contains the fix PR(s) for this bug is available, moving status from MODIFIED to ON_QA.
The first build that contains https://github.com/kubevirt/kubevirt/pull/12053 is CNV v4.99.0.rhel9-563.

cnv-qe jira added a comment - 2024/06/24 8:02 PM

All linked PR(s) of this bug have been merged, moving status from POST to MODIFIED.
https://github.com/kubevirt/kubevirt/pull/12053 merged at 2024-06-24 18:17:35


GitLab CEE Bot added a comment - 2024/06/24 7:29 PM

CPaaS Service Account mentioned this issue in a merge request of cpaas-midstream / openshift-virtualization / kubevirt on branch cnv-4.99-rhel-9_upstream_3abe1c64f48a71cc93f8d3c2b6314899:

Updated US source to: 5981256 Merge pull request #12053 from vladikr/fix_vgpu_display


Vladik Romanovsky added a comment - 2024/05/31 2:50 PM

kbidarka@redhat.com

It's definitely a bug. We should configure ramfb only once.

Today we just enable ramfb by default for vGPUs.


Alex Williamson added a comment - 2024/05/30 7:58 PM - edited 2024/05/30 8:00 PM

kbidarka@redhat.com Multiple vGPUs is a valid configuration, but multiple ramfbs is not. See Bugzilla 2079760; libvirt disallows this configuration.
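
The constraint described in this comment can be checked on a GPU list before the VM is started. This is a minimal sketch, assuming (as stated elsewhere in this issue) that ramfb counts as enabled for a vGPU unless it is explicitly disabled; count_ramfb_enabled and is_valid_vgpu_config are hypothetical names, not KubeVirt or libvirt functions, and other display settings are ignored for simplicity.

```python
def count_ramfb_enabled(gpus):
    """Count GPU entries whose ramfb is effectively enabled.

    Assumption: ramfb is on by default, and only an explicit
    virtualGPUOptions.display.ramFB.enabled == False turns it off.
    """
    count = 0
    for gpu in gpus:
        ramfb = gpu.get("virtualGPUOptions", {}).get("display", {}).get("ramFB", {})
        if ramfb.get("enabled", True):
            count += 1
    return count


def is_valid_vgpu_config(gpus):
    """Multiple vGPUs are allowed; more than one ramfb is not."""
    return count_ramfb_enabled(gpus) <= 1


# Configuration from the failing VM: four vGPUs, none with ramfb disabled.
failing = [
    {"deviceName": "nvidia.com/GRID_T4-4Q", "name": "gpus-aquamarine-chinchilla-19"},
    {"deviceName": "nvidia.com/GRID_T4-4Q", "name": "gpus-jade-hamster-74"},
    {"deviceName": "nvidia.com/GRID_T4-4Q", "name": "gpus-yellow-guanaco-60"},
    {"deviceName": "nvidia.com/GRID_T4-4Q", "name": "gpus-white-stork-27"},
]

# Configuration from the bug description: ramfb disabled on the second vGPU.
working = [
    {"deviceName": "nvidia.com/GRID_T4-4Q", "name": "gpus-aquamarine-chinchilla-19"},
    {
        "deviceName": "nvidia.com/GRID_T4-4Q",
        "name": "gpus-jade-hamster-74",
        "virtualGPUOptions": {"display": {"ramFB": {"enabled": False}}},
    },
]

print(is_valid_vgpu_config(failing), is_valid_vgpu_config(working))
```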


Kedar Bidarkar added a comment - 2024/05/29 12:19 PM

vromanso@redhat.com cc


Kedar Bidarkar added a comment - 2024/05/29 12:18 PM

rhn-support-mtessun says, should be full core profiles.

alwillia@redhat.com Can you please check the configuration in the bug description above? Do you think this is a valid configuration?


Assignee: Stuart Gott
Reporter: Chenli Hu
QA Contact: Vasiliy Sibirskiy
Votes: 0
Watchers: 10
Created: 2024/05/27 9:55 AM
Updated: 2024/10/15 2:11 PM
Resolved: 2024/10/15 2:11 PM
