CNV-48124 (OpenShift Virtualization)

[Bug] 4.16 VMs fail to start for larger vcpu counts (>=64)

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Critical
    • CNV v4.18.0
    • CNV v4.16.1
    • CNV Virtualization
    • 5
    • CNV Virtualization Sprint 260, CNV Virtualization Sprint 261, CNV Virtualization Sprint 262, CNV Virt-Cluster Sprint 263, CNV Virt-Cluster Sprint 264
    • Critical

      Description of problem:

      On CNV 4.16, a VM whose total vCPU count exceeds the number of CPUs in a single NUMA node of the host fails to start with this error:
      
      default                                            0s          Warning   SyncFailed                          virtualmachineinstance/vm-big                                       server error. command SyncVMI failed: "LibvirtError(Code=67, Domain=10, Message='unsupported configuration: more than 255 vCPUs require extended interrupt mode enabled on the iommu device')"

      Version-Release number of selected component (if applicable):

      OCP 4.16.6
      CNV 4.16.1

      How reproducible:

      Reported across multiple internal clusters.

      Steps to Reproduce:

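      1. Start a VM with a high vCPU count (for example, the 100-socket definition under Additional info).
      2. Check the events for the VirtualMachineInstance.
      3. Observe that the VM fails to start with a SyncFailed warning.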
      

      Actual results:

      The VM fails to start.

      Expected results:

      The VM starts successfully with a high vCPU count.

      Additional info:

      Using a very simple VM definition, with the vCPU count set higher than the CPU count of one NUMA node (on a 128-CPU host):
      
      
      apiVersion: kubevirt.io/v1
      kind: VirtualMachine
      metadata:
        labels:
          app: vm-big
        name: vm-big
      spec:
        running: true
        template:
          metadata:
            labels:
              kubevirt.io/domain: vm-big
          spec:
            domain:
              cpu:
                cores: 1
                sockets: 100
                threads: 1
              devices:
                disks:
                - disk:
                    bus: virtio
                  name: containerdisk
                - disk:
                    bus: virtio
                  name: cloudinitdisk
                interfaces:
                - masquerade: {}
                  model: virtio
                  name: default
                networkInterfaceMultiqueue: true
                rng: {}
              features:
                smm:
                  enabled: true
              firmware:
                bootloader:
                  efi: {}
              machine:
                type: pc-q35-rhel9.2.0
              memory:
                guest: 10Gi
            networks:
            - name: default
              pod: {}
            terminationGracePeriodSeconds: 180
            nodeSelector:
              kubernetes.io/hostname: worker00
            volumes:
            - containerDisk:
                image: quay.io/kubevirt/fedora-container-disk-images:35
                imagePullPolicy: IfNotPresent
              name: containerdisk
            - cloudInitNoCloud:
                userData: |-
                  #cloud-config
                  user: fedora
                  password: perf
                  chpasswd: { expire: False }
                  runcmd:
                   - sed -i -e "s/PasswordAuthentication.*/PasswordAuthentication yes/" /etc/ssh/sshd_config
                   - systemctl restart sshd
              name: cloudinitdisk
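
      One possible reading of the 255-vCPU message, offered as an unconfirmed assumption rather than a diagnosis from this report: KubeVirt's CPU hotplug support may reserve headroom for additional sockets. If that headroom defaults to 4x the requested sockets, 100 sockets would imply a 400-vCPU maximum topology, and 64 sockets would imply 256, the first value above 255, which would match the >=64 threshold in the title. Under that assumption, capping maxSockets in the cpu stanza might keep the maximum below the limit. The sketch below changes only the cpu block of the definition above; the 4x ratio and the effect of the cap are assumptions and have not been verified on the affected clusters.

```yaml
# Hypothetical variant of the cpu stanza from the VM definition above.
# Assumption: the hotplug maximum defaults to 4x the requested sockets,
# so 100 sockets would imply a 400-vCPU maximum (above the 255 in the error).
cpu:
  cores: 1
  sockets: 100
  threads: 1
  # Assumption: setting maxSockets equal to sockets removes the hotplug
  # headroom, so the maximum topology stays at 100 vCPUs.
  maxSockets: 100
```

      If the VM starts with this stanza and fails without it, that would point at the hotplug headroom rather than the NUMA layout as the trigger.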

              lpivarc Luboslav Pivarc
              jhopper@redhat.com Jenifer Abrams
              Kedar Bidarkar Kedar Bidarkar