  1. OpenShift Request For Enhancement
  2. RFE-3382

[RFE] add all reasons a pod can't be scheduled to status conditions



      This issue was cloned from a Bugzilla ticket. Original text follows:


      Description of problem:

      If the worker nodes lack VMX/SVM and one launches a Windows VM, the virt-launcher pod fails to schedule with:


      - lastProbeTime: null
        lastTransitionTime: "2022-08-26T04:59:59Z"
        message: '0/7 nodes are available: 3 node(s) had taint {node-role.kubernetes.io/master: }

        , that the pod didn''t tolerate, 4 node(s) didn''t match Pod''s node affinity/selector.'
        reason: Unschedulable
        status: "False"
        type: PodScheduled

      If it's a Linux VM, the message is actually useful to the customer:


      - lastProbeTime: null
        lastTransitionTime: "2022-08-26T05:09:52Z"
        message: '0/7 nodes are available: 3 node(s) had taint {node-role.kubernetes.io/master: }

        , that the pod didn''t tolerate, 4 Insufficient devices.kubevirt.io/kvm.'
        reason: Unschedulable
        status: "False"
        type: PodScheduled

      This is because the nodeSelector of the Windows VM contains these labels:

      hyperv.node.kubevirt.io/frequencies: "true"
      hyperv.node.kubevirt.io/ipi: "true"
      hyperv.node.kubevirt.io/reenlightenment: "true"
      hyperv.node.kubevirt.io/reset: "true"
      hyperv.node.kubevirt.io/runtime: "true"
      hyperv.node.kubevirt.io/synic: "true"
      hyperv.node.kubevirt.io/synictimer: "true"
      hyperv.node.kubevirt.io/tlbflush: "true"
      hyperv.node.kubevirt.io/vpindex: "true"
      kubevirt.io/schedulable: "true"

      These labels are not present on the node object if the node does not have virtualization.

      So the pod does not match the node, and scheduling fails before checking whether the kvm resource is available for use, which would in turn produce a useful error event.
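The label mismatch described here can be illustrated with a minimal sketch. It is not KubeVirt or scheduler code: the function name `missing_selector_labels` and the node label sets are hypothetical, and only a subset of the nodeSelector quoted in this report is used.

```python
def missing_selector_labels(node_selector, node_labels):
    """Return the selector entries that the node's labels do not satisfy."""
    return {
        key: value
        for key, value in sorted(node_selector.items())
        if node_labels.get(key) != value
    }


# Subset of the Windows VM nodeSelector quoted in this report.
windows_selector = {
    "hyperv.node.kubevirt.io/synic": "true",
    "hyperv.node.kubevirt.io/vpindex": "true",
    "kubevirt.io/schedulable": "true",
}

# Hypothetical label sets: a worker without VMX/SVM and one with it.
node_without_virt = {"kubernetes.io/os": "linux"}
node_with_virt = {"kubernetes.io/os": "linux", **windows_selector}

# Every selector entry is missing on the node without virtualization.
print(missing_selector_labels(windows_selector, node_without_virt))
# Nothing is missing on the node that carries the labels.
print(missing_selector_labels(windows_selector, node_with_virt))
```

Listing the unmatched keys, rather than only stating that the selector did not match, is the kind of detail that would point the user toward the missing virtualization support.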

      This sort of basic problem must be clear to the user with proper event messages.
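A toy model may help show the difference between reporting the first failing check and reporting all of them, as this RFE requests. It is a sketch under stated assumptions, not the kube-scheduler implementation: the check functions, their order, and the pod and node dictionaries are all hypothetical.

```python
def check_selector(pod, node):
    """Report nodeSelector entries the node's labels do not satisfy."""
    labels = node.get("labels", {})
    missing = sorted(
        key for key, value in pod.get("nodeSelector", {}).items()
        if labels.get(key) != value
    )
    return "missing label(s): " + ", ".join(missing) if missing else None


def check_resources(pod, node):
    """Report requested resources the node cannot provide."""
    allocatable = node.get("allocatable", {})
    short = sorted(
        name for name, amount in pod.get("requests", {}).items()
        if allocatable.get(name, 0) < amount
    )
    return "insufficient " + ", ".join(short) if short else None


# Assumed order: the selector check runs before the resource check.
CHECKS = [check_selector, check_resources]


def first_reason(pod, node):
    """Stop at the first failing check, as described in this report."""
    for check in CHECKS:
        reason = check(pod, node)
        if reason:
            return [reason]
    return []


def all_reasons(pod, node):
    """Run every check and keep every failure, as the RFE asks for."""
    return [reason for reason in (check(pod, node) for check in CHECKS) if reason]


# Hypothetical Windows VM pod and a worker node without virtualization.
windows_pod = {
    "nodeSelector": {"hyperv.node.kubevirt.io/synic": "true"},
    "requests": {"devices.kubevirt.io/kvm": 1},
}
worker = {"labels": {}, "allocatable": {}}

print(first_reason(windows_pod, worker))
print(all_reasons(windows_pod, worker))
```

In this model only `all_reasons` surfaces the missing `devices.kubevirt.io/kvm` resource for the Windows VM, which is the hint that makes the Linux VM message useful.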

      Version-Release number of selected component (if applicable):
      CNV 4.10.4
      OCP 4.10.26

      How reproducible:

      Steps to Reproduce:
      1. Disable VMX on Worker Nodes
      2. Start a Linux and a Windows VM
      3. Windows VM fails to start with unclear message
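After reproducing, the two PodScheduled messages can be compared by splitting them into per-reason node counts. The sketch below assumes the message layout seen in this report (a summary, a colon, then comma-separated reasons that each start with a node count); that layout is not a stable API, and `split_reasons` is a hypothetical helper. The sample message is the Linux VM one from this report, joined onto one line.

```python
import re


def split_reasons(message):
    """Split a scheduler message into (node_count, reason) pairs."""
    # Collapse the line wrapping introduced by the YAML output.
    flat = " ".join(message.split())
    # Everything after the first ": " is the list of reasons.
    _, _, body = flat.partition(": ")
    # Split only on commas that are followed by a node count, because
    # a single reason may itself contain a comma.
    parts = re.split(r",\s+(?=\d+\s)", body.rstrip("."))
    pairs = []
    for part in parts:
        count, reason = part.split(" ", 1)
        pairs.append((int(count), reason))
    return pairs


linux_message = (
    "0/7 nodes are available: 3 node(s) had taint "
    "{node-role.kubernetes.io/master: }, that the pod didn't tolerate, "
    "4 Insufficient devices.kubevirt.io/kvm."
)

for count, reason in split_reasons(linux_message):
    print(count, reason)
```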

            rh_pelauter@redhat.com Peter Lauterbach
            rhn-support-gveitmic Germano Veit Michel
