OpenShift Virtualization / CNV-17404

[2071886] [RFE] Option to allow scheduler to rearrange the cluster to allow certain actions to succeed

      Consider this scenario:

      • 3 worker nodes with 30G of memory each
      • 6 VMs, with 8G RAM each, 2 running on each of the 3 workers.

      As below:

      NAME                             AGE   PHASE     IP            NODENAME                        READY
      centos-stream8-empty-whitefish   11m   Running   10.128.2.19   worker-green.shift.toca.local   True
      centos7-loud-bear                13m   Running   10.128.2.18   worker-green.shift.toca.local   True
      fedora-advanced-goose            12m   Running   10.131.0.23   worker-blue.shift.toca.local    True
      fedora-civil-spoonbill           11m   Running   10.131.0.24   worker-blue.shift.toca.local    True
      fedora-frozen-stingray           11m   Running   10.129.2.42   worker-red.shift.toca.local     True
      centos-stream9-useless-raven     13m   Running   10.129.2.41   worker-red.shift.toca.local     True

      All the VMs are as follows:

      spec:
        domain:
          resources:
            requests:
              memory: 8Gi
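
      For context, a minimal sketch of the full VirtualMachine object that snippet sits in; only the resources.requests.memory value comes from this issue, the name, disk and image below are purely illustrative:

      apiVersion: kubevirt.io/v1
      kind: VirtualMachine
      metadata:
        name: fedora-civil-spoonbill        # one of the six VMs in the listing above
      spec:
        running: true
        template:
          spec:
            domain:
              devices:
                disks:
                  - name: containerdisk
                    disk:
                      bus: virtio
              resources:
                requests:
                  memory: 8Gi               # the request the scheduler accounts for
            volumes:
              - name: containerdisk
                containerDisk:
                  image: quay.io/containerdisks/fedora:latest   # illustrative image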

      So each of the 3 worker nodes has 16G requested by VMs, plus some more for the other cluster pods, looking like this (one line per worker):

      Resource   Requests        Limits
      --------   --------        ------
      memory     19858Mi (64%)   0 (0%)
      memory     18024Mi (58%)   0 (0%)
      memory     20166Mi (64%)   0 (0%)

      Each of these worker nodes has enough capacity to run another 8G VM, but not a 16G one.

      The admin now wants to start a 16G VM. Combined, the cluster has capacity, but not on a single worker.
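
      The new VM's request has the same shape as above, just doubled (only the 16G figure is from the scenario, the rest of the manifest is assumed):

      spec:
        domain:
          resources:
            requests:
              memory: 16Gi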

      The admin requests the 16G VM to start. The result is:

      NAME                             AGE   PHASE        IP            NODENAME                        READY
      centos-stream8-empty-whitefish   11m   Running      10.128.2.19   worker-green.shift.toca.local   True
      centos7-loud-bear                13m   Running      10.128.2.18   worker-green.shift.toca.local   True
      fedora-advanced-goose            12m   Running      10.131.0.23   worker-blue.shift.toca.local    True
      fedora-civil-spoonbill           11m   Running      10.131.0.24   worker-blue.shift.toca.local    True
      fedora-frozen-stingray           11m   Running      10.129.2.42   worker-red.shift.toca.local     True
      centos-stream9-useless-raven     13m   Running      10.129.2.41   worker-red.shift.toca.local     True
      rhel7-red-llama                  14m   Scheduling                                                 False

      Because:

      0/7 nodes are available: 1 node(s) were unschedulable, 3 Insufficient memory, 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate.
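
      That message is the FailedScheduling event on the virt-launcher pod backing the VMI; a few ways to pull it up (the label selector in the last command is an assumption about how virt-launcher pods are labelled):

      $ oc describe vmi rhel7-red-llama
      $ oc get events --field-selector reason=FailedScheduling
      $ oc describe pod -l vm.kubevirt.io/name=rhel7-red-llama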

      However, the admin can fix this manually: the solution is a simple live migration of any VM to any other node, so that one node's usage drops to 1 VM (8G), another rises to 3 VMs (24G), and the 16G VM can start.
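
      Concretely, the one-off workaround can be triggered either with virtctl or by creating a VirtualMachineInstanceMigration object; a sketch for the VM that ends up moving in the listings below (the object name is arbitrary, and KubeVirt picks the target node itself):

      $ virtctl migrate centos7-loud-bear

      # or, equivalently:
      apiVersion: kubevirt.io/v1
      kind: VirtualMachineInstanceMigration
      metadata:
        name: migrate-centos7-loud-bear
      spec:
        vmiName: centos7-loud-bear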

      For example, live migrate one VM manually and the state quickly goes through these steps without further admin interference:

      $ oc get vmi
      NAME                             AGE   PHASE       IP            NODENAME                        READY
      centos-stream8-empty-whitefish   29m   Running     10.128.2.19   worker-green.shift.toca.local   True
      centos-stream9-useless-raven     30m   Running     10.129.2.41   worker-red.shift.toca.local     True
      fedora-frozen-stingray           28m   Running     10.129.2.42   worker-red.shift.toca.local     True
      centos7-loud-bear                30m   Running     10.131.0.25   worker-blue.shift.toca.local    True    <--- migrated from Green
      fedora-advanced-goose            29m   Running     10.131.0.23   worker-blue.shift.toca.local    True
      fedora-civil-spoonbill           28m   Running     10.131.0.24   worker-blue.shift.toca.local    True
      rhel7-red-llama                  26s   Scheduled                 worker-green.shift.toca.local   False

      Just a few moments later they are all running:

      $ oc get vmi
      NAME                             AGE   PHASE     IP            NODENAME                        READY
      centos-stream8-empty-whitefish   29m   Running   10.128.2.19   worker-green.shift.toca.local   True
      centos-stream9-useless-raven     30m   Running   10.129.2.41   worker-red.shift.toca.local     True
      centos7-loud-bear                30m   Running   10.131.0.25   worker-blue.shift.toca.local    True
      fedora-advanced-goose            29m   Running   10.131.0.23   worker-blue.shift.toca.local    True
      fedora-civil-spoonbill           28m   Running   10.131.0.24   worker-blue.shift.toca.local    True
      fedora-frozen-stingray           28m   Running   10.129.2.42   worker-red.shift.toca.local     True
      rhel7-red-llama                  34s   Running   10.128.2.22   worker-green.shift.toca.local   True

      Request: the admin would like an option for this rearrangement to happen automatically, without admin interference, to solve this problem.

      I did some research on PriorityClasses, but it seems they are wiped from the VM spec. And even if they worked, each new VM would need an ever-increasing priority to preempt the existing ones, so I cannot find a solution for this with the currently available mechanisms.
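
      For reference, this is the kind of object that was being explored; a minimal sketch only, since per the above it does not give a workable solution (the name and value are made up), and the intent would be to reference it from the VM so the virt-launcher pod inherits it, which is exactly the field that appears to get wiped:

      apiVersion: scheduling.k8s.io/v1
      kind: PriorityClass
      metadata:
        name: preempting-vms              # hypothetical name
      value: 1000000                      # hypothetical value; each "more important" VM would need a higher one
      globalDefault: false
      preemptionPolicy: PreemptLowerPriority
      description: Would let a pending VM preempt lower-priority workloads instead of staying unschedulable.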

      The customer is requesting this for VMs, not Pods, even though a generic solution could work for both.

      Finally, the example above is for (re)starting a VM (i.e. a highly available one), but the same logic is requested for other tasks such as draining nodes.
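
      For the drain case the same capacity math applies: assuming the VMs carry evictionStrategy: LiveMigrate, something like the command below has to live-migrate both 8G VMs off the node, and it stalls in the same way when no single remaining worker can take them (exact flag names vary a bit between oc versions):

      $ oc adm drain worker-green.shift.toca.local --ignore-daemonsets --delete-emptydir-data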
