OCPSTRAT-1891 (OpenShift Container Platform (OCP) Strategy)

Upstream: fix scheduler and autoscaler to respect volume attach limits



      Epic Goal*

      Enhance the Kubernetes autoscaler + scheduler to respect CSINode / volume attach limits for scheduling and autoscaling

      While it's technically a bugfix for OCPBUGS-42358, it will need serious work upstream over several releases.
       
      Why is this important? (mandatory)

      When a new node appears in a cluster, the Kubernetes scheduler may not yet know which CSI drivers will run there or what their attach limits are. It assumes that the node has an infinite attach limit and that all CSI drivers will run on it, and can therefore schedule more pods with volumes than the node can handle.

      Those pods must then be removed manually by the user. Red Hat's suggestion is solutions/7088407 (i.e. call support).

      Technical details: the autoscaler does not handle CSINode objects, which contain the attach limits, so it assumes infinite volume attachments for any autoscaled node. Because the autoscaler uses the Kubernetes scheduler code for its decisions, the scheduler itself must assume the same: a node without a CSINode instance has all CSI drivers installed and an infinite attach limit.
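
      The assumption described above can be illustrated with a small sketch. It is not the scheduler's or autoscaler's actual code: `assumed_attach_limit` and `schedule_volumes` are hypothetical names, and a single integer stands in for the per-driver limit that a CSINode object would carry. It only shows why a missing limit lets more volumes land on a node than it can attach.

```python
import math
from typing import List, Optional


def assumed_attach_limit(reported_limit: Optional[int]) -> float:
    """Limit used for a node, given what its CSINode reports (hypothetical helper).

    None models a node whose CSINode object does not exist yet; per the
    behavior described in this epic, the limit is then treated as infinite.
    """
    if reported_limit is None:
        return math.inf
    return reported_limit


def schedule_volumes(pod_volume_counts: List[int],
                     reported_limit: Optional[int]) -> List[int]:
    """Greedily place pods on one node; return the volume counts of placed pods."""
    limit = assumed_attach_limit(reported_limit)
    attached = 0
    placed = []
    for count in pod_volume_counts:
        # A pod fits only if its volumes stay within the assumed limit.
        if attached + count <= limit:
            attached += count
            placed.append(count)
    return placed
```

      With three pods of 10 volumes each and a node that can really attach 25, `schedule_volumes([10, 10, 10], None)` places all three pods (30 volumes), while `schedule_volumes([10, 10, 10], 25)` places only two.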

      We need to update the autoscaler to consider CSINode objects and their templating (or copying from a sample node), and then we can fix the scheduler to wait for the CSINode object before scheduling a Pod to a new node.
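
      One way to picture the "copying from a sample node" idea is the sketch below. Everything in it is an assumption made for illustration: `NodeTemplate`, `template_from_sample`, `nodes_needed` and the driver name `example.csi.driver` are hypothetical, and the calculation treats volumes as independently placeable, ignoring how they are grouped into pods and any other resource constraints.

```python
import copy
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class NodeTemplate:
    """Hypothetical node template used when simulating a scale-up."""
    name: str
    # CSI driver name -> max attachable volumes; None means no CSINode is known.
    csi_limits: Optional[Dict[str, int]] = None


def template_from_sample(sample: NodeTemplate, new_name: str) -> NodeTemplate:
    """Build a template for a not-yet-created node, copying limits from a sample node."""
    return NodeTemplate(name=new_name, csi_limits=copy.deepcopy(sample.csi_limits))


def nodes_needed(pending_volumes: int, driver: str,
                 template: NodeTemplate) -> Optional[int]:
    """Nodes required to attach pending_volumes, or None if the limit is unknown."""
    if template.csi_limits is None or driver not in template.csi_limits:
        return None  # without a limit, no meaningful estimate is possible
    limit = template.csi_limits[driver]
    if limit <= 0:
        return None  # the driver cannot attach anything on this node shape
    return -(-pending_volumes // limit)  # ceiling division


sample = NodeTemplate("existing-node", {"example.csi.driver": 25})
template = template_from_sample(sample, "new-node-template")
```

      Here `nodes_needed(60, "example.csi.driver", template)` evaluates to 3, whereas a template with no copied limits yields `None` instead of silently assuming one node is enough.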

      Scenarios (mandatory) 


      1. As a Kubernetes user, I can rely on the scheduler to put my pod on a node that has the CSI drivers for the pod's volumes installed, so I don't need to clean up my pods manually when they are scheduled wrongly.
      2. As a Kubernetes user, I can rely on the scheduler to always respect volume attach limits, especially when the limits are not yet known for freshly created nodes (the scheduler should avoid such a node until the kubelet reports volume attach limits for all CSI drivers needed by the scheduled pod).
      3. As a Kubernetes admin, I can configure the autoscaler to provision a node with attach limits taken into account.
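
      Scenarios 1 and 2 amount to a per-driver check before a pod is placed. The sketch below is a hedged illustration only: `node_accepts_pod` is a hypothetical function, not an existing scheduler plugin, and plain dictionaries stand in for the pod's volume needs, the limits reported in the node's CSINode object, and the volumes already attached.

```python
from typing import Dict, Optional


def node_accepts_pod(pod_volumes: Dict[str, int],
                     csinode_limits: Optional[Dict[str, int]],
                     attached: Dict[str, int]) -> bool:
    """Return True if the node may take the pod (illustrative filter).

    pod_volumes:    CSI driver name -> volumes the pod needs from that driver
    csinode_limits: CSI driver name -> reported attach limit, or None when the
                    node has no CSINode object yet
    attached:       CSI driver name -> volumes already attached to the node
    """
    if not pod_volumes:
        return True  # a pod without CSI volumes needs no driver on the node
    reported = csinode_limits or {}
    for driver, needed in pod_volumes.items():
        limit = reported.get(driver)
        if limit is None:
            # Driver not reported yet: avoid the node rather than assume
            # the driver is installed with an infinite limit.
            return False
        if attached.get(driver, 0) + needed > limit:
            return False  # would exceed the driver's attach limit
    return True
```

      Under this sketch a freshly created node with no CSINode object rejects any pod that uses CSI volumes, and starts accepting such pods once every driver the pod needs appears with a limit that still has headroom.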

      Dependencies (internal and external) (mandatory)

      Upstream scheduler + autoscaler.

      Contributing Teams (and contacts) (mandatory)

      The team that manages autoscaler in OpenShift.

      Acceptance Criteria (optional)

      Drawbacks or Risk (optional)

      This is a complex feature that requires sig-storage, sig-scheduling and sig-autoscaling to work closely together over several releases.

      Done - Checklist (mandatory)

      The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.

      • CI Testing - Basic e2e automation tests are merged and completing successfully
      • Documentation - Content development is complete.
      • QE - Test scenarios are written and executed successfully.
      • Technical Enablement - Slides are complete (if requested by PLM)
      • Engineering Stories Merged
      • All associated work items with the Epic are closed
      • Epic status should be "Release Pending" 

              rh-gs-gcharot Gregory Charot
              rhn-engineering-jsafrane Jan Safranek
              Matthew Werner Matthew Werner