Epic
Resolution: Unresolved
Major
Upstream: design scheduler and autoscaler to respect volume attach limits
To Do
Epic Goal*
Design a fix so that the Kubernetes scheduler and autoscaler respect CSINode volume attach limits when scheduling and autoscaling.
Why is this important? (mandatory)
When a new node appears in a cluster, the Kubernetes scheduler may not yet know which CSI drivers will run there or what their attach limits are. It assumes that the node has an infinite attach limit and that all CSI drivers will run on it, so it can schedule more pods with volumes than the node can handle.
- https://github.com/kubernetes/kubernetes/issues/95911
- https://issues.redhat.com/browse/OCPBUGS-42358 / https://github.com/kubernetes/kubernetes/issues/126921
Those pods must be removed manually by the user. Red Hat's suggestion is solutions/7088407 (i.e. call support).
Technical details: the autoscaler does not handle CSINode objects, which contain the attach limits, so it assumes infinite volume attachments for any autoscaled node. Because the autoscaler uses the Kubernetes scheduler code for its decisions, the scheduler itself must make the same assumption: a node without a CSINode instance has all CSI drivers installed and an infinite attach limit.
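The assumption described above can be illustrated with a toy model. This is only a sketch, not the actual scheduler or autoscaler code: the `Node`, `CSINodeInfo` and `fits` names and the driver name `csi.example.com` are hypothetical, and treating a driver that is missing from an existing CSINode as unlimited is an assumption of this sketch, not something this epic states.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class CSINodeInfo:
    """Hypothetical stand-in for a CSINode object: driver name -> attach limit."""
    attach_limits: Dict[str, int] = field(default_factory=dict)


@dataclass
class Node:
    name: str
    # None models a node whose CSINode object does not exist yet.
    csi_node: Optional[CSINodeInfo] = None
    # Volumes already attached, per driver.
    attached: Dict[str, int] = field(default_factory=dict)


def fits(node: Node, driver: str, new_volumes: int) -> bool:
    """Toy volume-limit check: can `new_volumes` more volumes be attached?"""
    if node.csi_node is None:
        # No CSINode: assume the driver is installed and the limit is infinite.
        return True
    limit = node.csi_node.attach_limits.get(driver)
    if limit is None:
        # Assumption of this sketch only: an unlisted driver is not limited.
        return True
    return node.attached.get(driver, 0) + new_volumes <= limit


DRIVER = "csi.example.com"  # hypothetical driver name

# A freshly created (or simulated autoscaled) node accepts any number of volumes.
fresh = Node("fresh-node")
print(fits(fresh, DRIVER, 100))

# Once the CSINode reports a limit of 25, the same request is rejected.
ready = Node("ready-node", CSINodeInfo({DRIVER: 25}), attached={DRIVER: 24})
print(fits(ready, DRIVER, 1), fits(ready, DRIVER, 2))
```

In this model the window between node creation and CSINode registration is exactly where over-scheduling happens: every pod placed while `csi_node` is `None` passes the check, even if the real limit, once reported, is far lower.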
Scenarios (mandatory)
As a Kubernetes developer, I want sig-scheduling, sig-storage and sig-autoscaling to agree on a solution, so I can implement it in follow-up epics.
Dependencies (internal and external) (mandatory)
Contributing Teams (and contacts) (mandatory)
- Development -
Acceptance Criteria (optional)
An upstream KEP exists, is approved by sig-scheduling, sig-storage and sig-autoscaling, and is merged.
Drawbacks or Risk (optional)
Done - Checklist (mandatory)
The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.
- CI Testing - Basic e2e automation tests are merged and completing successfully
- Documentation - Content development is complete.
- QE - Test scenarios are written and executed successfully.
- Technical Enablement - Slides are complete (if requested by PLM)
- Engineering Stories Merged
- All associated work items with the Epic are closed
- Epic status should be "Release Pending"