[OCPSTRAT-1891] Upstream: fix scheduler and autoscaler to respect volume attach limits - Red Hat Issue Tracker

Type: Feature
Resolution: Unresolved
Priority: Normal
Fix Version/s: None
Affects Version/s: None
Component/s: Autoscaling, Storage
Labels:
- FPC:TODO-Close-ALL-Epics
- FPC:TODO-Create-Delivery-Epics

Work Type: Upstream
Blocked: False
Blocked Reason: None
Ready: False
Risk Score: 0

SFDC Cases Links:
SFDC Cases Open:
SFDC Cases Counter:

Intelligence Requested:
Market:

Epic Goal*

Enhance the Kubernetes autoscaler and scheduler to respect CSINode volume attach limits when scheduling and autoscaling.

While it's technically a bugfix for OCPBUGS-42358, it will need serious work upstream over several releases.

Why is this important? (mandatory)

When a new node appears in a cluster, the Kubernetes scheduler may not yet know which CSI drivers will run there or what their attach limits are. It assumes that the node has an infinite attach limit and that all CSI drivers will run on it, and can therefore schedule more pods with volumes than the node can handle.

Those pods must then be removed manually by the user. Red Hat's current suggestion is solutions/7088407 (i.e. call support).

Technical details: the root cause is that the autoscaler does not handle CSINode objects, which contain the attach limits. The autoscaler assumes infinite volume attachments for any autoscaled node. Because the autoscaler uses the Kubernetes scheduler code for its decisions, the scheduler itself must make the same assumption: a node without a CSINode instance has all CSI drivers installed and has an infinite attach limit.
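
The effect of that assumption can be illustrated with a small model. This is only a sketch, not the upstream scheduler code: the `Node` class, the `fits_today`, `schedule` and `over_limit` helpers, and the driver name `ebs.csi.example` are all hypothetical, and treating a driver that is missing from an existing CSINode as unlimited is an assumption of the sketch.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, Optional

DRIVER = "ebs.csi.example"  # hypothetical driver name, for illustration only


@dataclass
class Node:
    name: str
    # driver name -> attach limit; None models "no CSINode object exists yet"
    attach_limits: Optional[Dict[str, int]] = None
    # driver name -> number of volumes already placed on the node
    attached: Dict[str, int] = field(default_factory=dict)


def fits_today(node: Node, driver: str, volumes_needed: int) -> bool:
    """Model of the described behavior: a missing CSINode means no limit."""
    if node.attach_limits is None:
        limit = math.inf
    else:
        # Assumption of this sketch: an unlisted driver is also unlimited.
        limit = node.attach_limits.get(driver, math.inf)
    return node.attached.get(driver, 0) + volumes_needed <= limit


def schedule(node: Node, driver: str, pods: int) -> int:
    """Place up to `pods` pods with one volume each; return how many were placed."""
    placed = 0
    for _ in range(pods):
        if fits_today(node, driver, 1):
            node.attached[driver] = node.attached.get(driver, 0) + 1
            placed += 1
    return placed


def over_limit(node: Node, driver: str) -> int:
    """Volumes placed beyond the limit once the limit is finally known."""
    if node.attach_limits is None or driver not in node.attach_limits:
        return 0
    return max(0, node.attached.get(driver, 0) - node.attach_limits[driver])


# A fresh node without a CSINode accepts every pod.
fresh = Node("new-node")
placed = schedule(fresh, DRIVER, 5)

# Later the limit is reported (2 in this example) and some pods cannot attach.
fresh.attach_limits = {DRIVER: 2}
stuck = over_limit(fresh, DRIVER)
```

In this model all five pods land on the fresh node, and three of them exceed the limit once it becomes known; those are the pods that currently need manual cleanup.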

We need to update the autoscaler to consider CSINode objects and their templating (or copying from a sample node). Once that is done, we can fix the scheduler to wait for the CSINode object before scheduling a Pod to a new node.
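
A rough sketch of that direction follows. It is not a design for the upstream change: `template_from_sample`, `fits_proposed`, `nodes_needed` and the driver name `ebs.csi.example` are hypothetical names chosen for illustration, and the model assumes one volume per pending pod.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

DRIVER = "ebs.csi.example"  # hypothetical driver name, for illustration only


@dataclass
class Node:
    name: str
    # driver name -> attach limit; None models "no CSINode object exists yet"
    attach_limits: Optional[Dict[str, int]] = None
    # driver name -> number of volumes already placed on the node
    attached: Dict[str, int] = field(default_factory=dict)


def template_from_sample(sample: Node, new_name: str) -> Node:
    """Autoscaler side: model a not-yet-created node by copying the sample's limits."""
    limits = None if sample.attach_limits is None else dict(sample.attach_limits)
    return Node(name=new_name, attach_limits=limits)


def fits_proposed(node: Node, driver: str, volumes_needed: int) -> bool:
    """Scheduler side: skip the node until the driver's limit has been reported."""
    if node.attach_limits is None or driver not in node.attach_limits:
        return False
    used = node.attached.get(driver, 0)
    return used + volumes_needed <= node.attach_limits[driver]


def nodes_needed(template: Node, driver: str, pending_volumes: int) -> int:
    """Template nodes required for `pending_volumes` pods with one volume each."""
    if template.attach_limits is None or template.attach_limits.get(driver, 0) <= 0:
        raise ValueError("template has no usable attach limit for " + driver)
    limit = template.attach_limits[driver]
    return -(-pending_volumes // limit)  # ceiling division


sample = Node("existing-node", attach_limits={DRIVER: 2})
template = template_from_sample(sample, "template-node")
fresh = Node("new-node")  # CSINode not reported yet
```

With these definitions, `fits_proposed(fresh, DRIVER, 1)` is false until the limit is reported, and `nodes_needed(template, DRIVER, 5)` yields 3 rather than assuming that a single new node can take every pending volume.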

Scenarios (mandatory)

Provide details for user scenarios including actions to be performed, platform specifications, and user personas.

As a Kubernetes user, I can rely on the scheduler to put my pod on a node that has the CSI drivers for the pod's volumes installed, so I don't need to clean up my pods manually when they are scheduled incorrectly.
As a Kubernetes user, I can rely on the scheduler to always respect volume attach limits, especially when the limits are not yet known for freshly created nodes (the scheduler should avoid such a node until the kubelet has reported volume attach limits for all CSI drivers needed by the pod being scheduled).
As a Kubernetes admin, I can configure the autoscaler to provision a node with attach limits taken into account.

Dependencies (internal and external) (mandatory)

Upstream scheduler + autoscaler.

Contributing Teams (and contacts) (mandatory)

The team that manages autoscaler in OpenShift.

Acceptance Criteria (optional)

Drawbacks or Risk (optional)

This is a complex feature that requires sig-storage, sig-scheduling and sig-autoscaling to work closely together over several releases.

Done - Checklist (mandatory)

The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.

CI Testing - Basic e2e automation tests are merged and completing successfully
Documentation - Content development is complete.
QE - Test scenarios are written and executed successfully.
Technical Enablement - Slides are complete (if requested by PLM)
Engineering Stories Merged
All associated work items with the Epic are closed
Epic status should be "Release Pending"

Assignee: Gregory Charot

Reporter: Jan Safranek

Doc Contact: Matthew Werner

Votes: 0

Watchers: 2

Created: 2024/12/13 12:20 PM

Updated: 2025/03/27 8:15 PM
