1. Proposed title of this feature request.
- Custom node drain timeout setting.
2. What is the nature and description of the request?
- To have the ability to set node-drain timeouts, as was possible in OCP 3.x (/usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3.11/upgrade_nodes.yml -e openshift_upgrade_nodes_drain_timeout=600).
3. Why does the customer need this? (List the business requirements here)
- The customer performed many upgrades on 3.x. Given their cluster size (production clusters run 300+ nodes and 15k+ pods) and a 10-minute drain timeout, upgrading all nodes took 30+ hours. With 4.x, a drain can loop indefinitely, and it is not feasible to sit and monitor the upgrade process in order to intervene whenever a node gets stuck.
4. How would the customer like to achieve this? (List the functional requirements here)
- They would like a spec or configuration field that defines the drain timeout (per machine pool or per node) and whose value they can override.
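A minimal sketch of what such a field could look like on a MachineConfigPool. The `nodeDrainTimeout` field name and its placement are assumptions for illustration only; no such field exists in the current MachineConfigPool API:

```yaml
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: worker
spec:
  # Hypothetical field (illustrative, not an existing API):
  # maximum time to wait for a node drain before the upgrade
  # gives up on the drain and proceeds with the node update.
  nodeDrainTimeout: 10m
```

Because the field would live on the pool spec, the customer could set different values for different pools (e.g. a longer timeout for pools running workloads with strict PodDisruptionBudgets) and override the value at any time.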
5. For each functional requirement listed in question 4, specify how Red Hat and the customer can test to confirm the requirement is successfully implemented.
- Perform an upgrade with a custom drain timeout configured. After the given timeout elapses, the node drain should time out and the upgrade should continue with the node update.
- is related to:
  - MCO-54 [SPIKE] Design and review solution to improve experience of MCP rollouts (Closed)
  - MCO-183 [SPIKE] Explore implementing a node drain time out (Closed)
- relates to:
  - OCPPLAN-7021 Improve experience of stalled MCP rollouts (New)