Feature Request
Resolution: Unresolved
Major
openshift-4.14, openshift-4.16, openshift-4.18
Product / Portfolio Work
1. Proposed title of this feature request
Ability to change cluster MTU while minimizing workload disruption
2. What is the nature and description of the request?
Currently, changing the cluster MTU requires a minimum of two reboots and is executed on all nodes of the cluster at the same time.
3. Why does the customer need this? (List the business requirements here)
The customer requires a procedure to change the cluster network MTU and the machine network MTU that attempts to minimize workload disruption (e.g., potentially using Machine Config Pools). The main advantages of using a per-Machine Config Pool (MCP) deployment strategy for critical operations such as day-2 MTU updates are:
- Granular and phased day-2 MTU updates. Machine Config Pools enable the logical subdivision of OpenShift nodes into different groups based on customer planning parameters (e.g., workload requirements).
- Minimize service interruption and risk. This controlled deployment method is crucial for minimizing interruption of sensitive workloads, such as telco-grade services, during the necessary node reboots.
Procedure: Let's say you have five MCPs: MCP A (for master nodes), MCP B and MCP C (two pools for appworkers), MCP D (for worker nodes of type gateway), and MCP E (for storage nodes). A procedure like this should be doable and documented in order to be supported:
- Pause MCP A, MCP B, MCP C, MCP D, and MCP E.
- Prepare the cluster for cluster network and potential machine network MTU migration (setting the MTU migration configuration option)
- Unpause MCP A, it will be updated (1 reboot of nodes in MCP A)
- Unpause MCP B, it will be updated (1 reboot of nodes in MCP B)
- Unpause MCP C, it will be updated (1 reboot of nodes in MCP C)
- Unpause MCP D, it will be updated (1 reboot of nodes in MCP D)
- Unpause MCP E, it will be updated (1 reboot of nodes in MCP E)
- All nodes of the cluster have now completed step 1 of the MTU procedure
- Pause MCP A, MCP B, MCP C, MCP D, and MCP E again (if needed)
- Reconfigure the cluster for the new machine network MTU (Nokia MC is working)
- Unpause MCP A, it will be updated (1 reboot)
- Unpause MCP B, it will be updated (1 reboot)
- Unpause MCP C, it will be updated (1 reboot)
- Unpause MCP D, it will be updated (1 reboot)
- Unpause MCP E, it will be updated (1 reboot)
- All nodes of the cluster have now completed step 2 of the MTU procedure (an optional step, needed only if you have to update the machine network MTU)
- Pause MCP A, MCP B, MCP C, MCP D, and MCP E
- Take the cluster out of MTU migration (unset the MTU migration configuration option to finalize the migration)
- Unpause MCP A, it will be updated
- Unpause MCP B, it will be updated
- Unpause MCP C, it will be updated
- Unpause MCP D, it will be updated
- Unpause MCP E, it will be updated
- All nodes of the cluster have now completed step 3 of the MTU procedure; the migration is done
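The phased procedure above can be sketched as a small plan generator. This is only an illustration of the requested flow, not a supported procedure: whether pools may stay paused during an MTU migration is exactly what this request asks to have documented. The pool names, MTU values, the `60m` timeout, and the `machine-mtu-machineconfig.yaml` file name are hypothetical, and the finalize patch assumes the OVN-Kubernetes network plugin.

```python
import json

# Hypothetical pool names standing in for MCP A..E from the procedure.
POOLS = ["mcp-a", "mcp-b", "mcp-c", "mcp-d", "mcp-e"]


def pause_cmd(pool, paused):
    """Return the oc command that pauses or unpauses one MachineConfigPool."""
    patch = json.dumps({"spec": {"paused": paused}})
    return f"oc patch mcp {pool} --type=merge --patch '{patch}'"


def network_patch(spec):
    """Return the oc command that merge-patches the cluster Network operator config."""
    return ("oc patch Network.operator.openshift.io cluster "
            f"--type=merge --patch '{json.dumps(spec)}'")


def phase(pools, change):
    """One phase: pause every pool, apply a change, then unpause pool by pool."""
    steps = [pause_cmd(p, True) for p in pools]
    steps.append(change)
    for p in pools:
        steps.append(pause_cmd(p, False))
        # Wait for this pool to report Updated before touching the next one.
        # The timeout is an assumed value; a real script should also confirm
        # the rollout has actually started before trusting this condition.
        steps.append(f"oc wait mcp/{p} --for=condition=Updated --timeout=60m")
    return steps


def build_plan(pools, overlay_from, overlay_to, machine_to):
    """Return the ordered list of commands for the three-step MTU procedure."""
    # Step 1: put the cluster into MTU migration.
    prepare = network_patch({"spec": {"migration": {"mtu": {
        "network": {"from": overlay_from, "to": overlay_to},
        "machine": {"to": machine_to}}}}})
    # Step 2 (optional): apply the machine network MTU change.
    # The file name is a placeholder for a site-specific MachineConfig.
    machine = "oc apply -f machine-mtu-machineconfig.yaml"
    # Step 3: finalize the migration (assumes OVN-Kubernetes).
    finalize = network_patch({"spec": {
        "migration": None,
        "defaultNetwork": {"ovnKubernetesConfig": {"mtu": overlay_to}}}})
    return phase(pools, prepare) + phase(pools, machine) + phase(pools, finalize)


if __name__ == "__main__":
    for step in build_plan(POOLS, 1400, 8900, 9000):
        print(step)
```

Generating the plan as plain text rather than executing it keeps the sketch safe to run anywhere and makes the per-pool ordering easy to review before anything touches a cluster.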
The ask here is to industrialize the procedure above, which is described in detail in the interim KB article. In particular:
- Move the documentation from the KB article into the officially supported product docs.
- Specify what to do in situations where the MTU is already specified in the NetworkManager connection (nmconnection) files.
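As an illustration of the nmconnection case, the following minimal sketch detects whether a NetworkManager keyfile pins an explicit MTU. It assumes the usual INI-style keyfile layout with an `mtu` key in a section such as `[ethernet]`; the sample file content is hypothetical, and how such a pinned value should be handled during migration is the open question this request raises.

```python
import configparser

# Hypothetical sample of a NetworkManager keyfile with an explicit MTU.
SAMPLE = """\
[connection]
id=ens3
type=ethernet

[ethernet]
mtu=1500

[ipv4]
method=auto
"""


def explicit_mtus(keyfile_text):
    """Return {section: mtu} for every section that sets a non-zero mtu key."""
    # Interpolation is disabled so '%' in values is read literally;
    # strict=False tolerates duplicate keys or sections.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(keyfile_text)
    found = {}
    for section in parser.sections():
        if parser.has_option(section, "mtu"):
            mtu = parser.getint(section, "mtu")
            # In NetworkManager, mtu=0 means "automatic", so it is not a pin.
            if mtu > 0:
                found[section] = mtu
    return found


if __name__ == "__main__":
    print(explicit_mtus(SAMPLE))
```

A check like this could be run against each node's connection files before step 1 to flag interfaces whose MTU would not follow the cluster-level change.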
4. List any affected packages or components.
None.