Type: Feature Request
Resolution: Unresolved
Priority: Undefined
Fix Version/s: None
Affects Version/s: None
Component/s: machine-api
Labels:
None

Target Version:
None
Activity Type:
Product / Portfolio Work
Status Summary:
None
Blocked:
False
Blocked Reason:
None
Products:
None
Hierarchy Progress Bar:
None

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

PX Review Complete:
None
PX Impact Score:
PX Impact Range:
None
PX Priority Data:
None
PX Technical Impact:
None
PX Technical Impact Notes:
None
PX Scheduling Request:
None

1. Proposed title of this feature request
MCP orchestrated reboots

2. What is the nature and description of the request?
Sometimes nodes need to be rebooted because of underlying issues on the platform, for example problems at the CRI-O or kubelet level. A reboot may also be needed after an environmental issue leaves nodes under high load or with defunct (zombie) processes, and there are a number of other reasons as well.

Manually cordoning, draining, and rebooting a few nodes isn't an issue. For medium or large environments, however, it is far more time-consuming and labor-intensive for the admins.

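When you apply a MachineConfig, the Machine Config Operator (MCO) rolls the change out across the MachineConfigPool (MCP), working through the nodes sequentially: each node is cordoned and drained, the update is applied, the node is rebooted, and the app workloads are brought back.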

Having the Machine Config Operator roll through the nodes and reboot them sequentially and methodically, just as it does for a MachineConfig (except that nothing is applied in this case, the nodes are only rebooted), would significantly lessen the load on admins of medium or large environments.
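The requested behavior can be sketched as a simple orchestration loop. This is a minimal, hypothetical Python model, not the Machine Config Operator's implementation: `NodeOps`, `rolling_reboot`, and `max_unavailable` are illustrative names (the last loosely mirrors the MachineConfigPool `maxUnavailable` setting), and the cordon, drain, reboot, and readiness hooks are assumed to be supplied by the caller, for example as wrappers around `oc adm cordon` and `oc adm drain`.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class NodeOps:
    """Hypothetical per-node hooks supplied by the caller.

    In a real cluster these might wrap Kubernetes API calls or
    `oc adm cordon` / `oc adm drain`; here they are plain callables.
    """
    cordon: Callable[[str], None]
    drain: Callable[[str], None]
    reboot: Callable[[str], None]
    wait_ready: Callable[[str], bool]  # True once the node reports Ready again
    uncordon: Callable[[str], None]


def rolling_reboot(nodes: Iterable[str], ops: NodeOps,
                   max_unavailable: int = 1) -> List[str]:
    """Reboot nodes in batches of at most `max_unavailable` (hypothetical
    parameter, loosely modeled on the MachineConfigPool maxUnavailable field).

    Raises RuntimeError and halts the rollout if a node does not come back
    Ready; that node stays cordoned and later batches are not touched.
    Returns the successfully rebooted nodes in order.
    """
    if max_unavailable < 1:
        raise ValueError("max_unavailable must be at least 1")
    pending = list(nodes)
    rebooted: List[str] = []
    for start in range(0, len(pending), max_unavailable):
        batch = pending[start:start + max_unavailable]
        # Take the whole batch out of service: stop scheduling, evict, reboot.
        for node in batch:
            ops.cordon(node)
            ops.drain(node)
            ops.reboot(node)
        # Only move on once every node in the batch is healthy again.
        for node in batch:
            if not ops.wait_ready(node):
                raise RuntimeError(
                    f"node {node} did not return to Ready; halting rollout")
            ops.uncordon(node)
            rebooted.append(node)
    return rebooted


if __name__ == "__main__":
    # Dry run with stub hooks that only print what would happen.
    stub = NodeOps(
        cordon=lambda n: print("cordon", n),
        drain=lambda n: print("drain", n),
        reboot=lambda n: print("reboot", n),
        wait_ready=lambda n: True,
        uncordon=lambda n: print("uncordon", n),
    )
    print(rolling_reboot(["worker-0", "worker-1", "worker-2"], stub))
```

With `max_unavailable=1` the loop matches the one-node-at-a-time behavior described above, and halting on the first node that fails to return mirrors how a degraded pool stops progressing. A real implementation would also need timeouts, PodDisruptionBudget handling during drain, and a way to trigger the rollout, which this sketch does not attempt.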

3. Why does the customer need this? (List the business requirements here)
Some customers are not running six-node clusters but much larger environments, such as 50-node clusters. From the support perspective, I observed this when a customer experienced a power outage that left the nodes in various states, so the admins had to cordon, drain, and reboot every node one by one.

Being able to leverage the Machine Config Operator, which already performs this cordon, drain, and reboot sequence during MachineConfig rollouts, would make things significantly easier for admins of medium to large clusters.

4. List any affected packages or components.

Assignee: Subin M

Reporter: Matthew McComas

Need Info From: None

Votes: 0

Watchers: 1

Created: 2025/03/07 10:28 PM

Updated: 2025/07/03 1:21 PM

Target start: None

Target end: None
