-
Feature
-
Resolution: Done
-
Critical
-
BU Product Work
-
0% To Do, 0% In Progress, 100% Done
-
Program Call
-
Some aspects of the feature have changed since the Tech Preview (TP) release, so the enablement material should at least be updated.
-
Phase 2 Deliverable:
GA support for a generic interface for administrators to define custom reboot/drain suppression rules.
Epic Goal
- Allow administrators to define which MachineConfigs won't cause a drain and/or reboot.
- Allow administrators to define which ImageContentSourcePolicy/ImageTagMirrorSet/ImageDigestMirrorSet changes won't cause a drain and/or reboot.
- Allow administrators to define alternate actions (typically restarting a system daemon) to take instead.
- Possibly (pending discussion) add a switch that allows the administrator to choose a kexec "restart" instead of a full hardware reset via reboot.
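The goals above can be illustrated with a minimal sketch of how suppression rules might be evaluated. This is not the operator's implementation: the `Policy` and `Action` classes, the action names, and the file paths are all hypothetical, and the sketch assumes that any change without a matching rule falls back to the default drain and reboot.

```python
from dataclasses import dataclass, field

# Hypothetical action names for this sketch; not taken from the product API.
NONE, RELOAD, RESTART, DRAIN, REBOOT = "None", "Reload", "Restart", "Drain", "Reboot"


@dataclass(frozen=True)
class Action:
    kind: str
    service: str = ""  # only meaningful for Reload / Restart


@dataclass
class Policy:
    # Maps a changed file path to the actions the administrator chose for it.
    file_rules: dict[str, list[Action]] = field(default_factory=dict)

    def actions_for(self, changed_paths: list[str]) -> list[Action]:
        """Return the de-duplicated actions needed for a set of changed files."""
        default = [Action(DRAIN), Action(REBOOT)]
        result: list[Action] = []
        for path in changed_paths:
            rule = self.file_rules.get(path)
            if rule is None:
                # Assumption: one unmatched change forces the default for the whole update.
                return default
            for action in rule:
                if action.kind != NONE and action not in result:
                    result.append(action)
        return result


# Hypothetical rules: sudo changes need nothing, sshd changes need a service reload.
policy = Policy({
    "/etc/sudoers.d/admins": [Action(NONE)],
    "/etc/ssh/sshd_config": [Action(RELOAD, "sshd.service")],
})
print(policy.actions_for(["/etc/ssh/sshd_config"]))
print(policy.actions_for(["/etc/ssh/sshd_config", "/etc/chrony.conf"]))
```

The fallback to drain and reboot keeps the sketch conservative: a rule only ever reduces disruption for changes the administrator has explicitly listed.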
Why is this important?
- There is a demonstrated need from customer cluster administrators to push configuration settings and restart system services without restarting each node in the cluster.
- Customers are modifying or adding ICSP/ITMS/IDMS objects after day 1.
- (kexec: we are not committed on this point yet.) Server-class hardware with various add-in cards can take 10 minutes or longer in BIOS/POST. Skipping this step would dramatically speed up bare-metal rollouts, to the point that upgrades would proceed about as fast as cloud deployments. The downside is potential problems with hardware and driver support, in-flight DMA operations, and other unexpected behavior. OEMs and ODMs may or may not support their customers with this path.
Scenarios
- As a cluster admin, I want to reconfigure sudo without disrupting workloads.
- As a cluster admin, I want to update or reconfigure sshd and reload the service without disrupting workloads.
- As a cluster admin, I want to remove mirroring rules from an ICSP, ITMS, or IDMS object without disrupting workloads, because the scenario in which this might lead to non-pullable images at an undefined later point in time doesn't apply to me.
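For the sudo and sshd scenarios above, a policy object might look roughly like the one built below. The field names (`nodeDisruptionPolicy`, `files`, `actions`, `reload.serviceName`) and the `operator.openshift.io/v1` `MachineConfiguration` resource are stated here from recollection of the shipped API and should be verified against the product documentation; the file paths and the `reload_rule` helper are hypothetical. The mirroring-rule scenario is not covered by this sketch.

```python
import json


def reload_rule(path: str, service: str) -> dict:
    """Hypothetical helper: reload `service` when `path` changes, instead of rebooting."""
    return {
        "path": path,
        "actions": [{"type": "Reload", "reload": {"serviceName": service}}],
    }


# Assumed manifest shape; verify field names against the API reference before use.
manifest = {
    "apiVersion": "operator.openshift.io/v1",
    "kind": "MachineConfiguration",
    "metadata": {"name": "cluster"},
    "spec": {
        "nodeDisruptionPolicy": {
            "files": [
                # sudo configuration: apply the file and take no further action.
                {"path": "/etc/sudoers.d/custom", "actions": [{"type": "None"}]},
                # sshd configuration: reload the service, no drain or reboot.
                reload_rule("/etc/ssh/sshd_config", "sshd.service"),
            ]
        }
    },
}

print(json.dumps(manifest, indent=2))
```

The printed JSON could be saved to a file and applied with the usual cluster tooling once the field names have been confirmed.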
Acceptance Criteria
- CI - MUST be running successfully with tests automated
- Release Technical Enablement - Provide necessary release enablement details and documents.
- ...
Dependencies (internal and external)
- ...
Previous Work (Optional):
- …
Open questions:
- …
Done Checklist
- CI - CI is running, tests are automated and merged.
- Release Enablement <link to Feature Enablement Presentation>
- DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
- DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
- DEV - Downstream build attached to advisory: <link to errata>
- QE - Test plans in Polarion: <link or reference to Polarion>
- QE - Automated tests merged: <link or reference to automated tests>
- DOC - Downstream documentation merged: <link to meaningful PR>
- clones
-
OCPSTRAT-380 Admin-defined node disruption: Tech Preview
- Closed
- is cloned by
-
OCPSTRAT-1550 Enhanced admin-defined reboot & drain policies
- New
- is related to
-
RFE-4079 Configurable rebootless MachineConfigs
- Accepted
-
MCO-474 Investigate MCO reboot behavior when machine-os content hasn't changed during upgrade
- Closed
- relates to
-
CNV-35883 Enable defining schedule/acks/tuning for workloadUpdateStrategy
- New
-
RFE-4661 Allow user to opt-out of IDMS / ITMS node drain on entry removal
- Backlog
-
MCO-507 Admin-defined node disruption - Tech Preview
- Closed
-
RFE-3549 Method to make simple configuration changes without forcing reboots
- Accepted