-
Epic
-
Resolution: Done
-
Critical
-
None
-
None
-
Admin-defined reboot and drain
-
5
-
False
-
None
-
False
-
Not Selected
-
Done
-
OCPSTRAT-380 - Admin-defined node disruption: Tech Preview
-
0% To Do, 4% In Progress, 96% Done
-
M
-
0
-
0.000
This epic is another epic under the "reduce workload disruptions" umbrella.
This is now updated to get us most of the way to MCO-200 (Admin-Defined reboot & drain), but not necessarily with all the final features in place.
This epic aims to create a reboot/drain policy object and an MCO-management apparatus for initial functionality with MachineConfig-backed updates, with a restricted set of actions for the user. We also need a reboot/drain policy object for ImageContentSourcePolicy, ImageTagMirrorSet and ImageDigestMirrorSet, to avoid drains/reboots when admins use these APIs and have other ways of ensuring image integrity.
This mostly focuses on the user interface for defining reboot/drain policies. We will also need this for the layering "live apply" cases and bifrost-backed updates, which will be implemented in a future update.
The MCO's reboot and drain rules are currently hard-coded in the machine-config-daemon.
According to RFE-3667, node drains still occur in releases after OCP 4.9, not only when ICSP, ITMS, or IDMS objects are added but also when those objects, or single mirroring rules in their configuration, are removed.
This causes at least three problems:
- A user does not know what the rules are unless they read the code (the rules aren't visible to the user)
- The controller can't see the rules to "pre-compute" the effect that a MachineConfig will have on a Node before that MachineConfig is delivered (which makes it hard for a user to know what will actually happen if they apply a config)
- The only way for a template owner to mark their config as "does not require reboot" is to edit the MCD code
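To illustrate the second problem, here is a minimal sketch of what "pre-computing" could look like if the rules were data rather than daemon code. Everything in it is an assumption made for illustration: the action names, the rule table entries, and the `precompute_action` helper are hypothetical and do not reflect the daemon's real rules or any MCO API.

```python
from enum import IntEnum


class Action(IntEnum):
    """Hypothetical post-config actions, ordered from least to most disruptive."""
    NONE = 0
    RELOAD = 1
    DRAIN = 2
    REBOOT = 3


# Hypothetical rule table (file path -> action). Illustrative entries only,
# not the rules hard-coded in the machine-config-daemon.
RULES = {
    "/etc/containers/registries.conf": Action.DRAIN,
    "/etc/example/app.conf": Action.NONE,
}


def precompute_action(changed_paths, rules=RULES, default=Action.REBOOT):
    """Return the most disruptive action required by any changed path.

    Paths without a matching rule fall back to `default`, so an unknown
    change is treated conservatively in this sketch.
    """
    if not changed_paths:
        return Action.NONE
    return max(rules.get(path, default) for path in changed_paths)
```

With the rules exposed as data like this, a controller (or a user) could evaluate the set of files a MachineConfig changes before it is delivered to a Node, instead of reading the daemon code to find out.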
Done when:
- A CRD is defined for post config action policies covering both MCO and ICSP/ITMS/IDMS APIs
- The existing daemon rules are broken out into one of these resources
- The reboot/drain policies are visible in the cluster (e.g. "oc get rebootpolicies")
- The drain controller handles processing and validation of the user's policies (and could put the computed post-config actions in the status of the MachineConfig and ICSP/ITMS/IDMS objects, or in the custom image's metadata if layering)
- A template owner has a procedure to mark whether a change to their template config requires a reboot
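The validation step in the criteria above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `PolicyRule` shape, the restricted action set, and the individual checks are hypothetical and are not taken from the actual CRD or controller.

```python
from dataclasses import dataclass

# Hypothetical restricted action set; the real policy API may differ.
ALLOWED_ACTIONS = frozenset({"None", "Drain", "Reboot"})


@dataclass(frozen=True)
class PolicyRule:
    """Hypothetical user-defined rule: a file path and the actions it requires."""
    path: str
    actions: tuple


def validate_policy(rules):
    """Return a list of human-readable errors; an empty list means valid."""
    errors = []
    seen_paths = set()
    for rule in rules:
        if not rule.path.startswith("/"):
            errors.append(f"{rule.path}: path must be absolute")
        if rule.path in seen_paths:
            errors.append(f"{rule.path}: duplicate rule for this path")
        seen_paths.add(rule.path)
        if not rule.actions:
            errors.append(f"{rule.path}: at least one action is required")
        for action in rule.actions:
            if action not in ALLOWED_ACTIONS:
                errors.append(f"{rule.path}: unsupported action {action!r}")
        if "None" in rule.actions and len(rule.actions) > 1:
            errors.append(f"{rule.path}: 'None' cannot be combined with other actions")
    return errors
```

Returning every error at once, rather than failing on the first, is a design choice in this sketch: it would let a controller surface the full list in a status condition so the admin can fix the policy in one pass.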
- is related to
-
MCO-517 Prevent node availability check when the kubelet is shutdown
- Closed
-
OCPSTRAT-380 Admin-defined node disruption: Tech Preview
- Closed
-
OCPSTRAT-1026 Admin-defined node disruption policies: Phase 2 (GA)
- Closed
-
MCO-474 Investigate MCO reboot behavior when machine-os content hasn't changed during upgrade
- Closed
- relates to
-
OCPBUGS-32783 NodeDisruptionPolicy action reload cannot take effect
- Verified
-
OCPBUGS-32511 NodeDisruptionPolicyStatus was not ready context deadline exceeded
- Verified
-
OCPBUGS-32739 MachineConfigurations is only effective with name <cluster>
- Closed