Machine Config Operator / MCO-507

Admin-defined node disruption - Tech Preview


    • Admin-defined reboot and drain
    • Done
    • OCPSTRAT-380 - Admin-defined node disruption: Tech Preview
    • 0% To Do, 4% In Progress, 96% Done

      This is another epic under the "reduce workload disruptions" umbrella.

      It has since been updated to get us most of the way to MCO-200 (Admin-Defined reboot & drain), though not necessarily with all of the final features in place.

      This epic aims to create a reboot/drain policy object and an MCO-managed apparatus that provides initial functionality for MachineConfig-backed updates, with a restricted set of actions available to the user. We also need a reboot/drain policy object for ImageContentSourcePolicy, ImageTagMirrorSet and ImageDigestMirrorSet, so that admins who use these APIs and have other ways of ensuring image integrity can avoid drains and reboots.
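      A rough sketch of what a restricted policy object could look like, written in Go since that is what the MCO uses. Every type, field, and action name here (RebootPolicy, PolicyRule, PolicyTarget, PostConfigAction, ActionReloadCrio, PathPrefix) is hypothetical and is not the actual CRD schema. The only point being illustrated is that the allowed actions form a small closed set, and that validation can reject anything outside it for both MachineConfig and ICSP/ITMS/IDMS rules.

```go
package main

import "fmt"

// PostConfigAction is a HYPOTHETICAL enumeration of what could happen on a
// node after a config change. Names are illustrative, not the real API.
type PostConfigAction string

const (
	ActionNone       PostConfigAction = "None"
	ActionReloadCrio PostConfigAction = "ReloadCrio"
	ActionDrain      PostConfigAction = "Drain"
	ActionReboot     PostConfigAction = "Reboot"
)

// allowedActions is the restricted set a user may pick from in this sketch.
var allowedActions = map[PostConfigAction]bool{
	ActionNone:       true,
	ActionReloadCrio: true,
	ActionDrain:      true,
	ActionReboot:     true,
}

// PolicyTarget names the API whose changes a rule applies to.
type PolicyTarget string

const (
	TargetMachineConfig PolicyTarget = "MachineConfig"
	TargetICSP          PolicyTarget = "ImageContentSourcePolicy"
	TargetITMS          PolicyTarget = "ImageTagMirrorSet"
	TargetIDMS          PolicyTarget = "ImageDigestMirrorSet"
)

// PolicyRule maps changes to a target (optionally narrowed to a file path
// prefix for MachineConfig) onto a single post-config action.
type PolicyRule struct {
	Target     PolicyTarget
	PathPrefix string // only meaningful for MachineConfig rules
	Action     PostConfigAction
}

// RebootPolicy is a HYPOTHETICAL stand-in for the reboot/drain policy object.
type RebootPolicy struct {
	Name  string
	Rules []PolicyRule
}

// Validate rejects unknown targets, path prefixes on non-MachineConfig
// rules, and any action outside the allowed set.
func (p RebootPolicy) Validate() error {
	for i, r := range p.Rules {
		switch r.Target {
		case TargetMachineConfig, TargetICSP, TargetITMS, TargetIDMS:
		default:
			return fmt.Errorf("rule %d: unsupported target %q", i, r.Target)
		}
		if r.Target != TargetMachineConfig && r.PathPrefix != "" {
			return fmt.Errorf("rule %d: pathPrefix is only valid for MachineConfig rules", i)
		}
		if !allowedActions[r.Action] {
			return fmt.Errorf("rule %d: action %q is not in the allowed set", i, r.Action)
		}
	}
	return nil
}

func main() {
	p := RebootPolicy{
		Name: "example-policy",
		Rules: []PolicyRule{
			{Target: TargetIDMS, Action: ActionReloadCrio},
			{Target: TargetMachineConfig, PathPrefix: "/etc/example/", Action: ActionNone},
		},
	}
	fmt.Println(p.Validate()) // <nil>

	bad := RebootPolicy{Name: "bad", Rules: []PolicyRule{{Target: TargetICSP, Action: "Reinstall"}}}
	fmt.Println(bad.Validate()) // rule 0: action "Reinstall" is not in the allowed set
}
```

      Keeping the action set closed means the controller never has to interpret arbitrary user intent; widening it later is an additive change.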

      This epic focuses mostly on the user interface for defining reboot/drain policies. The same mechanism will also be needed for the layering "live apply" cases and for bifrost-backed updates, which will be implemented in a future update.

      The MCO's reboot and drain rules are currently hard-coded in the machine-config-daemon.

      According to RFE-3667, node drains also still occur in releases after OCP 4.9, not only when ICSP, ITMS, or IDMS objects (or individual mirroring rules in their configuration) are added, but also when they are removed.

      This causes at least three problems:

      • A user does not know what the rules are unless they read the code (the rules aren't visible to the user)
      • The controller can't see the rules to "pre-compute" the effect that a MachineConfig will have on a Node before that MachineConfig is delivered (which makes it hard for a user to know what will actually happen if they apply a config)
      • The only way for a template owner to mark their config as "does not require reboot" is to edit the MCD code
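      To illustrate the "pre-compute" problem: once the rules are data instead of code, the controller could work out the required action for a MachineConfig before it is delivered. A minimal sketch, assuming rules keyed by file path prefix, longest-prefix matching per changed file, the most disruptive action winning overall, and a conservative Reboot fallback for unmatched paths. The Rule and Action types, the example paths, and the ordering of actions by disruptiveness are all illustrative assumptions, not the MCD's current logic.

```go
package main

import (
	"fmt"
	"strings"
)

// Action values are ordered from least to most disruptive (an ASSUMED ordering).
type Action int

const (
	None Action = iota
	ReloadService
	Drain
	Reboot
)

func (a Action) String() string {
	return [...]string{"None", "ReloadService", "Drain", "Reboot"}[a]
}

// Rule is a HYPOTHETICAL rule: changes under PathPrefix require Action.
type Rule struct {
	PathPrefix string
	Action     Action
}

// ComputeAction returns the most disruptive action required by any changed
// path. For each path the longest matching prefix wins; a path that matches
// no rule falls back to Reboot (the assumed conservative default).
func ComputeAction(rules []Rule, changedPaths []string) Action {
	result := None
	for _, p := range changedPaths {
		action := Reboot
		best := -1
		for _, r := range rules {
			if strings.HasPrefix(p, r.PathPrefix) && len(r.PathPrefix) > best {
				best = len(r.PathPrefix)
				action = r.Action
			}
		}
		if action > result {
			result = action
		}
	}
	return result
}

func main() {
	// Example paths are made up for illustration.
	rules := []Rule{
		{PathPrefix: "/etc/example/", Action: Drain},
		{PathPrefix: "/etc/example/reload.d/", Action: ReloadService},
		{PathPrefix: "/etc/example/notes/", Action: None},
	}
	fmt.Println(ComputeAction(rules, []string{"/etc/example/notes/a.txt"}))                              // None
	fmt.Println(ComputeAction(rules, []string{"/etc/example/reload.d/x.conf", "/etc/example/notes/a.txt"})) // ReloadService
	fmt.Println(ComputeAction(rules, []string{"/etc/example/other.conf"}))                               // Drain
	fmt.Println(ComputeAction(rules, []string{"/usr/local/bin/tool"}))                                   // Reboot
}
```

      Because the result depends only on the rule list and the diff, the same function could run in a controller ahead of time and on the node at apply time, which is what makes the outcome visible to the user before the config lands.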

      Done when:

      • A CRD is defined for post-config action policies covering both the MCO and the ICSP/ITMS/IDMS APIs
      • The existing daemon rules are broken out into one of these resources
      • The reboot/drain policies are visible in the cluster (e.g. "oc get rebootpolicies")
      • The drain controller handles processing and validation of the user's policies (and could put the computed post-config actions in the MachineConfig's and ICSP/ITMS/IDMS objects' status, or in the custom image's metadata when layering is used)
      • A template owner has a procedure to mark whether a change to their template config requires a reboot
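      One possible way to break the daemon's existing rules out into a default policy while still letting admins or template owners adjust them is to layer override rules on top of the defaults. This is a sketch under assumptions: the Rule type, the Source field, the action strings, and the "an override replaces the default with the same prefix" precedence are all hypothetical, and a real design might restrict who is allowed to override what.

```go
package main

import (
	"fmt"
	"sort"
)

// Rule is a HYPOTHETICAL policy rule. Action values are illustrative strings.
type Rule struct {
	PathPrefix string
	Action     string // e.g. "None", "Drain", "Reboot"
	Source     string // which policy the rule came from, for visibility
}

// MergeRules returns the effective rule set: an override replaces a default
// rule with the same PathPrefix, and new prefixes are added. The result is
// sorted by PathPrefix so it can be listed deterministically.
func MergeRules(defaults, overrides []Rule) []Rule {
	byPrefix := make(map[string]Rule, len(defaults)+len(overrides))
	for _, r := range defaults {
		byPrefix[r.PathPrefix] = r
	}
	for _, r := range overrides {
		byPrefix[r.PathPrefix] = r // override wins (assumed precedence)
	}
	merged := make([]Rule, 0, len(byPrefix))
	for _, r := range byPrefix {
		merged = append(merged, r)
	}
	sort.Slice(merged, func(i, j int) bool { return merged[i].PathPrefix < merged[j].PathPrefix })
	return merged
}

func main() {
	defaults := []Rule{
		{PathPrefix: "/", Action: "Reboot", Source: "default"},
		{PathPrefix: "/etc/example/", Action: "Drain", Source: "default"},
	}
	overrides := []Rule{
		{PathPrefix: "/etc/example/", Action: "None", Source: "template-owner"},
		{PathPrefix: "/etc/other/", Action: "Drain", Source: "admin"},
	}
	for _, r := range MergeRules(defaults, overrides) {
		fmt.Printf("%-16s %-7s %s\n", r.PathPrefix, r.Action, r.Source)
	}
}
```

      Recording the Source of each effective rule is what would let a listing such as "oc get rebootpolicies" show not just what will happen but which policy decided it.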

              jerzhang@redhat.com Yu Qi Zhang
              jkyros@redhat.com John Kyros
              Team MCO
              Rio Liu