Machine Config Operator / MCO-206

Support for pre- and post-update hooks


    • Type: Epic
    • Resolution: Obsolete
    • Update Hooks

      Epic Goal

      • Create a mechanism for hooking into and blocking the Machine Config Daemon's update process. This mechanism should be usable by internal operators and customers alike.

      Why is this important?

      • There is currently no way for an admin or operator to run validation before an update is applied to an RHCOS node, or after it has been applied but before the node is re-admitted to the cluster. This makes it very difficult to add checks that could help determine whether an update is going to succeed or has succeeded. Without this mechanism, "bad" updates might be rolled out more widely than necessary, complicating rollbacks and potentially impacting capacity.

      Scenarios

      1. As a Special Resource Operator, I want to ensure that I have the proper kernel modules for the next kernel version before the update is applied. If I don't have the correct modules, I know the workload is going to fail after the update, so there's no sense in attempting it.
      2. As a low-level component, I want to make sure that I still function correctly after an update, before allowing other workloads to return to the node. If something is wrong, I want to prevent other workloads from returning and to keep too many other machines from also updating (dependent on MachineConfigPool::maxUnavailable).
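
The two scenarios above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not a proposed design: `Node`, `Pool`, `update_node`, `roll_out`, and the hook signature are all hypothetical names invented for illustration, and nothing here talks to a real cluster or reflects an existing Machine Config Operator API. It assumes that a node held by a failing post-update hook keeps counting against `maxUnavailable`, which is what stops the rollout from spreading.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical hook type: receives a node name, returns True to let the update proceed.
Hook = Callable[[str], bool]


@dataclass
class Node:
    name: str
    updated: bool = False
    schedulable: bool = True


@dataclass
class Pool:
    nodes: List[Node]
    max_unavailable: int = 1  # stands in for MachineConfigPool::maxUnavailable

    def unavailable(self) -> int:
        return sum(1 for n in self.nodes if not n.schedulable)


def update_node(node: Node, pre_hooks: List[Hook], post_hooks: List[Hook]) -> str:
    """Run one node through a hooked update and report the outcome."""
    # Pre-update hooks run before anything changes; any failure skips the update.
    if not all(hook(node.name) for hook in pre_hooks):
        return "skipped"
    node.schedulable = False  # cordon and drain would happen here
    node.updated = True       # the OS update and reboot would happen here
    # Post-update hooks run before the node is re-admitted to the cluster.
    if not all(hook(node.name) for hook in post_hooks):
        return "held"         # node stays unschedulable and still counts as unavailable
    node.schedulable = True
    return "updated"


def roll_out(pool: Pool, pre_hooks: List[Hook], post_hooks: List[Hook]) -> Dict[str, str]:
    """Update nodes one at a time, respecting the pool's unavailability budget."""
    results: Dict[str, str] = {}
    for node in pool.nodes:
        if node.updated:
            continue
        # Held nodes consume the budget, so a failing post-update hook
        # blocks the remaining nodes instead of letting the update spread.
        if pool.unavailable() >= pool.max_unavailable:
            results[node.name] = "blocked"
            continue
        results[node.name] = update_node(node, pre_hooks, post_hooks)
    return results
```

In this sketch, a pre-update hook that rejects a node (scenario 1) leaves that node untouched and schedulable, while a post-update hook that rejects a node (scenario 2) leaves it updated but unschedulable. With `max_unavailable=1`, a single held node is enough for every remaining node to be reported as `"blocked"`.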

      Acceptance Criteria

      • CI - MUST be running successfully with tests automated
      • Release Technical Enablement - Provide necessary release enablement details and documents.
      • ...

      Dependencies (internal and external)

      1. ...

      Previous Work (Optional):

      1. https://github.com/coreos/container-linux-update-operator

      Open questions:

      1. Will this solution be accepted by customers who need to split their workloads from the control plane? Do the hooks need to avoid communicating directly with the API server?
      2. Should we automatically roll back RHCOS in the event that the post-update hook fails?

      Done Checklist

      • CI - CI is running, tests are automated and merged.
      • Release Enablement <link to Feature Enablement Presentation>
      • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
      • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
      • DEV - Downstream build attached to advisory: <link to errata>
      • QE - Test plans in Polarion: <link or reference to Polarion>
      • QE - Automated tests merged: <link or reference to automated tests>
      • DOC - Downstream documentation merged: <link to meaningful PR>

            Assignee: Unassigned
            Reporter: rhn-coreos-acrawfor Alex Crawford (Inactive)