Machine Config Operator / MCO-1506

Image Mode Status Reporting GA & MCN Improvements

    • Image Mode Status Reporting
    • In Progress
    • OCPSTRAT-1282 - OpenShift Image Mode State Reporting GA
    • 51% To Do, 11% In Progress, 37% Done

      This epic describes the work required to release Status Reporting for image mode updates as General Availability (GA) and to enhance the MachineConfigNode (MCN) feature. It extends MCO-836, which covered the GA of a minimum viable version of the MCN feature to enable the GA of PinnedImageSets (PIS). It includes the tasks required to fully release Status Reporting through the MCN resources, as well as the MCN improvements that were excluded from the initial GA effort.

      Related Documentation:

      Proposed scope of work for "Status Reporting" in 4.20:

      • In scope - The work in this epic has two primary objectives:
        1. Improve the existing MCN functionality so that the experience is consistent across both standard node updates and on-cluster image mode updates
        2. Update the MachineConfigPool (MCP) status population to be consistent with the intent described in the original MCN enhancement
      • Out of scope - The following two ideas are deferred to a future release:
        1. Observability through metrics: This document shares an early idea of what status reporting could look like, with the scope falling into a few different categories. Its primary theme, however, is providing flexible and customizable metrics to the user.
          • While this would be a welcome improvement to the user experience and to how users interact with the MCO, it is not part of the original MCN/State Reporting enhancement and likely needs extensive refinement with product and developers to understand user needs, development effort, risks, etc.
          • With 3 sprints until feature freeze (at the time of this update on June 18th) and the extensive refinement needed to architect and understand metric needs, it does not seem reasonable to implement in 4.20.
          • I propose that engineers work alongside product to understand how metrics can work towards the idea of improved status reporting, work together to write an enhancement in openshift/enhancements, and plan to work on implementation in a future release.
        2. MCN triggered updates: Another idea, which exists mostly as historical knowledge around "Status Reporting", is to use the MCN resource to trigger node updates and phase out the MCO's use of node annotations.
          • This idea also needs further refinement by the team, but it would be a positive shift in how the MCO operates.
          • Given the impact of such a change, this idea should be implemented in tech preview and allowed to soak for a few releases before being fully released as GA.
          • I propose that the team write and socialize an enhancement around this idea, then start implementation in tech preview in a later release, such as 4.21.

      Work Required:

      • [SPIKE] Decide what statuses need to be added to the MCN conditions list and if other information should be stored/reported in the MCN
      • Create a StatusReporting feature gate for the API changes & added status reporting functionality
      • Update the API (the changes will be in the MCN CustomResourceDefinition (CRD))
      • Implement Status Reporting functionality
      • Make improvements to MCN functionality
      • Create tests for Status Reporting's component readiness signal
        • Note that the tests we are able to implement will depend on whether the MCO team has a disruptive test suite.
      • Monitor Status Reporting tests for component readiness
        • Note that we will need 7 days of green tests in all platforms required by the API's verify-feature-promotion test to be able to GA, so the tests will need to be completed as early in this GAing effort as possible.
      • Graduate StatusReporting feature gate to default
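
The "7 days of green tests in all platforms" requirement above can be sketched as a simple check. This is only an illustration of the rule as worded in this epic, not the logic of the API's verify-feature-promotion test. The function name, the shape of the results data, and the platform names are all hypothetical.

```python
from datetime import date, timedelta

# Number of consecutive green days, taken from the note in this epic.
REQUIRED_GREEN_DAYS = 7


def ready_to_promote(results, required_platforms, today, window=REQUIRED_GREEN_DAYS):
    """Return True if every required platform passed on each of the last `window` days.

    `results` is a hypothetical mapping: platform name -> {date: passed (bool)}.
    A missing platform or a missing day counts as not green.
    """
    days = [today - timedelta(days=offset) for offset in range(window)]
    for platform in required_platforms:
        per_day = results.get(platform, {})
        if not all(per_day.get(day) is True for day in days):
            return False
    return True


if __name__ == "__main__":
    today = date(2025, 7, 1)  # arbitrary example date
    green_week = {today - timedelta(days=i): True for i in range(7)}
    # "aws" and "gcp" are placeholder platform names.
    print(ready_to_promote({"aws": green_week, "gcp": green_week}, ["aws", "gcp"], today))
```

Because a single red or missing day on any required platform resets the outcome to not ready, the sketch makes the scheduling point concrete: the tests need to land early enough to accumulate a full clean window before the GA decision.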

      Done when:

      • Status Reporting is GAed
      • MCN improvements are complete
      • Tests are created, cover the major functionality, and are passing

              rh-ee-ijanssen Isabella Janssen
              mkrejci-1 Michelle Krejci
              Sergio Regidor de la Rosa Sergio Regidor de la Rosa