  1. Machine Config Operator
  2. MCO-690

MVP: Implement StateController


Details

    • Story
    • Resolution: Done
    • Major
    • 8
    • OCPSTRAT-845 - [Tech Preview] Proper MCO State Reporting
    • MCO Sprint 242

    Description

      The issue with the Machine Config Operator's status reporting is that it is (at times) inconsistent, scattered across components, and hard to change in major ways. The operator needs to be compartmentalized.

      Most importantly, this compartmentalization of the MCO needs to start with a concise status/state controller.

      •     Value can be added to the MCO by making it more broadly applicable. An untethered yet completely thought-through Kubernetes status reporter is a valuable asset and can elicit some outside feedback – see OKD!!
      •     Adding yet another status reporting mechanism on top of what already exists might limit code (by a small amount) but would make this component much harder to wire and ultimately to maintain. We could have implemented the build controller by placing it around code that exists, swallowing existing code paths inside of larger code blocks, but separation can be a great tool.
      •     This could also make the status controller easier to manage (user-input wise), since we could technically give it its own binary or feed it directly from the controller's and/or daemon's input, depending on how far we choose to go with this.

      What would this look like? https://github.com/openshift/enhancements/pull/1490

      So the controller could either take the shape of the BuildController where it looks like:

      BuildController – BuildControllerConfig, Run(), etc

      •     Image Build Controller – Specific entities that relate to imageBuilding, syncFunc that gets managed and run by its parent BuildController

      This parent/child relationship allows the build controller to be managed by a binary and an on switch… pretty smooth.

      OR

      Make it in the style of the node controller, where it is just a sub-controller of the MCC.

      Pros/cons? What fits our work more?

      Well, if we run with the alternate controller (build controller) style, this could allow us to run without being held back by the MCC (bootstrap, firstboot):

      • This could open up a lot of doors for seeing what actually went wrong during processes that we usually don't get to see
      • The State Controller should be all-knowing at all times, allowing it to come up during cluster creation (or not, if that is what the user wants) and then come back online during regular firstboot
      • Will allow the MCO "books" to have chapters for each stage in the MCO's life cycle

          

      If we run with the sub-controller (node controller) style, this could allow us to be a little more discreet:

      • Less overhead and planning
      • Less possible points of failure for the state controller
      • However, this will report less information, and over fewer periods of time.

      Limiting the verbosity and visibility of a controller that is meant to help the customer goes against the actual purpose of the controller. So this means we should aim for the most customer-facing and longest-lived architecture...

      This makes me lean toward implementing an alternate controller model, mainly for the purpose of:

          1) cluster creation and firstboot state gathering

          2) unified bookkeeping strategy where we could SEE what happens in bootstrap once the cluster comes up (this could be game changing for us and the customer)
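
      One way to picture the unified bookkeeping idea is an append-only ledger whose entries are tagged by life-cycle stage, so bootstrap events can still be read back once the cluster is up. The sketch below is purely illustrative: `Ledger`, `Entry`, `Stage`, and the sample messages are hypothetical and say nothing about how or where the real data would be persisted.

```go
package main

import "fmt"

// Stage is a hypothetical label for a phase in the MCO's life cycle.
type Stage string

const (
	StageBootstrap Stage = "bootstrap"
	StageFirstboot Stage = "firstboot"
	StageRuntime   Stage = "runtime"
)

// Entry is one bookkeeping record.
type Entry struct {
	Stage   Stage
	Source  string // reporting component, e.g. "MCD" or "MCC"
	Message string
}

// Ledger is an in-memory, append-only list of entries.
type Ledger struct {
	entries []Entry
}

// Record appends an entry to the ledger.
func (l *Ledger) Record(e Entry) {
	l.entries = append(l.entries, e)
}

// Chapter returns, in insertion order, every entry recorded for a stage.
func (l *Ledger) Chapter(s Stage) []Entry {
	var out []Entry
	for _, e := range l.entries {
		if e.Stage == s {
			out = append(out, e)
		}
	}
	return out
}

func main() {
	var l Ledger
	l.Record(Entry{StageBootstrap, "MCC", "rendered initial config"})
	l.Record(Entry{StageFirstboot, "MCD", "applied rendered config"})
	l.Record(Entry{StageBootstrap, "MCC", "bootstrap complete"})

	// Later, once the cluster is up, the bootstrap chapter can still be read.
	for _, e := range l.Chapter(StageBootstrap) {
		fmt.Printf("[%s] %s: %s\n", e.Stage, e.Source, e.Message)
	}
}
```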

       

      I think the main point of contention here will be maintaining a new large structure, but I don’t think this work will be done properly without removing the current status reporting mechanisms and replacing them with better ones.

       

      Update as of 9/20

      What is actually getting implemented in this MVP

      • UpgradeProgression
      • OperatorHealth
      • MCC-Health
      • MCD-Health

      What is going to be split out into separate cards?

      •  bootstrap-progression
      •  metrics

       
      Update as of 10/6

      What is actually getting implemented in this MVP

      • UpgradeProgression

      What is going to be split out into separate cards?

      • OperatorHealth
      • MCC-Health
      • MCD-Health
      •  bootstrap-progression
      •  metrics

      Why the change?

      API review determined that only upgrade health should live in the MachineConfigNode API type; the other progressions will live in the operator/v1 group under the machineconfiguration object. That work will be done in parallel, just not in this MVP, so I am making that delineation clear.

       

      Follow https://github.com/openshift/machine-config-operator/pull/3970 for MachineConfigNodeMVP

       

      What does this mean for the state reporting of the MCO?

       

      Currently, throughout the MCO, components try to ask the daemon... are we updating? This is an attempt to get a simple yes/no answer to a question that has multiple answers. Once the API types, backend, and other logic of the state controller are implemented, we can aim to replace all instances of functions like `GetMachineConfigPoolCondition` with more robust and accurate tooling.
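
      To illustrate why a yes/no answer loses information, here is a small sketch that contrasts a boolean "updating" check with a per-node phase. The phase names and the `NodeState` type are assumptions made for this example only; they are not the MachineConfigNode API.

```go
package main

import "fmt"

// UpdatePhase is a hypothetical set of states a node can be in.
type UpdatePhase string

const (
	PhaseIdle      UpdatePhase = "Idle"
	PhaseDraining  UpdatePhase = "Draining"
	PhaseApplying  UpdatePhase = "Applying"
	PhaseRebooting UpdatePhase = "Rebooting"
	PhaseDegraded  UpdatePhase = "Degraded"
)

// NodeState is a hypothetical bookkeeping record for one node.
type NodeState struct {
	Node  string
	Phase UpdatePhase
}

// IsUpdating collapses the phase into the old yes/no answer.
// Note that Idle and Degraded both come back as false.
func (s NodeState) IsUpdating() bool {
	switch s.Phase {
	case PhaseDraining, PhaseApplying, PhaseRebooting:
		return true
	default:
		return false
	}
}

// Summarize counts nodes per phase, keeping the detail the boolean drops.
func Summarize(states []NodeState) map[UpdatePhase]int {
	counts := make(map[UpdatePhase]int)
	for _, s := range states {
		counts[s.Phase]++
	}
	return counts
}

func main() {
	states := []NodeState{
		{Node: "worker-0", Phase: PhaseDraining},
		{Node: "worker-1", Phase: PhaseDegraded},
		{Node: "worker-2", Phase: PhaseIdle},
	}
	for _, s := range states {
		fmt.Printf("%s: phase=%s updating=%t\n", s.Node, s.Phase, s.IsUpdating())
	}
	fmt.Println(Summarize(states))
}
```

      In this sketch a degraded node and an idle node give the same boolean answer, which is the kind of ambiguity that richer state reporting is meant to remove.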

       

      For example: places such as this, which base their logic on nodes that the MCO does not own or directly manage (besides annotations), will switch to being based solely or primarily on our bookkeeping and state reporting in the MachineStateController. This will allow us to fine-tune where each node actually belongs.

      This will resolve multiple cards in MCO-452 including: https://issues.redhat.com/browse/MCO-453 and https://issues.redhat.com/browse/MCO-473

          People

            cdoern@redhat.com Charles Doern