
Type: Story
Resolution: Done
Priority: Major
Fix Version/s: None
Affects Version/s: None
Labels:
None

Story Points:
8
Blocked:
False
Blocked Reason:
None
Ready:
False
Epic Link:
MCO state reporting
Feature Link:
OCPSTRAT-845 - [Tech Preview] Proper MCO State Reporting
Sprint:
MCO Sprint 242
Cost of Delay:
0
WSJF:
0.000

The machine config operator's status reporting is, at times, jagged, fragmented, and hard to change in major ways. The operator needs to be compartmentalized.

This compartmentalization of the MCO needs to start, most importantly, with a concise status/state controller.

Value can be added to the MCO by making it more broadly applicable. An untethered yet completely thought-through Kubernetes status reporter is a valuable asset and can elicit some outside feedback (see OKD!).
Adding yet another status reporting mechanism on top of what already exists might limit the amount of new code (by a small amount) but would make this component much harder to wire up and ultimately to maintain. We could have implemented the build controller by wrapping it around existing code, swallowing existing code paths inside larger code blocks, but separation can be a great tool.
This could also make the status controller easier to manage (user-input wise), since we could technically give it its own binary or feed it directly from the controller's and/or daemon's input, depending on how far we choose to go with this.

What would this look like? https://github.com/openshift/enhancements/pull/1490

So the controller could either take the shape of the BuildController where it looks like:

BuildController – BuildControllerConfig, Run(), etc

Image Build Controller – Specific entities that relate to imageBuilding, syncFunc that gets managed and run by its parent BuildController

This parent/child relationship allows the build controller to be managed by a binary and an on switch… pretty smooth.

OR

Make it in the style of the node-controller where it is just a subcontroller of the MCC

Pros/cons? What fits our work more?

Well if we run with the alternate controller (build controller) style, this could allow us to run without being held back by the MCC (bootstrap, firstboot)

This could open up a lot of doors for seeing what actually went wrong during processes that we usually don't get to see
The State Controller should be all-knowing at all times, allowing it to come up during cluster creation (or not, if that is what the user wants) and then come back online during regular firstboot
This will allow the MCO "books" to have chapters for each stage in the MCO's life cycle

If we run with the sub-controller (node controller) style, this could allow us to be a little more discreet

Less overhead and planning
Less possible points of failure for the state controller
However, this will report less information, over fewer periods of time.

Limiting the verbosity and appearance of a controller that is meant to help the customer goes against the actual purpose of the controller. So this means we should aim for the most customer-facing and longest-living architecture...

This makes me lean toward implementing an alternate controller model, mainly for the purpose of:

1) cluster creation and firstboot state gathering

2) unified bookkeeping strategy where we could SEE what happens in bootstrap once the cluster comes up (this could be game changing for us and the customer)

I think the main point of contention here will be maintaining a new large structure, but I don’t think this work will be done properly without removing the current status reporting mechanisms and replacing them with better ones.

Update as of 9/20

What is actually getting implemented in this MVP

UpgradeProgression
OperatorHealth
MCC-Health
MCD-Health

What is going to be split out into separate cards?

bootstrap-progression
metrics

Update as of 10/6

What is actually getting implemented in this MVP

UpgradeProgression

What is going to be split out into separate cards?

OperatorHealth
MCC-Health
MCD-Health

bootstrap-progression
metrics

Why the change?

It has been determined through API review that only upgrade health should live in the MachineConfigNode API type. The other progressions will instead live in the operator/v1 group under the machineconfiguration object. They will be done in parallel, just not in this MVP, so I am making that delineation clear.

Follow https://github.com/openshift/machine-config-operator/pull/3970 for the MachineConfigNode MVP

What does this mean for the state reporting of the MCO?

Currently, throughout the MCO, components try to ask the daemon: "are we updating?" This is an attempt to get a simple yes/no answer to a question that has multiple answers. Once the API types, backend, and other logic of the state controller are implemented, we can aim to replace all instances of functions like `GetMachineConfigPoolCondition` with more robust and accurate tooling.

For example: places such as this, which base their logic on nodes that the MCO does not own or directly manage (besides annotations), will switch to being solely/primarily based on our bookkeeping and state reporting in the MachineStateController. This will allow us to fine-tune where each node actually belongs.
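
To illustrate why a yes/no answer loses information, here is a small Go sketch that contrasts a boolean "updating" check with per-node state bookkeeping. The state names (`Idle`, `Draining`, `ApplyingConfig`, `Rebooting`, `Degraded`) and the helpers `IsUpdating` and `Summarize` are hypothetical and are not the MachineConfigNode API or any existing MCO function.

```go
package main

import (
	"fmt"
	"sort"
)

// NodeState is a hypothetical enumeration of where a node is in an update.
type NodeState string

const (
	Idle           NodeState = "Idle"
	Draining       NodeState = "Draining"
	ApplyingConfig NodeState = "ApplyingConfig"
	Rebooting      NodeState = "Rebooting"
	Degraded       NodeState = "Degraded"
)

// IsUpdating collapses a state into the yes/no answer that callers ask for
// today. Several distinct states map to the same "true".
func IsUpdating(s NodeState) bool {
	switch s {
	case Draining, ApplyingConfig, Rebooting:
		return true
	}
	return false
}

// Summarize groups node names by state, keeping the detail that the
// boolean throws away. Names within each group are sorted.
func Summarize(nodes map[string]NodeState) map[NodeState][]string {
	out := make(map[NodeState][]string)
	for name, state := range nodes {
		out[state] = append(out[state], name)
	}
	for _, names := range out {
		sort.Strings(names)
	}
	return out
}

func main() {
	nodes := map[string]NodeState{
		"worker-0": Draining,
		"worker-1": Rebooting,
		"worker-2": Degraded,
		"master-0": Idle,
	}
	summary := Summarize(nodes)
	for _, s := range []NodeState{Idle, Draining, ApplyingConfig, Rebooting, Degraded} {
		fmt.Printf("%-15s updating=%-5t nodes=%v\n", s, IsUpdating(s), summary[s])
	}
}
```

In this sketch a draining node and a rebooting node both answer "yes", and a degraded node and an idle node both answer "no"; the grouped view is what lets a consumer tell them apart.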

This will resolve multiple cards in ~~MCO-452~~ including: https://issues.redhat.com/browse/MCO-453 and https://issues.redhat.com/browse/MCO-473

Is related to:
MCO-846 Customizable Observability in the MCO (Closed)
