Epic
Resolution: Unresolved
Refactor OCL Reconciler
Background / Problem
After our MCO Team whiteboard session (see Miro board here) and the addition of reboot functionality, we identified a problem: our build controller’s reconciler mixes event-driven handlers with broad “sync everything” loops (syncAll, syncMachineOSConfigs, syncMachineOSBuilds, syncMachineConfigPools). This creates:
- Redundant work and confusing call graphs
- Hard-to-reason state transitions (esp. around create-or-reuse MOSB)
- Coupling between wiring (informers/queue) and business logic
We want a clearer, event-driven reconciliation model with minimal, targeted syncs and a smaller surface for bugs.
Goals
- Make sync behavior predictable, minimal, and local to the event that triggered it.
- Reduce the number of places that can enqueue broad syncs.
- Simplify and harden createNewMachineOSBuildOrReuseExisting semantics.
- Preserve single queue semantics and backoff, but reduce churn.
- Keep parity with user-visible behavior (no confusing logs).
Scope & Approach
- Collapse & Localize Sync Loops
  - Remove or narrow syncAll usages.
  - Restrict syncMachineOSConfigs/Builds/Pools to targeted callers.
  - Ensure every event handler calls one idempotent, object-scoped reconcile.
- Clarify MOSB Creation/Re-use
  - Make createNewMachineOSBuildOrReuseExisting pure & explicit:
    - Separate paths for: pool-change rebuild, annotation rebuild, missing-image rebuild, and “no-op/reuse”.
    - Document and enforce transitions (state machine + unit tests).
  - Keep the “delete old MOSB before recreate” only for annotation rebuilds (fixes the MCP degrade/regression).
- Event→Action Map
  - For each informer (MOSC, MOSB, Job, MCP), define precise actions:
    - What fields trigger work?
    - What sync (if any) is allowed?
    - Log effectively.
- Develop a Split Controller or a Backup Reconciler in Tandem
  - Prototype a second controller (e.g., MCP→MOSB builder trigger), or build a replacement reconciler that takes over from the current reconciler once testing is green.
  - Keep a single queue per controller (if we go the second controller route); ensure ownership boundaries are clean.
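One possible shape for the Event→Action map is sketched below in Go. It is illustrative only: the `Snapshot` type, its fields, the trigger choices (generation for MOSC, rendered config for MCP, phase for MOSB and Job), and the `keyFor` helper are all assumptions made for this sketch, not existing controller code. The point is that each informer gets exactly one predicate deciding whether an update enqueues an object-scoped key, and nothing in the table can request a broad sync.

```go
package main

import "fmt"

// Kind names the informer that produced an event.
type Kind string

const (
	KindMOSC Kind = "MachineOSConfig"
	KindMOSB Kind = "MachineOSBuild"
	KindJob  Kind = "Job"
	KindMCP  Kind = "MachineConfigPool"
)

// Snapshot holds only the fields this sketch treats as triggers.
// The field selection is hypothetical.
type Snapshot struct {
	Name           string
	Generation     int64  // bumped on spec changes
	RenderedConfig string // only meaningful for MCP
	Phase          string // only meaningful for MOSB and Job
}

// trigger reports whether an update should cause work.
type trigger func(old, cur Snapshot) bool

// eventActions is the event-to-action table: one predicate per informer.
var eventActions = map[Kind]trigger{
	KindMOSC: func(old, cur Snapshot) bool { return old.Generation != cur.Generation },
	KindMCP:  func(old, cur Snapshot) bool { return old.RenderedConfig != cur.RenderedConfig },
	KindMOSB: func(old, cur Snapshot) bool { return old.Phase != cur.Phase },
	KindJob:  func(old, cur Snapshot) bool { return old.Phase != cur.Phase },
}

// keyFor returns the object-scoped queue key for an update, or ok=false
// when the update should be ignored (unknown kind or no trigger field changed).
func keyFor(kind Kind, old, cur Snapshot) (key string, ok bool) {
	t, known := eventActions[kind]
	if !known || !t(old, cur) {
		return "", false
	}
	return string(kind) + "/" + cur.Name, true
}

func main() {
	old := Snapshot{Name: "worker", RenderedConfig: "rendered-worker-a"}
	cur := Snapshot{Name: "worker", RenderedConfig: "rendered-worker-b"}
	if key, ok := keyFor(KindMCP, old, cur); ok {
		fmt.Println("enqueue", key)
	}
	if _, ok := keyFor(KindMCP, cur, cur); !ok {
		fmt.Println("ignored: no trigger field changed")
	}
}
```

Keeping the table as data makes "what fields trigger work?" reviewable in one place, and a status-only update that touches no trigger field never reaches the queue.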
Acceptance Criteria
- Controlled use of syncAll and other syncs from event handlers.
- createNewMachineOSBuildOrReuseExisting has table-driven tests covering:
  - pool-change (new rendered-MC) → create new MOSB (no pre-delete)
  - annotation rebuild → delete old MOSB then recreate with same name
  - missing-image rebuild → delete old MOSB (if needed) then recreate
  - no-op path → reuse existing, no new MOSB or Job
- All OCL e2e tests consistently pass in CI without manual sleeps or timeout increases.
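The four paths in the acceptance criteria could be covered along the lines of the Go sketch below. It assumes the decision part of createNewMachineOSBuildOrReuseExisting is extracted into a pure function; the `Inputs` and `Decision` types, the `decide` function, the path names, and the precedence between paths are all hypothetical. The missing-image path follows the acceptance criteria here (delete only if an old MOSB exists). A real test would use the `testing` package; a plain loop keeps the sketch in one runnable file.

```go
package main

import "fmt"

// Inputs captures the facts the decision depends on. Field names are
// hypothetical; the real function would derive them from MOSC/MOSB/MCP objects.
type Inputs struct {
	ExistingBuild         bool // a MOSB already exists
	RenderedConfigChanged bool // the pool now targets a new rendered-MC
	RebuildAnnotation     bool // a rebuild was requested via annotation
	ImageMissing          bool // the previously built image cannot be found
}

// Decision is the pure result; the caller performs any API writes.
type Decision struct {
	Path           string
	DeleteExisting bool
	CreateNew      bool
}

// decide picks exactly one path. The ordering of cases is an assumption.
func decide(in Inputs) Decision {
	switch {
	case in.RebuildAnnotation:
		return Decision{"annotation-rebuild", in.ExistingBuild, true}
	case in.RenderedConfigChanged:
		return Decision{"pool-change", false, true}
	case in.ImageMissing:
		return Decision{"missing-image-rebuild", in.ExistingBuild, true}
	case in.ExistingBuild:
		return Decision{"reuse", false, false}
	default:
		return Decision{"initial-build", false, true}
	}
}

func main() {
	cases := []struct {
		name string
		in   Inputs
		want Decision
	}{
		{"pool change", Inputs{ExistingBuild: true, RenderedConfigChanged: true}, Decision{"pool-change", false, true}},
		{"annotation rebuild", Inputs{ExistingBuild: true, RebuildAnnotation: true}, Decision{"annotation-rebuild", true, true}},
		{"missing image", Inputs{ExistingBuild: true, ImageMissing: true}, Decision{"missing-image-rebuild", true, true}},
		{"no-op", Inputs{ExistingBuild: true}, Decision{"reuse", false, false}},
	}
	for _, c := range cases {
		got := decide(c.in)
		status := "ok"
		if got != c.want {
			status = "MISMATCH"
		}
		fmt.Printf("%-20s %-8s %+v\n", c.name, status, got)
	}
}
```

Because `decide` touches no clients or informers, each row exercises one transition, and adding a new rebuild reason means adding one case and one table row.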