Machine Config Operator / MCO-1841

Refactor OCL Reconciler and Sync Loops

    • Type: Epic
    • Resolution: Unresolved
    • Refactor OCL Reconciler
    • Done
    • 50% To Do, 0% In Progress, 50% Done

      Background / Problem

      After our MCO Team whiteboard session (see Miro board here) and adding reboot functionality, we identified a problem: our build controller’s reconciler mixes event-driven handlers with broad “sync everything” loops (syncAll, syncMachineOSConfigs, syncMachineOSBuilds, syncMachineConfigPools). This creates:

      • Redundant work and confusing call graphs
      • State transitions that are hard to reason about (especially around create-or-reuse MOSB)
      • Coupling between wiring (informers/queue) and reconciliation logic

      We want a clearer, event-driven reconciliation model with minimal, targeted syncs and a smaller surface for bugs.

      Goals

      • Make sync behavior predictable, minimal, and local to the event that triggered it.
      • Reduce the number of places that can enqueue broad syncs.
      • Simplify and harden createNewMachineOSBuildOrReuseExisting semantics.
      • Preserve single queue semantics and backoff, but reduce churn.
      • Preserve existing user-visible behavior (no confusing logs).

      Scope & Approach

      1. Collapse & Localize Sync Loops
        • Remove or narrow syncAll usages.
        • Restrict syncMachineOSConfigs, syncMachineOSBuilds, and syncMachineConfigPools to targeted callers.
        • Ensure every event handler calls one idempotent, object-scoped reconcile.
      2. Clarify MOSB Creation/Reuse
        • Make createNewMachineOSBuildOrReuseExisting pure and explicit:
          • Separate paths for: pool-change rebuild, annotation rebuild, missing-image rebuild, and “no-op/reuse”.
          • Document and enforce transitions (state machine + unit tests).
        • Keep the “delete old MOSB before recreate” step only for annotation rebuilds (this fixes the MCP degrade regression).
      3. Event→Action Map
        • For each informer (MOSC, MOSB, Job, MCP), define precise actions:
          • What fields trigger work?
          • What sync (if any) is allowed?
          • What should be logged so the chosen action is clear?
      4. Develop a Split Controller or a Replacement Reconciler in Tandem
        • Prototype a second controller (e.g., an MCP→MOSB build trigger), or build a replacement reconciler that takes over from the current reconciler once testing is green.
        • Keep a single queue per controller (if we go the second controller route); ensure ownership boundaries are clean.
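To make the Event→Action map and the object-scoped enqueue concrete, here is a minimal sketch using only the Go standard library. Everything in it is an assumption made for illustration: the `Event` type, the `triggerFields` table and its field names, and the tiny deduplicating `Queue` (a stand-in for a rate-limited work queue) are hypothetical and do not reflect the MCO's actual types or wiring.

```go
package main

import "fmt"

// Kind identifies which informer produced an event.
type Kind string

const (
	KindMOSC Kind = "MachineOSConfig"
	KindMOSB Kind = "MachineOSBuild"
	KindJob  Kind = "Job"
	KindMCP  Kind = "MachineConfigPool"
)

// Event is a hypothetical, flattened view of an informer update:
// the old and new values of the fields we might care about.
type Event struct {
	Kind      Kind
	Name      string
	OldFields map[string]string
	NewFields map[string]string
}

// triggerFields is the event→action map: per kind, the fields whose
// change justifies work. The field names are illustrative assumptions.
var triggerFields = map[Kind][]string{
	KindMOSC: {"spec", "rebuildAnnotation"},
	KindMOSB: {"status.phase"},
	KindJob:  {"status.phase"},
	KindMCP:  {"spec.configuration.name"},
}

// keyFor returns the single object-scoped key to enqueue, or false
// when none of the watched fields changed (no sync is allowed).
func keyFor(ev Event) (string, bool) {
	for _, f := range triggerFields[ev.Kind] {
		if ev.OldFields[f] != ev.NewFields[f] {
			return string(ev.Kind) + "/" + ev.Name, true
		}
	}
	return "", false
}

// Queue is a deduplicating FIFO: adding a key that is already
// pending is a no-op, which keeps churn down.
type Queue struct {
	pending map[string]bool
	order   []string
}

func NewQueue() *Queue { return &Queue{pending: map[string]bool{}} }

// Add enqueues key unless it is already pending.
func (q *Queue) Add(key string) {
	if q.pending[key] {
		return
	}
	q.pending[key] = true
	q.order = append(q.order, key)
}

// Len reports how many distinct keys are pending.
func (q *Queue) Len() int { return len(q.order) }

func main() {
	q := NewQueue()
	events := []Event{
		// Rendered config changed on the pool: enqueue the pool.
		{KindMCP, "worker", map[string]string{"spec.configuration.name": "rendered-a"}, map[string]string{"spec.configuration.name": "rendered-b"}},
		// Same pool again before the worker ran: deduplicated.
		{KindMCP, "worker", map[string]string{"spec.configuration.name": "rendered-a"}, map[string]string{"spec.configuration.name": "rendered-b"}},
		// Only an unwatched field changed on the build: ignored.
		{KindMOSB, "worker-build", map[string]string{"labels": "x"}, map[string]string{"labels": "y"}},
	}
	for _, ev := range events {
		if key, ok := keyFor(ev); ok {
			q.Add(key)
		}
	}
	fmt.Println(q.order)
}
```

Keeping the map as data rather than scattering the checks across handlers means the question "what fields trigger work?" has one answer per informer, and each handler reduces to a lookup plus at most one enqueue.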

      Acceptance Criteria

      • Event handlers call syncAll and the other broad syncs only from a small, deliberate set of call sites.
      • createNewMachineOSBuildOrReuseExisting has table-driven tests covering:
        • pool-change (new rendered-MC) → create new MOSB (no pre-delete)
        • annotation rebuild → delete old MOSB then recreate with same name
        • missing-image rebuild → delete old MOSB (if needed) then recreate
        • no-op path → reuse existing, no new MOSB or Job
      • All OCL e2e tests consistently pass in CI without manual sleeps or timeout increases.
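The four paths listed for createNewMachineOSBuildOrReuseExisting can be sketched as a pure decision function plus a table of cases. This is only an illustration under stated assumptions: `Inputs`, `Action`, and `decideBuildAction` are hypothetical names, the precedence order of the checks is assumed, and "if needed" for the missing-image path is modeled as "an old MOSB exists". The real function's signature and inputs are not shown here.

```go
package main

import "fmt"

// Action is the outcome of the decision; it performs no I/O itself.
type Action int

const (
	ReuseExisting Action = iota
	CreateNew
	DeleteThenRecreate
)

func (a Action) String() string {
	switch a {
	case ReuseExisting:
		return "reuse-existing"
	case CreateNew:
		return "create-new"
	case DeleteThenRecreate:
		return "delete-then-recreate"
	}
	return "unknown"
}

// Inputs is a hypothetical snapshot of what the reconciler observed.
type Inputs struct {
	PoolChanged      bool // pool points at a new rendered MachineConfig
	RebuildRequested bool // rebuild annotation is present
	ImageMissing     bool // the built image can no longer be found
	OldBuildExists   bool // a MOSB with the target name already exists
}

// decideBuildAction maps observations to exactly one action.
// The precedence (pool change, then annotation, then missing image)
// is an assumption made for this sketch.
func decideBuildAction(in Inputs) Action {
	switch {
	case in.PoolChanged:
		return CreateNew // no pre-delete
	case in.RebuildRequested:
		return DeleteThenRecreate // same name, so the old MOSB goes first
	case in.ImageMissing && in.OldBuildExists:
		return DeleteThenRecreate
	case in.ImageMissing:
		return CreateNew // nothing to delete
	default:
		return ReuseExisting // no new MOSB or Job
	}
}

func main() {
	// Table of cases mirroring the acceptance criteria.
	cases := []struct {
		name string
		in   Inputs
		want Action
	}{
		{"pool-change", Inputs{PoolChanged: true, OldBuildExists: true}, CreateNew},
		{"annotation rebuild", Inputs{RebuildRequested: true, OldBuildExists: true}, DeleteThenRecreate},
		{"missing image, old MOSB present", Inputs{ImageMissing: true, OldBuildExists: true}, DeleteThenRecreate},
		{"missing image, no old MOSB", Inputs{ImageMissing: true}, CreateNew},
		{"no-op", Inputs{OldBuildExists: true}, ReuseExisting},
	}
	for _, c := range cases {
		got := decideBuildAction(c.in)
		fmt.Printf("%-32s want=%s got=%s ok=%t\n", c.name, c.want, got, got == c.want)
	}
}
```

Because the function only returns an `Action`, the same table can be moved into a `testing`-based test unchanged, and the code that actually deletes or creates objects can be tested separately.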

              Assignee: Unassigned
              Reporter: Dalia Khater (dkhater@redhat.com)