FlightPath / FLPATH-3291

Koku should protect CMMO 4.2.0 from source creation race condition (FLPATH-3064)


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Undefined
    • insights-on-prem

      Description

      CMMO 4.2.0 is vulnerable to the source creation race condition described in FLPATH-3064. When a new cluster registers, koku's source creation POST takes more than 30 seconds, so the CMMO HTTP client (30-second timeout) times out. CMMO then proceeds to upload in the same reconciliation cycle without confirming that source creation succeeded. The koku listener receives the upload before the provider is committed, treats it as an "unexpected OCP report", and silently discards the payload. The data in this first upload is permanently lost, and the operator does not upload again until the next upload cycle (default 6 hours).

      CMMO 4.3.0 is protected from this race condition because it introduced a code change requiring a source to be defined before accepting optimization reports (the fix for FLPATH-2934). CMMO 4.2.0 does not have this client-side protection.

      Since modifying CMMO 4.2.0 is not an option (see FLPATH-3064: "We don't want to change CMMO code"), a server-side fix in koku is needed to protect CMMO 4.2.0 from this race condition.

      Problem

      • CMMO 4.2.0 uploads data immediately after attempting source creation, even if the source creation timed out
      • Koku listener discards the upload because the provider is not yet committed
      • First upload data is permanently lost; next retry is 6 hours later
      • CMMO 4.3.0 is not affected due to its client-side source validation guard
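
The timing behind the points above can be sketched as a small model. This is only an illustration: `first_upload_outcome`, `Outcome`, and the `client_guard` flag are hypothetical names, not CMMO or koku code. The model assumes the provider is committed at the moment the source creation POST would have finished, and it loosely treats the 4.3.0 guard as "do not upload unless source creation was confirmed".

```python
from dataclasses import dataclass

# CMMO HTTP client timeout, taken from the description above.
CLIENT_TIMEOUT_S = 30.0


@dataclass
class Outcome:
    client_timed_out: bool   # did the source creation POST exceed the client timeout?
    upload_sent: bool        # did the operator upload in this reconciliation cycle?
    upload_discarded: bool   # did the upload arrive before the provider was committed?


def first_upload_outcome(source_creation_s: float, client_guard: bool = False) -> Outcome:
    """Model the first reconciliation cycle for a newly registered cluster.

    source_creation_s: seconds until koku commits the provider (assumption:
        this coincides with the POST completing server-side).
    client_guard: hypothetical stand-in for the 4.3.0 client-side protection.
    """
    timed_out = source_creation_s > CLIENT_TIMEOUT_S

    if client_guard and timed_out:
        # Guarded client: source creation was not confirmed, so nothing is uploaded.
        return Outcome(client_timed_out=True, upload_sent=False, upload_discarded=False)

    # Unguarded client: the upload goes out as soon as the POST returns or times out.
    upload_sent_at = min(source_creation_s, CLIENT_TIMEOUT_S)
    discarded = upload_sent_at < source_creation_s
    return Outcome(client_timed_out=timed_out, upload_sent=True, upload_discarded=discarded)


if __name__ == "__main__":
    print(first_upload_outcome(45.0))                     # 4.2.0-like, slow source creation
    print(first_upload_outcome(10.0))                     # 4.2.0-like, fast source creation
    print(first_upload_outcome(45.0, client_guard=True))  # 4.3.0-like
```

In this model, data is lost only when source creation outlasts the client timeout and the client uploads anyway, which matches the 4.2.0 behavior described in this issue.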

      Proposed Fix Direction

      Implement a koku-side mechanism analogous to the CMMO 4.3.0 client-side protection. Possible approaches:

      • Queue and retry: When the koku listener receives an upload for an unknown cluster, queue the payload and retry processing after source creation completes, rather than discarding it
      • Hold and wait: If source creation is in progress for a given cluster ID, have the listener wait for completion before processing the upload
      • Graceful rejection: Return an HTTP error status (e.g. 503 with a Retry-After header) instead of 202, signaling CMMO to retry sooner than the default 6-hour cycle
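
As a starting point for discussion, the "queue and retry" approach might look roughly like the sketch below. It is not koku code: `PendingUploadBuffer`, its method names, and the `max_wait_s` default are all hypothetical, and the buffer is in-memory only. A real implementation would presumably need durable storage and would hook into the existing listener and provider creation paths.

```python
import time
from collections import defaultdict
from typing import Callable, Dict, List, Set, Tuple


class PendingUploadBuffer:
    """Park uploads for unknown clusters instead of discarding them (hypothetical)."""

    def __init__(
        self,
        process: Callable[[str, bytes], None],
        max_wait_s: float = 600.0,  # assumed retention window, not a koku setting
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._process = process
        self._max_wait_s = max_wait_s
        self._clock = clock
        self._known: Set[str] = set()
        # cluster_id -> list of (deadline, payload)
        self._pending: Dict[str, List[Tuple[float, bytes]]] = defaultdict(list)

    def on_upload(self, cluster_id: str, payload: bytes) -> str:
        """Process immediately if the provider exists, otherwise queue the payload."""
        if cluster_id in self._known:
            self._process(cluster_id, payload)
            return "processed"
        deadline = self._clock() + self._max_wait_s
        self._pending[cluster_id].append((deadline, payload))
        return "queued"

    def on_provider_committed(self, cluster_id: str) -> int:
        """Mark the provider as known and replay queued payloads that have not expired."""
        self._known.add(cluster_id)
        now = self._clock()
        replayed = 0
        for deadline, payload in self._pending.pop(cluster_id, []):
            if deadline >= now:
                self._process(cluster_id, payload)
                replayed += 1
        return replayed

    def expire(self) -> int:
        """Drop payloads whose provider never appeared; return how many were dropped."""
        now = self._clock()
        dropped = 0
        for cluster_id in list(self._pending):
            entries = self._pending[cluster_id]
            fresh = [(d, p) for d, p in entries if d >= now]
            dropped += len(entries) - len(fresh)
            if fresh:
                self._pending[cluster_id] = fresh
            else:
                del self._pending[cluster_id]
        return dropped
```

With this shape, an upload that arrives before the provider is committed returns "queued", and a later call to `on_provider_committed` for the same cluster ID replays it. The expiry window bounds how long payloads for clusters that never register are held.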

      Related Issues

      • FLPATH-3064 – CMMO gets a timeout when it attempts to create a new source (root cause of the race condition)
      • FLPATH-2934 – CMMO 4.3.0 CSVs not processed by Insights On-Prem (resolved; contains the client-side fix that protects 4.3.0)

      Version Information

      • Affected CMMO version: 4.2.0
      • Not affected: CMMO 4.3.0 (has client-side source validation guard)
      • Koku image: koku:sources (2026-02-01 build, sources integrated into koku)

              rh-ee-ehendler Elkana Hendler
              chadcrum Chad Crum