OpenShift Request For Enhancement / RFE-8714

Hardening the PinnedImageSet (PIS) Lifecycle and Health Contract


    • Feature Request
    • Resolution: Unresolved
    • MCO
    • Product / Portfolio Work

      1. Proposed title of this feature request

      Hardening the PinnedImageSet (PIS) Lifecycle and Health Contract

      2. What is the nature and description of the request?

      Today, PIS runs in the background and does not report into the rest of the MCO. If it fails, the cluster often stays green even though it is effectively broken. We need to integrate PIS into the MCO health model so that its status affects the Pool's health and failures surface in a predictable way, replacing the current ambiguous state in which PIS failures do not reliably signal cluster health.

      • Health Contract: Define exactly when a PIS failure should degrade a Pool (MCP) vs. when it should just stay as a Node-level (MCN) warning.
      • State Management: Fix the "stale error" issue by ensuring the MCN reconciles each PIS independently so that fixing one PIS actually clears its specific error.
      • Standardized Reporting: Update PIS conditions to include the "rendered-config" versioning used elsewhere in MCO, so it's clear which update a status message belongs to.
      • Validation Logic: Relax the strict SHA-only format requirements to better support real-world registry and disconnected use cases (as requested in RFE-8447).
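
      To illustrate the State Management and Standardized Reporting items above, here is a minimal Go sketch of per-PIS status tracking. It is not MCO code: the types pisStatus and nodeState, the record and failing helpers, and the "rendered-worker-N" names are all hypothetical, chosen only to show the intended behavior (one entry per PIS, each tagged with the rendered config it belongs to).

```go
package main

import (
	"fmt"
	"sort"
)

// pisStatus is a hypothetical per-PinnedImageSet record kept for one node.
type pisStatus struct {
	RenderedConfig string // rendered config the result belongs to (assumed naming)
	Message        string // empty when the last reconcile succeeded
}

// nodeState tracks every PIS independently, keyed by PIS name.
type nodeState struct {
	statuses map[string]pisStatus
}

func newNodeState() *nodeState {
	return &nodeState{statuses: make(map[string]pisStatus)}
}

// record overwrites the status of exactly one PIS. An empty errMsg clears
// that PIS's earlier error and leaves every other entry untouched.
func (n *nodeState) record(pis, rendered, errMsg string) {
	n.statuses[pis] = pisStatus{RenderedConfig: rendered, Message: errMsg}
}

// failing returns sorted, versioned messages for PIS entries that still
// carry an error.
func (n *nodeState) failing() []string {
	var out []string
	for name, st := range n.statuses {
		if st.Message != "" {
			out = append(out, fmt.Sprintf("%s (%s): %s", name, st.RenderedConfig, st.Message))
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	n := newNodeState()
	n.record("pis-a", "rendered-worker-1", "pull failed")
	n.record("pis-b", "rendered-worker-1", "invalid image reference")

	// Fixing pis-a clears only its own error; pis-b stays reported.
	n.record("pis-a", "rendered-worker-2", "")
	fmt.Println(n.failing())
}
```

      Keying the state by PIS name is what prevents the stale error: a successful reconcile of one PIS replaces only that entry. When a Pool should degrade based on these entries is the Health Contract question and is deliberately left out of the sketch.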

      3. Why does the customer need this? (List the business requirements here)

      • End "Silent Failures": Prevent cases where an MCP looks healthy, but nodes are actually failing to pull/pin critical images needed for a successful upgrade.
      • Improve Supportability: Reduce SRE/Support toil by providing clear, versioned status messages that persist only until a configuration is corrected.
      • Predictable Upgrades: Ensure that PIS behavior is deterministic across the cluster, especially in disconnected environments where image availability is high-stakes.
      • Logical Consistency: Align PIS behavior with the rest of the MCO framework to reduce the learning curve for admins.

      4. List any affected packages or components.

      • Machine Config Operator (MCO)
      • Machine Config Controller (MCC)
      • Machine Config Daemon (MCD)

              rhn-support-mrussell Mark Russell
              dkhater@redhat.com Dalia Khater