Feature Request
Resolution: Unresolved
Product / Portfolio Work
1. Proposed title of this feature request
Hardening the PinnedImageSet (PIS) Lifecycle and Health Contract
2. What is the nature and description of the request?
Today, PinnedImageSet (PIS) processing runs in the background and does not report into the rest of the MCO. When it fails, the cluster often stays green even though nodes are failing to pull or pin the required images. We need to integrate PIS into the MCO health model so that PIS status affects the health of the Pool (MCP) and failures surface in a predictable way, replacing the current ambiguous state in which PIS failures do not reliably signal cluster health. Specifically:
- Health Contract: Define exactly when a PIS failure should degrade a Pool (MCP) vs. when it should just stay as a Node-level (MCN) warning.
- State Management: Fix the "stale error" issue by ensuring the MCN reconciles each PIS independently so that fixing one PIS actually clears its specific error.
- Standardized Reporting: Update PIS conditions to include the "rendered-config" versioning used elsewhere in MCO, so it's clear which update a status message belongs to.
- Validation Logic: Relax the strict SHA-only format requirements to better support real-world registry and disconnected use cases (as requested in RFE-8447).
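The State Management and Standardized Reporting items can be illustrated with a minimal sketch. It is not MCO code: `PinCondition`, `NodeStatus`, `reconcile`, and the `pin` callback are hypothetical names invented for this example, and the real MCN API may be shaped differently. The idea shown is that each PIS gets its own condition, keyed by PIS name and stamped with the rendered config it refers to, so that correcting one PIS overwrites only that PIS's error.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class PinCondition:
    """Hypothetical per-PIS condition stored on a node-level (MCN) status."""
    pis_name: str
    rendered_config: str  # the rendered config this status message belongs to
    degraded: bool
    message: str = ""


@dataclass
class NodeStatus:
    """Hypothetical stand-in for the PIS portion of an MCN status."""
    conditions: Dict[str, PinCondition] = field(default_factory=dict)

    def degraded(self) -> bool:
        # The node is degraded only while at least one PIS is currently failing.
        return any(c.degraded for c in self.conditions.values())


def reconcile(
    status: NodeStatus,
    pinned_sets: Dict[str, List[str]],
    rendered_config: str,
    pin: Callable[[List[str]], None],
) -> None:
    """Reconcile every PIS independently and record one condition per PIS."""
    for name, images in pinned_sets.items():
        try:
            pin(images)
        except Exception as err:  # broad on purpose: this is only a sketch
            status.conditions[name] = PinCondition(
                name, rendered_config, True, str(err)
            )
        else:
            # Success replaces any earlier error for this PIS only.
            status.conditions[name] = PinCondition(name, rendered_config, False)

    # Drop conditions that belong to PIS objects which no longer exist.
    for name in list(status.conditions):
        if name not in pinned_sets:
            del status.conditions[name]


def fake_pin(images: List[str]) -> None:
    """Toy pin function: any image under 'bad/' is treated as unpullable."""
    for image in images:
        if image.startswith("bad/"):
            raise ValueError(f"cannot pin {image}")
```

With this shape, reconciling `{"a": [...], "b": ["bad/img"]}` marks only `b` as degraded, and a later pass with a corrected `b` clears that error without touching `a`. Whether a degraded node should in turn degrade its MCP is the Health Contract question above and is deliberately left out of the sketch.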
3. Why does the customer need this? (List the business requirements here)
- End "Silent Failures": Prevent cases where an MCP looks healthy, but nodes are actually failing to pull/pin critical images needed for a successful upgrade.
- Improve Supportability: Reduce SRE/Support toil by providing clear, versioned status messages that persist only until a configuration is corrected.
- Predictable Upgrades: Ensure that PIS behavior is deterministic across the cluster, especially in disconnected environments where image availability is high-stakes.
- Logical Consistency: Align PIS behavior with the rest of the MCO framework to reduce the learning curve for admins.
4. List any affected packages or components.
- Machine Config Operator (MCO)
- Machine Config Controller (MCC)
- Machine Config Daemon (MCD)
Issue links (is related to):
- OCPBUGS-32745 2 minutes time out degrades machineconfignode resources when pinning release images (Status: New)
- OCPBUGS-50878 PIS error in MCN not clearing on PIS correction (Status: New)
- OCPBUGS-57177 Applying an invalid PIS should degrade an MCP (Status: New)
- OCPBUGS-66210 Origin tests transient failures: Invalid PIS should correctly fail (Status: New)
- OCPBUGS-54531 PIS conditions in MCN are not clear about which update they refer to (Status: New)