OpenShift Bugs / OCPBUGS-75887

[HCP] | Missing Node Identification in HCCO In-Place Upgrader Logs and NodePool Status

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Normal
    • Affects Version/s: 4.17.z, 4.18.z
    • Component/s: HyperShift

      Description of problem:

      The inplaceupgrader controller in the Hosted Cluster Config Operator (HCCO) logs an error when it encounters a degraded node during an upgrade, but it does not specify which node is degraded. Additionally, the NodePool CR status remains "Healthy" (e.g., AllNodesHealthy: True) even while the reconciler is blocked by the degraded node, so the upgrade failure is "silent" from a user's perspective.

      Version-Release number of selected component (if applicable):

      OCP management cluster version: 4.18.27
      ACM version: 2.13.5
      Platform: Hosted Control Planes (HCP), Agent provider
      Hosted cluster version: OpenShift 4.17.x (targeting 4.17.37)
      Upgrade strategy: InPlace

      How reproducible:

          NA

       

      During a customer HCP agent upgrade to 4.17.37, the upgrade stalled. The NodePool CR indicated that all nodes were updated and healthy, yet the HCCO logs showed a reconciliation error.     
      
      HCCO Log Output:
      
      {
        "level":"error",
        "ts":"2026-02-02T20:39:43Z",
        "msg":"Reconciler error",
        "controller":"inplaceupgrader",
        "object":{"name":"infra01-nodepool","namespace":"infra01-infra01"},
        "error":"degraded node found, cannot progress in-place upgrade. Degraded reason: disk validation failed: expected target osImageURL ... have ...",
        ...
      }
      
      
      

      Actual results:

      The log error "degraded node found" does not include the name of the degraded node, making it extremely difficult for admins to troubleshoot clusters with high node counts (43 nodes in this case).
      
      The NodePool status conditions (AllNodesHealthy, AllMachinesReady) reported True with the message "All is well," despite the upgrade being blocked by a validation failure.

      Expected results:

      The HCCO logs should explicitly state the name of the degraded node (e.g., "error": "degraded node [node-name] found...").
      
      The NodePool status conditions should reflect that the upgrade is stalled and identify the failing node/machine in the message field.
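
      A minimal sketch of how the upgrader's error could name the degraded node(s); `degradedNode` and `degradedNodeError` are hypothetical stand-ins, not the actual HyperShift types:

      ```go
      package main

      import (
      	"fmt"
      	"strings"
      )

      // degradedNode is a hypothetical stand-in for the controller's
      // per-node state: the node's name plus its degraded reason.
      type degradedNode struct {
      	Name   string
      	Reason string
      }

      // degradedNodeError builds an error that lists every degraded node
      // by name alongside its reason, instead of a generic message.
      func degradedNodeError(nodes []degradedNode) error {
      	if len(nodes) == 0 {
      		return nil
      	}
      	details := make([]string, 0, len(nodes))
      	for _, n := range nodes {
      		details = append(details, fmt.Sprintf("%s (%s)", n.Name, n.Reason))
      	}
      	return fmt.Errorf("degraded node(s) found, cannot progress in-place upgrade: %s",
      		strings.Join(details, ", "))
      }

      func main() {
      	err := degradedNodeError([]degradedNode{
      		{Name: "worker-17", Reason: "disk validation failed"},
      	})
      	// prints: degraded node(s) found, cannot progress in-place upgrade: worker-17 (disk validation failed)
      	fmt.Println(err)
      }
      ```

      The same formatted string could also be surfaced in the NodePool condition's message field, so both the logs and the CR status identify the failing node.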

      Additional info:

          Required data will be uploaded and shared in the comments.

              Assignee: Unassigned
              Reporter: Divyam Pateriya (rhn-support-dpateriy)
              QA Contact: Yu Li