Story
Resolution: Done
Normal
Currently, the MachineConfigNode (MCN) object does not clearly reflect when a node is in a degraded state. Instead, it keeps showing the MCN conditions that were last updated during the failed node update. An example of a node that failed an update in the “AppliedFilesAndOS” phase can be seen here:
Name:         ip-10-0-1-244.ec2.internal
Namespace:
Labels:       <none>
Annotations:  <none>
API Version:  machineconfiguration.openshift.io/v1alpha1
Kind:         MachineConfigNode
Metadata:
  Creation Timestamp:  2025-03-24T12:40:07Z
  Generation:          3
  Owner References:
    API Version:     v1
    Kind:            Node
    Name:            ip-10-0-1-244.ec2.internal
    UID:             7137be14-1e41-40a7-91ff-50da0c5693f6
  Resource Version:  91058
  UID:               9445d162-b0ea-407a-9a82-bba4db7d78db
Spec:
  Config Version:
    Desired:  rendered-worker-49ecb3b4e784c0a32c04ded0430e5398
  Node:
    Name:  ip-10-0-1-244.ec2.internal
  Pool:
    Name:  worker
Status:
  Conditions:
    Last Transition Time:  2025-03-24T12:40:11Z
    Message:               All pinned image sets complete
    Reason:                AsExpected
    Status:                False
    Type:                  PinnedImageSetsProgressing
    Last Transition Time:  2025-03-24T15:58:47Z
    Message:               Update is Compatible.
    Reason:                UpdateCompatible
    Status:                True
    Type:                  UpdatePrepared
    Last Transition Time:  2025-03-24T15:59:47Z
    Message:               Updating the Files and OS on disk as a part of the in progress phase
    Reason:                AppliedFilesAndOS
    Status:                Unknown
    Type:                  UpdateExecuted
    Last Transition Time:  2025-03-24T12:40:11Z
    Message:               This node has not yet entered the UpdatePostActionComplete phase
    Reason:                NotYetOccurred
    Status:                False
    Type:                  UpdatePostActionComplete
    Last Transition Time:  2025-03-24T12:41:09Z
    Message:               Action during update to rendered-worker-cb3673914e9994a198f0a92079c46ffc: Uncordoned Node as part of completing upgrade phase
    Reason:                Uncordoned
    Status:                False
    Type:                  UpdateComplete
    Last Transition Time:  2025-03-24T12:41:09Z
    Message:               Action during update to rendered-worker-cb3673914e9994a198f0a92079c46ffc: In desired config . Resumed normal operations.
    Reason:                Resumed
    Status:                False
    Type:                  Resumed
    Last Transition Time:  2025-03-24T15:58:47Z
    Message:               Update Compatible. Post Cfg Actions []: Drain Required: true
    Reason:                UpdatePreparedUpdateCompatible
    Status:                True
    Type:                  UpdateCompatible
    Last Transition Time:  2025-03-24T15:59:45Z
    Message:               Drained node. The drain is complete as the desired drainer matches current drainer: drain-rendered-worker-49ecb3b4e784c0a32c04ded0430e5398
    Reason:                UpdateExecutedDrained
    Status:                True
    Type:                  Drained
    Last Transition Time:  2025-03-24T15:59:47Z
    Message:               Applying files and new OS config to node. OS will not need an update. SSH Keys will not need an update
    Reason:                UpdateExecutedAppliedFilesAndOS
    Status:                Unknown
    Type:                  AppliedFilesAndOS
    Last Transition Time:  2025-03-24T15:58:52Z
    Message:               Cordoned node. The node is reporting Unschedulable = true
    Reason:                UpdateExecutedCordoned
    Status:                True
    Type:                  Cordoned
    Last Transition Time:  2025-03-24T12:40:11Z
    Message:               This node has not yet entered the RebootedNode phase
    Reason:                NotYetOccurred
    Status:                False
    Type:                  RebootedNode
    Last Transition Time:  2025-03-24T12:40:11Z
    Message:               This node has not yet entered the ReloadedCRIO phase
    Reason:                NotYetOccurred
    Status:                False
    Type:                  ReloadedCRIO
    Last Transition Time:  2025-03-24T15:58:46Z
    Message:               Node ip-10-0-1-244.ec2.internal needs an update
    Reason:                Updated
    Status:                False
    Type:                  Updated
    Last Transition Time:  2025-03-24T12:41:09Z
    Message:               Action during update to rendered-worker-cb3673914e9994a198f0a92079c46ffc: UnCordoned node. The node is reporting Unschedulable = false
    Reason:                UpdateCompleteUncordoned
    Status:                False
    Type:                  Uncordoned
    Last Transition Time:  2025-03-24T12:40:11Z
    Message:               All is good
    Reason:                AsExpected
    Status:                False
    Type:                  PinnedImageSetsDegraded
  Config Version:
    Current:  rendered-worker-cb3673914e9994a198f0a92079c46ffc
    Desired:  rendered-worker-49ecb3b4e784c0a32c04ded0430e5398
  Observed Generation:  4
Events:  <none>
As can be seen, nothing in this output clearly shows that the update went wrong. The `UpdateExecuted` and `AppliedFilesAndOS` conditions both remain `Unknown`, so it looks like the update is still proceeding. This limits our ability to use MCN to power other functionality and will likely lead to customer confusion.
A `MachineConfigNodeNodeDegraded` status condition was added as part of the MCN API updates in MCO-1543. This story covers implementing the functionality to populate this condition when a node degrades. Several bugs, including OCPBUGS-44290 and OCPBUGS-52828, have been opened because MCN does not report node degradation clearly. In addition, clear reporting of node degradation in MCN is needed to power the functionality for MCO-1228, which is part of the status reporting GA.
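To make the intended behavior concrete, here is a minimal sketch of populating the degraded condition on a node degrade. It is not the MCO implementation (which is written in Go); it only illustrates the usual Kubernetes condition semantics, where the transition time changes only when the status value changes. The condition type string `"NodeDegraded"`, the reason `"UpdateFailed"`, and the helper names `set_condition` and `mark_node_degraded` are all assumptions made for illustration.

```python
from datetime import datetime, timezone

# Assumption: the type string behind the MachineConfigNodeNodeDegraded
# constant is "NodeDegraded". Adjust if the API uses a different value.
NODE_DEGRADED = "NodeDegraded"


def set_condition(conditions, cond_type, status, reason, message, now=None):
    """Insert or update a condition in place, keyed by its type.

    lastTransitionTime is only bumped when the status value changes.
    Returns True if the list was modified.
    """
    now = now or datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    for cond in conditions:
        if cond["type"] != cond_type:
            continue
        changed = False
        if cond["status"] != status:
            cond["status"] = status
            cond["lastTransitionTime"] = now
            changed = True
        if cond["reason"] != reason or cond["message"] != message:
            cond["reason"] = reason
            cond["message"] = message
            changed = True
        return changed
    # No condition of this type yet: append a new one.
    conditions.append({
        "type": cond_type,
        "status": status,
        "reason": reason,
        "message": message,
        "lastTransitionTime": now,
    })
    return True


def mark_node_degraded(conditions, err, now=None):
    """Record a degrade. `err` is the error text reported by the failed update.

    The reason "UpdateFailed" is a hypothetical placeholder.
    """
    return set_condition(
        conditions, NODE_DEGRADED, "True", "UpdateFailed",
        f"Node degraded: {err}", now=now,
    )
```

With this shape, the in-progress conditions (for example `AppliedFilesAndOS` at `Unknown`) are left untouched, and the failure is surfaced through one dedicated condition that consumers can check.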
Done when:
- MCN clearly reports node degrade statuses using the `MachineConfigNodeNodeDegraded` condition
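
One way to picture the acceptance criterion from a consumer's side is a small check over the MCN status. The sketch below assumes the object has been loaded as a dict (for example from `oc get machineconfignode <node_name> -o json`) and that the condition type string is `"NodeDegraded"`; both the type string and the helper name `degraded_reason` are assumptions, not confirmed API details.

```python
def degraded_reason(mcn, cond_type="NodeDegraded"):
    """Return "reason: message" if the degraded condition is True, else None.

    `mcn` is a MachineConfigNode object parsed into a dict.
    `cond_type` defaults to an assumed type string for the
    MachineConfigNodeNodeDegraded condition.
    """
    for cond in mcn.get("status", {}).get("conditions", []):
        if cond.get("type") == cond_type and cond.get("status") == "True":
            return f'{cond.get("reason", "")}: {cond.get("message", "")}'
    return None


# Example input mirroring the stuck update above: without a degraded
# condition, nothing distinguishes a failure from an update in progress.
stuck = {"status": {"conditions": [
    {"type": "AppliedFilesAndOS", "status": "Unknown",
     "reason": "UpdateExecutedAppliedFilesAndOS", "message": "Applying files"},
]}}
print(degraded_reason(stuck))
```

Against the example MCN shown earlier, this check finds no degraded condition at all, which is the gap this story closes.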
- blocks
  - MCO-1623 Update MCN origin test with enhanced degrade check (Closed)
- relates to
  - OCPBUGS-44290 worker MCP is degraded in Techpreview without reporting the actual reason of the degradation (Verified)
  - OCPBUGS-52828 oc describe machineconfignodes <node_name> do not show the node degraded reason (Closed)
- links to