Machine Config Operator / MCO-1615

Implement node degraded functionality in MCN conditions


    • Type: Story
    • Resolution: Done
    • Priority: Normal
    • Sprint: MCO Sprint 269, MCO Sprint 270

      Currently, the MachineConfigNode (MCN) object does not clearly reflect when a node is in a degraded state. Instead, it keeps showing the MCN conditions that were last updated during the failed node upgrade. The following example shows a node that failed an upgrade in the “AppliedFilesAndOS” phase:

       

      Name:         ip-10-0-1-244.ec2.internal
      Namespace:    
      Labels:       <none>
      Annotations:  <none>
      API Version:  machineconfiguration.openshift.io/v1alpha1
      Kind:         MachineConfigNode
      Metadata:
        Creation Timestamp:  2025-03-24T12:40:07Z
        Generation:          3
        Owner References:
          API Version:     v1
          Kind:            Node
          Name:            ip-10-0-1-244.ec2.internal
          UID:             7137be14-1e41-40a7-91ff-50da0c5693f6
        Resource Version:  91058
        UID:               9445d162-b0ea-407a-9a82-bba4db7d78db
      Spec:
        Config Version:
          Desired:  rendered-worker-49ecb3b4e784c0a32c04ded0430e5398
        Node:
          Name:  ip-10-0-1-244.ec2.internal
        Pool:
          Name:  worker
      Status:
        Conditions:
          Last Transition Time:  2025-03-24T12:40:11Z
          Message:               All pinned image sets complete
          Reason:                AsExpected
          Status:                False
          Type:                  PinnedImageSetsProgressing
          Last Transition Time:  2025-03-24T15:58:47Z
          Message:               Update is Compatible.
          Reason:                UpdateCompatible
          Status:                True
          Type:                  UpdatePrepared
          Last Transition Time:  2025-03-24T15:59:47Z
          Message:               Updating the Files and OS on disk as a part of the in progress phase
          Reason:                AppliedFilesAndOS
          Status:                Unknown
          Type:                  UpdateExecuted
          Last Transition Time:  2025-03-24T12:40:11Z
          Message:               This node has not yet entered the UpdatePostActionComplete phase
          Reason:                NotYetOccurred
          Status:                False
          Type:                  UpdatePostActionComplete
          Last Transition Time:  2025-03-24T12:41:09Z
          Message:               Action during update to rendered-worker-cb3673914e9994a198f0a92079c46ffc: Uncordoned Node as part of completing upgrade phase
          Reason:                Uncordoned
          Status:                False
          Type:                  UpdateComplete
          Last Transition Time:  2025-03-24T12:41:09Z
          Message:               Action during update to rendered-worker-cb3673914e9994a198f0a92079c46ffc: In desired config . Resumed normal operations.
          Reason:                Resumed
          Status:                False
          Type:                  Resumed
          Last Transition Time:  2025-03-24T15:58:47Z
          Message:               Update Compatible. Post Cfg Actions []: Drain Required: true
          Reason:                UpdatePreparedUpdateCompatible
          Status:                True
          Type:                  UpdateCompatible
          Last Transition Time:  2025-03-24T15:59:45Z
          Message:               Drained node. The drain is complete as the desired drainer matches current drainer: drain-rendered-worker-49ecb3b4e784c0a32c04ded0430e5398
          Reason:                UpdateExecutedDrained
          Status:                True
          Type:                  Drained
          Last Transition Time:  2025-03-24T15:59:47Z
          Message:               Applying files and new OS config to node. OS will not need an update. SSH Keys will not need an update
          Reason:                UpdateExecutedAppliedFilesAndOS
          Status:                Unknown
          Type:                  AppliedFilesAndOS
          Last Transition Time:  2025-03-24T15:58:52Z
          Message:               Cordoned node. The node is reporting Unschedulable = true
          Reason:                UpdateExecutedCordoned
          Status:                True
          Type:                  Cordoned
          Last Transition Time:  2025-03-24T12:40:11Z
          Message:               This node has not yet entered the RebootedNode phase
          Reason:                NotYetOccurred
          Status:                False
          Type:                  RebootedNode
          Last Transition Time:  2025-03-24T12:40:11Z
          Message:               This node has not yet entered the ReloadedCRIO phase
          Reason:                NotYetOccurred
          Status:                False
          Type:                  ReloadedCRIO
          Last Transition Time:  2025-03-24T15:58:46Z
          Message:               Node ip-10-0-1-244.ec2.internal needs an update
          Reason:                Updated
          Status:                False
          Type:                  Updated
          Last Transition Time:  2025-03-24T12:41:09Z
          Message:               Action during update to rendered-worker-cb3673914e9994a198f0a92079c46ffc: UnCordoned node. The node is reporting Unschedulable = false
          Reason:                UpdateCompleteUncordoned
          Status:                False
          Type:                  Uncordoned
          Last Transition Time:  2025-03-24T12:40:11Z
          Message:               All is good
          Reason:                AsExpected
          Status:                False
          Type:                  PinnedImageSetsDegraded
        Config Version:
          Current:            rendered-worker-cb3673914e9994a198f0a92079c46ffc
          Desired:            rendered-worker-49ecb3b4e784c0a32c04ded0430e5398
        Observed Generation:  4
      Events:                 <none>
      

      As can be seen, nothing in this output clearly shows that something went wrong with the update. The `UpdateExecuted` and `AppliedFilesAndOS` conditions remain `Unknown`, so it looks like the upgrade is still proceeding. This limits our ability to use MCN to power other functionality and will likely lead to customer confusion.
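      To make the ambiguity concrete, here is a minimal Python sketch. It assumes the conditions have been copied by hand from the output above into plain dictionaries, and the helper name `in_progress_phases` is hypothetical. It lists every condition whose status is `Unknown`: it finds the two in-progress phases, and nothing in the data marks them as failed.

```python
# A subset of the conditions from the describe output above, as plain dicts.
conditions = [
    {"type": "UpdatePrepared", "status": "True", "reason": "UpdateCompatible"},
    {"type": "UpdateExecuted", "status": "Unknown", "reason": "AppliedFilesAndOS"},
    {"type": "Drained", "status": "True", "reason": "UpdateExecutedDrained"},
    {"type": "AppliedFilesAndOS", "status": "Unknown",
     "reason": "UpdateExecutedAppliedFilesAndOS"},
    {"type": "Updated", "status": "False", "reason": "Updated"},
    {"type": "PinnedImageSetsDegraded", "status": "False", "reason": "AsExpected"},
]


def in_progress_phases(conds):
    """Return the types of all conditions whose status is Unknown."""
    return [c["type"] for c in conds if c["status"] == "Unknown"]


# Both phases look "in progress"; no condition says the update failed.
print(in_progress_phases(conditions))
```

      A consumer reading only these conditions cannot tell a stalled, failed update from one that is still running.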

      A `MachineConfigNodeNodeDegraded` status condition was added as part of the MCN API updates in MCO-1543. This story implements the functionality to populate this condition when a node degrades. Several bugs, including OCPBUGS-44290 and OCPBUGS-52828, were opened because MCN does not report node degradation clearly. Clear reporting of node degradation in MCN is also needed to power the functionality for MCO-1228, which is part of the status reporting GA.
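      As a rough illustration of what populating the condition could look like, the Python sketch below inserts or updates a condition in a list, changing the transition time only when the status value changes (the usual Kubernetes condition convention). This is not the MCO implementation. The type string `NodeDegraded`, the helper `set_condition`, and the reason, message, and timestamp passed to it are all assumptions made for the example.

```python
from datetime import datetime, timezone

# Assumed type string behind the `MachineConfigNodeNodeDegraded` constant.
NODE_DEGRADED = "NodeDegraded"


def set_condition(conditions, cond_type, status, reason, message, now=None):
    """Insert or update a condition in place.

    lastTransitionTime is only changed when the status value changes.
    """
    if now is None:
        now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    for cond in conditions:
        if cond["type"] == cond_type:
            if cond["status"] != status:
                cond["status"] = status
                cond["lastTransitionTime"] = now
            cond["reason"] = reason
            cond["message"] = message
            return cond
    cond = {
        "type": cond_type,
        "status": status,
        "reason": reason,
        "message": message,
        "lastTransitionTime": now,
    }
    conditions.append(cond)
    return cond


conditions = [
    {
        "type": "AppliedFilesAndOS",
        "status": "Unknown",
        "reason": "UpdateExecutedAppliedFilesAndOS",
        "message": "Applying files and new OS config to node. "
                   "OS will not need an update. SSH Keys will not need an update",
        "lastTransitionTime": "2025-03-24T15:59:47Z",
    },
]

# Hypothetical reason, message, and timestamp, for illustration only.
set_condition(
    conditions,
    NODE_DEGRADED,
    "True",
    "ExampleUpdateFailed",
    "example: node failed while applying files and OS",
    now="2025-03-24T16:00:00Z",
)
```

      The phase conditions are left untouched, so the failing phase is still visible next to the explicit degraded signal.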

      Done when:

      • MCN clearly reports node degrade statuses using the `MachineConfigNodeNodeDegraded` condition
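      From a consumer's point of view, the acceptance criterion could be checked roughly as follows. This Python sketch assumes an MCN object loaded as a dictionary with the standard `status.conditions` layout, and assumes the condition type string is `NodeDegraded`. The helpers `find_condition` and `summarize` and the example message are hypothetical.

```python
# Assumed type string behind the `MachineConfigNodeNodeDegraded` constant.
NODE_DEGRADED = "NodeDegraded"


def find_condition(mcn, cond_type):
    """Return the condition dict of the given type, or None if absent."""
    for cond in mcn.get("status", {}).get("conditions", []):
        if cond.get("type") == cond_type:
            return cond
    return None


def summarize(mcn):
    """Collapse MCN conditions into one state, checking degradation first."""
    degraded = find_condition(mcn, NODE_DEGRADED)
    if degraded is not None and degraded.get("status") == "True":
        return "Degraded: " + degraded.get("message", "")
    updated = find_condition(mcn, "Updated")
    if updated is not None and updated.get("status") == "True":
        return "Updated"
    return "Updating"


# Without a degraded condition, the failed node is indistinguishable
# from one that is still updating.
mcn = {"status": {"conditions": [{"type": "Updated", "status": "False"}]}}
print(summarize(mcn))

# Hypothetical message, for illustration only.
mcn["status"]["conditions"].append(
    {"type": NODE_DEGRADED, "status": "True", "message": "example failure"}
)
print(summarize(mcn))
```

      Checking the degraded condition before any progress condition is a deliberate choice here: a degraded node should never be summarized as merely updating.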

              rh-ee-pabrodri Pablo Rodriguez Nava
              rh-ee-ijanssen Isabella Janssen