OCPBUGS-38892

IBGU With Multiple SNOs Does Not Follow Action Plan When One SNO Fails


    • Bug
    • Resolution: Done
    • Undefined
    • None
    • 4.17
    • TALM Operator
    • Important
    • None
    • Rejected
    • False

      Description of problem:

      Two managed-cluster SNOs are targeted by the same IBGU, which uses the action plan ['Prep'], ['AbortOnFailure'], ['Upgrade'], ['AbortOnFailure'], ['FinalizeUpgrade']. When one SNO fails the Prep phase, the other SNO does not progress past the Prep phase.

      Version-Release number of selected component (if applicable):

      TALM 4.17, LCA 4.17

      How reproducible:

      Always    

      Steps to Reproduce:

      1. Provision a hub cluster with OCP 4.16, TALM 4.17, and GitOps.
      2. Provision two managed clusters running OCP 4.17-ec.1 and LCA 4.17.
      3. Create a valid seed image from an SNO on identical hardware running OCP 4.17-ec.2.
      4. Prepare an IBGU CR with the action plan ['Prep'], ['AbortOnFailure'], ['Upgrade'], ['AbortOnFailure'], ['FinalizeUpgrade'].
      5. Disable one of the two SNO managed clusters so that the Prep phase fails on it.
      6. Create the IBGU CR and observe the IBGU and CGUs on the hub, and the IBU on the spoke.
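
      For reference, the IBGU from steps 4 and 6 might look roughly like the sketch below. The name and namespace are inferred from the CGU names in the output under Additional info, and the cluster names are taken from the IBGU status. The seed image reference, seed version, pull secret name, label key, and all rolloutStrategy values are placeholders or assumptions, not the values used in the original reproduction.

      ```yaml
      apiVersion: lcm.openshift.io/v1alpha1
      kind: ImageBasedGroupUpgrade
      metadata:
        name: upgrade-4.17-ec2          # inferred from CGU names such as upgrade-4.17-ec2-prep-0
        namespace: default              # namespace in which the CGUs were listed
      spec:
        clusterLabelSelectors:
          - matchExpressions:
              - key: name               # assumption: clusters selected by their "name" label
                operator: In
                values:
                  - ocp-edge87          # running SNO
                  - ocp-edge88          # SNO disabled in step 5
        ibuSpec:
          seedImageRef:
            image: <seed_image>         # placeholder: seed image created in step 3
            version: <seed_version>     # placeholder: OCP version of the seed image
            pullSecretRef:
              name: <seed_pull_secret>  # placeholder
        plan:
          # One plan step per action, matching the action plan in step 4.
          # maxConcurrency and timeout values are illustrative assumptions.
          - actions: ["Prep"]
            rolloutStrategy:
              maxConcurrency: 2
              timeout: 30
          - actions: ["AbortOnFailure"]
            rolloutStrategy:
              maxConcurrency: 2
              timeout: 10
          - actions: ["Upgrade"]
            rolloutStrategy:
              maxConcurrency: 2
              timeout: 30
          - actions: ["AbortOnFailure"]
            rolloutStrategy:
              maxConcurrency: 2
              timeout: 10
          - actions: ["FinalizeUpgrade"]
            rolloutStrategy:
              maxConcurrency: 2
              timeout: 10
      ```

      With a plan of this shape, each of the five steps yields one CGU, which is consistent with the five CGUs (indexes 0 through 4) listed under Additional info.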

      Actual results:

      - Disabled spoke cluster reports that the Prep phase failed and proceeds to the AbortOnFailure phase, as expected.
      - Running spoke cluster completes the Prep phase but does not proceed to the Upgrade or FinalizeUpgrade phases. However, CGUs are created for these phases and report Completed.

      Expected results:

      - Disabled spoke cluster aborts due to prep phase failure
      - Running spoke cluster completes upgrade    

      Additional info:

      IBGU:
      status:
        clusters:
        - failedActions:
          - action: Prep
            message: Prep stage completed successfully
          - action: AbortOnFailure
            message: Idle
          - action: AbortOnFailure
            message: Idle
          name: ocp-edge87
        - failedActions:
          - action: Prep
            message: Prep failed
          - action: AbortOnFailure
          - action: AbortOnFailure
          name: ocp-edge88
        conditions:
        - lastTransitionTime: "2024-08-23T05:13:36Z"
          message: All plan steps are completed
          reason: Completed
          status: "False"
          type: Progressing
      
      CGUs on hub:
      $ oc get cgu -A
      NAMESPACE     NAME                                 AGE     STATE       DETAILS
      default       upgrade-4.17-ec2-abortonfailure-1    7h33m   TimedOut    Manifestwork rollout took too long
      default       upgrade-4.17-ec2-abortonfailure-3    7h22m   TimedOut    Manifestwork rollout took too long
      default       upgrade-4.17-ec2-finalizeupgrade-4   7h11m   Completed   All clusters already compliant with the specified managed policies
      default       upgrade-4.17-ec2-prep-0              7h53m   TimedOut    Manifestwork rollout took too long
      default       upgrade-4.17-ec2-upgrade-2           7h22m   Completed   All clusters already compliant with the specified managed policies
      

       

              saskari@redhat.com Saeid Askari
              josclark@redhat.com Joshua Clark