-
Bug
-
Resolution: Done
-
Undefined
-
None
-
4.17
-
Important
-
None
-
Rejected
-
False
-
Description of problem:
Two managed SNO clusters are in the same IBGU, which uses the action plan ['Prep'], ['AbortOnFailure'], ['Upgrade'], ['AbortOnFailure'], ['FinalizeUpgrade']. When one SNO fails the Prep phase, the other SNO does not progress past the Prep phase.
Version-Release number of selected component (if applicable):
TALM 4.17, LCA 4.17
How reproducible:
Always
Steps to Reproduce:
1. Provision Hub Cluster with OCP 4.16, TALM 4.17, GitOps
2. Provision two managed clusters running OCP 4.17-ec.1, LCA 4.17
3. Create valid seed image from SNO running on identical hardware running OCP 4.17.ec-2
4. Create IBGU with action plan ['Prep'], ['AbortOnFailure'], ['Upgrade'], ['AbortOnFailure'], ['FinalizeUpgrade']
5. Disable one of the two SNO managed clusters so that prep phase fails
6. Create IBGU CR and observe IBGU, CGUs on hub, and IBU on spoke
Actual results:
- The disabled spoke cluster reports that the prep phase failed and proceeds to the AbortOnFailure phase, as expected.
- The running spoke cluster completes the prep phase but does not proceed to the Upgrade or FinalizeUpgrade phases. However, CGUs are created for these phases and report Completed.
Expected results:
- Disabled spoke cluster aborts due to prep phase failure
- Running spoke cluster completes upgrade
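The expected per-cluster behavior can be sketched as a small model. This is only an illustration of the semantics the reporter expects, under the assumption that each cluster walks the plan independently and that an AbortOnFailure step runs only for a cluster whose preceding action failed. The function name expected_actions and its failing_action parameter are hypothetical and are not part of TALM or LCA.

```python
# Hypothetical model of the expected per-cluster plan progression.
# This is NOT TALM's implementation; it only restates the expected results.
PLAN = ["Prep", "AbortOnFailure", "Upgrade", "AbortOnFailure", "FinalizeUpgrade"]


def expected_actions(failing_action=None):
    """Return the actions a single cluster is expected to run.

    failing_action is the name of the action that fails on this cluster,
    or None if every action succeeds (assumption for illustration).
    """
    executed = []
    failed = False
    for action in PLAN:
        if failed:
            # After a failure, only an immediately following AbortOnFailure
            # step runs, and then the plan stops for this cluster.
            if action == "AbortOnFailure":
                executed.append(action)
            break
        if action == "AbortOnFailure":
            # Nothing failed so far, so there is nothing to abort.
            continue
        executed.append(action)
        failed = action == failing_action
    return executed


# Cluster names taken from the IBGU status in this report.
clusters = {"ocp-edge87": None, "ocp-edge88": "Prep"}
for name, failing in clusters.items():
    print(name, expected_actions(failing))
```

In this model the cluster that passes Prep (ocp-edge87) continues through Upgrade and FinalizeUpgrade, while the cluster that fails Prep (ocp-edge88) runs only Prep and the first AbortOnFailure. The failure of one cluster does not affect the other.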
Additional info:
IBGU:

    status:
      clusters:
      - failedActions:
        - action: Prep
          message: Prep stage completed successfully
        - action: AbortOnFailure
          message: Idle
        - action: AbortOnFailure
          message: Idle
        name: ocp-edge87
      - failedActions:
        - action: Prep
          message: Prep failed
        - action: AbortOnFailure
        - action: AbortOnFailure
        name: ocp-edge88
      conditions:
      - lastTransitionTime: "2024-08-23T05:13:36Z"
        message: All plan steps are completed
        reason: Completed
        status: "False"
        type: Progressing

CGUs on hub:

    $ oc get cgu -A
    NAMESPACE   NAME                                 AGE     STATE       DETAILS
    default     upgrade-4.17-ec2-abortonfailure-1    7h33m   TimedOut    Manifestwork rollout took too long
    default     upgrade-4.17-ec2-abortonfailure-3    7h22m   TimedOut    Manifestwork rollout took too long
    default     upgrade-4.17-ec2-finalizeupgrade-4   7h11m   Completed   All clusters already compliant with the specified managed policies
    default     upgrade-4.17-ec2-prep-0              7h53m   TimedOut    Manifestwork rollout took too long
    default     upgrade-4.17-ec2-upgrade-2           7h22m   Completed   All clusters already compliant with the specified managed policies