-
Bug
-
Resolution: Done
-
Undefined
-
None
-
4.17
-
Quality / Stability / Reliability
-
False
-
-
None
-
Important
-
None
-
None
-
Rejected
Description of problem:
Two managed SNO clusters are in the same IBGU, which uses the action plan ['Prep'], ['AbortOnFailure'], ['Upgrade'], ['AbortOnFailure'], ['FinalizeUpgrade']. When one SNO fails the Prep phase, the other SNO does not progress past the Prep phase.
Version-Release number of selected component (if applicable):
TALM 4.17, LCA 4.17
How reproducible:
Always
Steps to Reproduce:
1. Provision a hub cluster with OCP 4.16, TALM 4.17, and GitOps.
2. Provision two managed clusters running OCP 4.17-ec.1 and LCA 4.17.
3. Create a valid seed image from an SNO on identical hardware running OCP 4.17.ec-2.
4. Create an IBGU with the action plan ['Prep'], ['AbortOnFailure'], ['Upgrade'], ['AbortOnFailure'], ['FinalizeUpgrade'].
5. Disable one of the two SNO managed clusters so that the Prep phase fails.
6. Create the IBGU CR and observe the IBGU and CGUs on the hub, and the IBU on the spoke.
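For reference, the action plan from step 4 could be expressed in an IBGU CR roughly as follows. This is a minimal sketch, not the CR used in this report: the name and namespace are inferred from the CGU names listed under "Additional info", and the cluster label selector, seed image reference, and rollout strategy values are placeholder assumptions.

```yaml
apiVersion: lcm.openshift.io/v1alpha1
kind: ImageBasedGroupUpgrade
metadata:
  name: upgrade-4.17-ec2          # inferred from the CGU name prefix
  namespace: default              # inferred from the CGU namespace
spec:
  clusterLabelSelectors:          # assumption: clusters selected by their "name" label
    - matchExpressions:
        - key: name
          operator: In
          values:
            - ocp-edge87
            - ocp-edge88
  ibuSpec:
    seedImageRef:
      image: <seed_image>         # placeholder, not given in this report
      version: <seed_version>     # placeholder, not given in this report
  plan:                           # one plan step per action, as in the report
    - actions: ["Prep"]
      rolloutStrategy:
        maxConcurrency: 2         # assumed value
        timeout: 2400             # assumed value
    - actions: ["AbortOnFailure"]
      rolloutStrategy:
        maxConcurrency: 2
        timeout: 2400
    - actions: ["Upgrade"]
      rolloutStrategy:
        maxConcurrency: 2
        timeout: 2400
    - actions: ["AbortOnFailure"]
      rolloutStrategy:
        maxConcurrency: 2
        timeout: 2400
    - actions: ["FinalizeUpgrade"]
      rolloutStrategy:
        maxConcurrency: 2
        timeout: 2400
```

With five separate plan steps, TALM creates one CGU per step, which matches the CGU suffixes -0 through -4 in the output under "Additional info".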
Actual results:
- The disabled spoke cluster reports that the Prep phase failed and proceeds to the AbortOnFailure phase, as expected.
- The running spoke cluster completes the Prep phase but does not proceed to the Upgrade or FinalizeUpgrade phases. However, CGUs are created for these phases and report Completed.
Expected results:
- The disabled spoke cluster aborts due to the Prep phase failure.
- The running spoke cluster completes the upgrade.
Additional info:
IBGU:
status:
  clusters:
  - failedActions:
    - action: Prep
      message: Prep stage completed successfully
    - action: AbortOnFailure
      message: Idle
    - action: AbortOnFailure
      message: Idle
    name: ocp-edge87
  - failedActions:
    - action: Prep
      message: Prep failed
    - action: AbortOnFailure
    - action: AbortOnFailure
    name: ocp-edge88
  conditions:
  - lastTransitionTime: "2024-08-23T05:13:36Z"
    message: All plan steps are completed
    reason: Completed
    status: "False"
    type: Progressing
CGUs on hub:
$ oc get cgu -A
NAMESPACE   NAME                                 AGE     STATE       DETAILS
default     upgrade-4.17-ec2-abortonfailure-1    7h33m   TimedOut    Manifestwork rollout took too long
default     upgrade-4.17-ec2-abortonfailure-3    7h22m   TimedOut    Manifestwork rollout took too long
default     upgrade-4.17-ec2-finalizeupgrade-4   7h11m   Completed   All clusters already compliant with the specified managed policies
default     upgrade-4.17-ec2-prep-0              7h53m   TimedOut    Manifestwork rollout took too long
default     upgrade-4.17-ec2-upgrade-2           7h22m   Completed   All clusters already compliant with the specified managed policies