Bug
Resolution: Obsolete
Undefined
None
4.12.z
No
False
This is a clone of issue OCPBUGS-7422. The following is the description of the original issue:
—
Description of problem:
While upgrading 3451 SNOs with the upgrade split across 4 CGUs (SNOs per CGU: 1000, 1000, 1000, 451), none of the 451 SNOs in the last CGU performed the upgrade, because all of them showed a status of BackupTimeout. Later inspection of the SNOs revealed that all of them had actually completed the backup job. Based on the timestamps and logs, it seems that, depending on a CGU's timeout and on when each CGU is scheduled to be enabled, TALM can spend a lengthy period reconciling a previous CGU, which prevents the next CGU from performing its backup before it times out.
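The suspected interaction can be illustrated with a toy timing model. This is a hypothetical sketch, not TALM's actual reconcile logic. It assumes a single sequential worker on the hub, a fixed per-cluster reconcile cost, and a backup timeout that starts counting when a CGU is enabled. The names `cgu`, `simulate`, `exampleCGUs` and `perClusterCost`, along with the enable times and timeout, are all made up for illustration. Only the 1000/1000/1000/451 split comes from this report.

```go
package main

import (
	"fmt"
	"time"
)

// cgu is a hypothetical, heavily simplified stand-in for a ClusterGroupUpgrade.
type cgu struct {
	name          string
	clusters      int           // number of SNOs in the CGU
	enabledAt     time.Duration // offset from t=0 when the CGU is enabled
	backupTimeout time.Duration // assumed budget for the backup phase
}

// perClusterCost is an ASSUMED amount of hub-side reconcile time per cluster.
const perClusterCost = 2 * time.Second

// simulate runs the CGUs through a single sequential worker (an assumption).
// A CGU's backup status is only evaluated after the worker has visited all of
// its clusters, and the worker cannot start a CGU until the previous one is done.
func simulate(cgus []cgu) map[string]string {
	result := make(map[string]string, len(cgus))
	var busyUntil time.Duration
	for _, c := range cgus {
		start := c.enabledAt
		if busyUntil > start {
			// Still reconciling an earlier CGU: this CGU waits, but its
			// timeout clock already started when it was enabled.
			start = busyUntil
		}
		checkedAt := start + time.Duration(c.clusters)*perClusterCost
		deadline := c.enabledAt + c.backupTimeout
		if checkedAt > deadline {
			// The spoke backups may have finished long ago; only the
			// hub-side observation arrives too late.
			result[c.name] = "BackupTimeout"
		} else {
			result[c.name] = "BackupDone"
		}
		busyUntil = checkedAt
	}
	return result
}

// exampleCGUs mirrors the 1000/1000/1000/451 split; enable times and the
// 40-minute timeout are hypothetical values.
func exampleCGUs() []cgu {
	timeout := 40 * time.Minute
	return []cgu{
		{"cgu-1", 1000, 0, timeout},
		{"cgu-2", 1000, 35 * time.Minute, timeout},
		{"cgu-3", 1000, 70 * time.Minute, timeout},
		{"cgu-4", 451, 72 * time.Minute, timeout},
	}
}

func main() {
	cgus := exampleCGUs()
	result := simulate(cgus)
	for _, c := range cgus {
		fmt.Printf("%s (%d clusters): %s\n", c.name, c.clusters, result[c.name])
	}
}
```

With these assumed numbers, the first three CGUs finish inside their windows. `cgu-4` is enabled only 2 minutes after `cgu-3`, so it queues behind roughly 33 minutes of reconciling and reports BackupTimeout. Running `simulate` on `cgu-4` alone yields BackupDone. The model is deliberately sequential so that it isolates the queuing effect described above.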
Version-Release number of selected component (if applicable):
ACM: 2.7.0-DOWNSTREAM-2023-01-26-20-15-10
Hub OCP: 4.12.1
SNO OCP: 4.11.24, upgrading to 4.12.1
How reproducible:
Depends on whether the number of clusters per CGU is large enough, on when the CGUs are enabled for upgrade, and on whether backup is enabled.
Steps to Reproduce:
Actual results:
Expected results:
Additional info:
- clones: OCPBUGS-7422 Occasionally an entire CGU will fail to upgrade with BackupTimeout while upgrading many clusters at scale (Closed)
- is blocked by: OCPBUGS-7422 Occasionally an entire CGU will fail to upgrade with BackupTimeout while upgrading many clusters at scale (Closed)
- links to