-
Bug
-
Resolution: Done
-
4.12.z
-
Quality / Stability / Reliability
-
CNF RAN Sprint 231, CNF RAN Sprint 232, CNF RAN Sprint 233, CNF RAN Sprint 234, CNF RAN Sprint 235, CNF RAN Sprint 236, CNF RAN Sprint 237, CNF RAN Sprint 238, CNF RAN Sprint 239, CNF RAN Sprint 240, CNF RAN Sprint 241, CNF RAN Sprint 242, CNF RAN Sprint 243, CNF RAN Sprint 244
-
14
Description of problem:
While attempting to upgrade 3465 SNOs (with precaching), one SNO became unreachable because its API was down for an extended period, which held back the entire fleet of SNOs from upgrading. Judging by when the clusters pulled the precaching image, the other SNOs in the CGU did complete their precaching jobs. Once the API-down SNO was recovered (kubelet restarted), the CGU continued into the backup and then the upgrade stages. The originally unreachable SNO could not upgrade because it was missing the admin-acks: these were applied en masse via oc CLI commands, and the unreachable cluster wasn't noticed at the time.
Version-Release number of selected component (if applicable):
ACM - 2.7.0-DOWNSTREAM-2023-01-26-20-15-10 RC4
HUB OCP 4.12.1
SNO OCP 4.11.24 upgrading to 4.12.1
How reproducible:
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
This is similar to https://issues.redhat.com/browse/OCPBUGS-2601; however, instead of holding back the clusters from precaching, the unreachable SNO held back the clusters from upgrading.
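The missed admin-ack described above might have been caught at apply time by recording which clusters the bulk oc run could not reach. The following is only a rough sketch of that idea, not the procedure used in this test: the helper names (`oc_patch_admin_ack`, `apply_to_fleet`), the per-cluster kubeconfig layout, and the timeout values are assumptions, and the acknowledgement key is the one documented for 4.11 to 4.12 updates and should be verified for the versions in use.

```python
import subprocess
from typing import Callable, Iterable, List, Tuple

# Assumption: acknowledgement key documented for 4.11 -> 4.12 updates.
ACK_PATCH = '{"data":{"ack-4.11-kube-1.25-api-removals-in-4.12":"true"}}'


def oc_patch_admin_ack(kubeconfig: str, timeout_s: int = 30) -> bool:
    """Patch the admin-acks ConfigMap on one cluster; return False on any failure."""
    cmd = [
        "oc", "--kubeconfig", kubeconfig, f"--request-timeout={timeout_s}s",
        "-n", "openshift-config", "patch", "cm", "admin-acks",
        "--type=merge", "--patch", ACK_PATCH,
    ]
    try:
        result = subprocess.run(cmd, capture_output=True, timeout=timeout_s + 5)
    except (subprocess.TimeoutExpired, OSError):
        # API down, command hung, or oc not installed: treat as not applied.
        return False
    return result.returncode == 0


def apply_to_fleet(
    kubeconfigs: Iterable[str],
    apply: Callable[[str], bool] = oc_patch_admin_ack,
) -> Tuple[List[str], List[str]]:
    """Run `apply` against every cluster and split the fleet into applied/missed."""
    applied: List[str] = []
    missed: List[str] = []
    for kubeconfig in kubeconfigs:
        (applied if apply(kubeconfig) else missed).append(kubeconfig)
    return applied, missed
```

Any kubeconfig returned in the `missed` list would need the admin-ack re-applied once that cluster is reachable again, before its upgrade can proceed.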
- blocks
-
OCPBUGS-7347 TALM did not allow CGU to start upgrade until an apidown SNO was reachable
-
- Closed
-
- is cloned by
-
OCPBUGS-7347 TALM did not allow CGU to start upgrade until an apidown SNO was reachable
-
- Closed
-
- links to
-
RHSA-2023:6257
OpenShift Container Platform 4.13.z security update
- mentioned on