Type: Bug
Resolution: Done
Priority: Undefined
Fix Version/s: None
Affects Version/s: 4.12.z
Component/s: TALM Operator
Labels:
- perfscale-telco-5g
- telco-5g

Activity Type:
Quality / Stability / Reliability
Blocked:
False
Blocked Reason:

None
Story Points:
None
Severity:
None
Regression:
None

Target Backport Versions:
None
Target Version:

4.13.z
Release Blocker:
None
Sprint:
CNF RAN Sprint 231, CNF RAN Sprint 232, CNF RAN Sprint 233, CNF RAN Sprint 234, CNF RAN Sprint 235, CNF RAN Sprint 236, CNF RAN Sprint 237, CNF RAN Sprint 238, CNF RAN Sprint 239, CNF RAN Sprint 240, CNF RAN Sprint 241, CNF RAN Sprint 242, CNF RAN Sprint 243, CNF RAN Sprint 244
sprint_count:
14

Internal Whiteboard:

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Release Note Status:
None
Release Note Type:
None
Release Note Text:
None

Escape Reason:
None
Escape Impact:
None
Corrective Measures:
None
SDLC stage when should've been found:
None

Description of problem:

While attempting to upgrade 3465 SNOs, 1 SNO became unreachable because its API was down for an extended period of time, which held back the entire fleet of SNOs from upgrading (with precache). Judging by when the clusters pulled the precaching image, the other SNOs in the CGU do appear to have completed their precaching jobs. Once the API-down SNO was resolved (kubelet restarted), the CGU continued into the backup and then the upgrade portions. The originally unreachable SNO could not upgrade because it was missing the admin-acks: these were applied en masse via oc CLI commands, and the unreachable cluster wasn't noticed at the time.

Version-Release number of selected component (if applicable):

ACM - 2.7.0-DOWNSTREAM-2023-01-26-20-15-10 RC4
HUB OCP 4.12.1
SNO OCP 4.11.24 upgrading to 4.12.1

How reproducible:

Steps to Reproduce:

1.
2.
3.

Actual results:

Expected results:

Additional info:

This is similar to https://issues.redhat.com/browse/OCPBUGS-2601; however, instead of holding back the clusters from precaching, the unreachable SNO held back the clusters from upgrading.


0.log.gz
2.69 MB
2023/02/07 4:06 PM
0.log.20230205-170137.gz
2.66 MB
2023/02/07 4:06 PM
0.log.20230205-155315.gz
2.72 MB
2023/02/07 4:06 PM
0.log.20230205-144705.gz
2.70 MB
2023/02/07 4:06 PM

blocks

OCPBUGS-7347 TALM did not allow CGU to start upgrade until an apidown SNO was reachable

Closed

is cloned by

OCPBUGS-7347 TALM did not allow CGU to start upgrade until an apidown SNO was reachable

Closed

links to

openshift-kni/cluster-group-upgrades-operator#449: OCPBUGS-7104: Detect MCV processing error

RHSA-2023:6257 OpenShift Container Platform 4.13.z security update

mentioned on

Merge request - Updated US source to: 782c978 Detect MCV processing error (#449)

Assignee: Jun Chen

Reporter: Alex Krzos

Need Info From: None

Contributors: None

QA Contact: Alex Krzos

Doc Contact: None

Votes: 0

Watchers: 6

Created: 2023/02/06 8:00 PM

Updated: 2025/07/28 5:36 AM

Resolved: 2024/10/31 5:19 PM
