Bug
Resolution: Duplicate
4.12
Description of problem:
A cluster upgrading from 4.12.9 to 4.12.17 became stuck during the upgrade of the node-tuning-operator. The node-tuning-operator is in a crash loop, apparently because it tries to list all ClusterServiceVersions (CSVs) and the request takes long enough to hit a timeout:

$ oc get pods -n openshift-cluster-node-tuning-operator
NAME                                            READY   STATUS             RESTARTS         AGE
cluster-node-tuning-operator-7879747fc9-x8fqf   0/1     CrashLoopBackOff   33 (3m40s ago)   3h7m

$ oc logs -f cluster-node-tuning-operator-7879747fc9-x8fqf -n openshift-cluster-node-tuning-operator
...
unable to remove Performance addons OLM operator: the server was unable to return a response in the time allotted, but may still be processing the request (get clusterserviceversions.operators.coreos.com)
...

The cluster has 114 CSVs.
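To check whether another cluster is in the same state, the CSV count and the operator's status can be inspected with standard oc commands (a minimal sketch; the namespace and deployment name are taken from the output above and the CSV count of 114 is only the figure reported for this cluster, not a known threshold):

# Count CSVs across all namespaces; the affected cluster reported 114.
$ oc get clusterserviceversions --all-namespaces --no-headers | wc -l

# Confirm whether the node-tuning-operator is crashlooping.
$ oc get pods -n openshift-cluster-node-tuning-operator

# Look for the CSV list timeout in the operator logs.
$ oc logs deployment/cluster-node-tuning-operator -n openshift-cluster-node-tuning-operator | grep -i clusterserviceversions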
Version-Release number of selected component (if applicable):
Upgrading from 4.12.9 to 4.12.17
How reproducible:
Reproducible during upgrade
Actual results:
The upgrade remains stuck.
Expected results:
The upgrade should proceed without getting stuck.
Affected clusters:
- 8a263e14-10dc-4a52-925b-2d53054945a2
- 41f05d14-82c6-4cf9-852c-f45853bbfa1e
Associated alerts:
- https://redhat.pagerduty.com/incidents/Q3N6S6B66DEW0S
- https://redhat.pagerduty.com/incidents/Q287KVSRKXLBDK
Slack thread: https://redhat-internal.slack.com/archives/CCX9DB894/p1685168355036519
- duplicates: OCPBUGS-14241 Clusters with large numbers of CSVs can CrashLoop the NTO and block upgrades (Closed)
- is related to: OCPBUGS-2437 Clusters with large numbers of CSVs can cause crashloop, block upgrades (Closed)