Bug
Resolution: Duplicate
4.12
Description of problem:
A cluster upgrading from 4.12.9 to 4.12.17 became stuck during the upgrade of the node-tuning-operator. The node-tuning-operator is in a crash loop, apparently because it tries to list all ClusterServiceVersions (CSVs) and the request takes long enough to hit a timeout:

$ oc get pods -n openshift-cluster-node-tuning-operator
NAME                                            READY   STATUS             RESTARTS         AGE
cluster-node-tuning-operator-7879747fc9-x8fqf   0/1     CrashLoopBackOff   33 (3m40s ago)   3h7m

$ oc logs -f cluster-node-tuning-operator-7879747fc9-x8fqf -n openshift-cluster-node-tuning-operator
...
unable to remove Performance addons OLM operator: the server was unable to return a response in the time allotted, but may still be processing the request (get clusterserviceversions.operators.coreos.com)
...

The cluster has 114 CSVs.
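To check whether another cluster is in the same state, the CSV count and the operator's status can be inspected with standard oc commands (a minimal sketch; the namespace and deployment name are taken from the output above and the CSV count of 114 is only the figure reported for this cluster, not a known threshold):

# Count CSVs across all namespaces; the affected cluster reported 114.
$ oc get clusterserviceversions --all-namespaces --no-headers | wc -l

# Confirm whether the node-tuning-operator is crashlooping.
$ oc get pods -n openshift-cluster-node-tuning-operator

# Look for the CSV list timeout in the operator logs.
$ oc logs deployment/cluster-node-tuning-operator -n openshift-cluster-node-tuning-operator | grep -i clusterserviceversions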
Version-Release number of selected component (if applicable):
Upgrading from 4.12.9 to 4.12.17
How reproducible:
Reproducible during upgrade
Actual results:
The upgrade remains stuck.
Expected results:
The upgrade should proceed without getting stuck.
Affected clusters:
- 8a263e14-10dc-4a52-925b-2d53054945a2
- 41f05d14-82c6-4cf9-852c-f45853bbfa1e
Associated alerts:
- https://redhat.pagerduty.com/incidents/Q3N6S6B66DEW0S
- https://redhat.pagerduty.com/incidents/Q287KVSRKXLBDK
Slack thread: https://redhat-internal.slack.com/archives/CCX9DB894/p1685168355036519
- duplicates: OCPBUGS-14241 Clusters with large numbers of CSVs can CrashLoop the NTO and block upgrades (Closed)
- is related to: OCPBUGS-2437 Clusters with large numbers of CSVs can cause crashloop, block upgrades (Closed)