OCPBUGS-2437 (project: OpenShift Bugs)

Clusters with large numbers of CSVs can cause crashloop, block upgrades

Details

    • Moderate
    • CNF Compute Sprint 226, CNF Compute Sprint 227
    • 2
    • Proposed
    • False

    Description

      Description of problem:

      A cluster began upgrading from 4.10.28 to 4.11.5, but the control plane upgrade has stalled while upgrading the node-tuning-operator.
      
      The node-tuning-operator is in a crashloop. This appears to happen because it tries to list all CSVs (ClusterServiceVersions), and the request takes so long that it times out:
      
      F1016 23:19:57.077167       1 main.go:130] unable to remove Performance addons OLM operator: the server was unable to return a response in the time allotted, but may still be processing the request (get clusterserviceversions.operators.coreos.com)
      goroutine 1 [running]:
      
      The cluster has a large number of CSVs installed (29000): it has ~1500 namespaces, and each namespace contains ~20 CSVs, because several of the installed operators install their CSVs in all namespaces.
      
      The cluster is currently stuck because the CSV List never completes in time.
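
      One possible mitigation, not confirmed as the fix adopted for this bug, is to request the CSV list in bounded pages instead of a single call. Kubernetes list requests accept `limit` and `continue` parameters for this purpose. The sketch below only illustrates the paging loop: `iter_chunked` and `make_fake_fetcher` are hypothetical names, the "server" is an in-memory stand-in, and the page size of 500 is an arbitrary assumption. Whether paging alone would avoid the timeout seen here is also an assumption.

```python
from typing import Callable, Iterator, List, Tuple

# A page fetcher takes (limit, continue_token) and returns (items, next_token).
# An empty next_token means there are no more pages, mirroring the
# limit/continue convention of Kubernetes list calls.
PageFetcher = Callable[[int, str], Tuple[List[str], str]]


def iter_chunked(fetch_page: PageFetcher, page_size: int = 500) -> Iterator[str]:
    """Yield every item by requesting bounded pages instead of one huge list."""
    token = ""
    while True:
        items, token = fetch_page(page_size, token)
        yield from items
        if not token:
            return


def make_fake_fetcher(names: List[str]) -> PageFetcher:
    """Hypothetical in-memory stand-in for the API server (illustration only)."""

    def fetch(limit: int, token: str) -> Tuple[List[str], str]:
        # The token here is just the next start offset encoded as a string;
        # a real API server returns an opaque token instead.
        start = int(token) if token else 0
        end = min(start + limit, len(names))
        next_token = str(end) if end < len(names) else ""
        return names[start:end], next_token

    return fetch


if __name__ == "__main__":
    # Roughly the scale described in this report: ~1500 namespaces x ~20 CSVs.
    csvs = [f"ns-{n}/csv-{c}" for n in range(1500) for c in range(20)]
    total = sum(1 for _ in iter_chunked(make_fake_fetcher(csvs), page_size=500))
    print(total)
```

      With a real client, `fetch_page` would wrap the actual list call and pass the server's continue token through unchanged. Each request then returns at most `page_size` objects, so no single response has to carry all ~29000 CSVs.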

      Version-Release number of selected component (if applicable):

      4.10.28 (upgrading from)
      4.11.5 (upgrading to)

            People

              yquinn@redhat.com Yanir Quinn
              mbargenq Matt Bargenquast (Inactive)
              Mallapadi Niranjan Mallapadi Niranjan