OpenShift Bugs / OCPBUGS-7104

TALM did not allow CGU to start upgrade until an apidown SNO was reachable


    • Quality / Stability / Reliability
    • CNF RAN Sprint 231, CNF RAN Sprint 232, CNF RAN Sprint 233, CNF RAN Sprint 234, CNF RAN Sprint 235, CNF RAN Sprint 236, CNF RAN Sprint 237, CNF RAN Sprint 238, CNF RAN Sprint 239, CNF RAN Sprint 240, CNF RAN Sprint 241, CNF RAN Sprint 242, CNF RAN Sprint 243, CNF RAN Sprint 244

      Description of problem:

      While attempting to upgrade 3465 SNOs (with precache), 1 SNO became unreachable because its API was down for an extended period of time, which held back the entire fleet of SNOs from upgrading. Judging by when the clusters pulled the precaching image, the other SNOs in the CGU appear to have completed their precaching jobs. Once the apidown SNO was resolved (kubelet restarted), the CGU continued into the backup and then the upgrade portions. The originally unreachable SNO could not upgrade because it was missing the admin-acks, which had been applied en masse via oc CLI commands. (The unreachable cluster wasn't noticed at the time.)
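
A minimal sketch of how the mass admin-ack step could report unreachable clusters instead of skipping them silently. This is not the procedure used in this test. The helper names, the kubeconfig-per-cluster layout, and the 30 second timeout are hypothetical, and the ack key shown is assumed to be the one gating 4.11 to 4.12 upgrades; verify it against the release documentation.

```python
import subprocess
from typing import Callable, Dict, List

# Assumption: admin-ack key for 4.11 -> 4.12 upgrades; confirm before use.
ACK_PATCH = '{"data":{"ack-4.11-kube-1.25-api-removals-in-4.12":"true"}}'


def oc_patch_admin_ack(kubeconfig: str, timeout_s: int = 30) -> bool:
    """Patch the admin-acks ConfigMap on one cluster; True on success."""
    cmd = [
        "oc", "--kubeconfig", kubeconfig,
        "-n", "openshift-config",
        "patch", "configmap", "admin-acks",
        "--type=merge", "--patch", ACK_PATCH,
    ]
    try:
        result = subprocess.run(cmd, capture_output=True, timeout=timeout_s)
    except (subprocess.TimeoutExpired, OSError):
        # An API that is down (or a missing oc binary) counts as a failure.
        return False
    return result.returncode == 0


def apply_admin_acks(
    kubeconfigs: Dict[str, str],
    patch: Callable[[str], bool] = oc_patch_admin_ack,
) -> List[str]:
    """Apply the ack to every cluster; return the names that failed."""
    failed = []
    for name, path in sorted(kubeconfigs.items()):
        if not patch(path):
            failed.append(name)
    return failed
```

Any cluster name returned by `apply_admin_acks` did not receive the ack and would need a retry once its API is reachable again.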

      Version-Release number of selected component (if applicable):

      ACM - 2.7.0-DOWNSTREAM-2023-01-26-20-15-10 RC4
      HUB OCP 4.12.1
      SNO OCP 4.11.24 upgrading to 4.12.1

      How reproducible:

       

      Steps to Reproduce:

      1.
      2.
      3.
      

      Actual results:

       

      Expected results:

       

      Additional info:

      This is similar to https://issues.redhat.com/browse/OCPBUGS-2601; however, instead of holding back the clusters from precaching, the unreachable SNO held back the clusters from upgrading.

        1. 0.log.gz
          2.69 MB
        2. 0.log.20230205-170137.gz
          2.66 MB
        3. 0.log.20230205-155315.gz
          2.72 MB
        4. 0.log.20230205-144705.gz
          2.70 MB

              jche@redhat.com Jun Chen
              akrzos@redhat.com Alex Krzos