OpenShift Bugs / OCPBUGS-18680

CGU reports SNO completed platform and operator upgrade; however, operators did not upgrade


    Description

      Description of problem:

      While upgrading 3555 SNOs from 4.12.29 to 4.13.9 via 4 CGU objects, two SNOs (vm00357 and vm01725) were reported in the completed status of the CGUs but did not actually complete the operator upgrade. This appears to be caused by a race condition between when OLM reads a new CatalogSource and when the ACM policy engine reports that a policy is compliant.
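A toy model may help illustrate the suspected race. This is a minimal sketch, not how ACM, TALM, or OLM actually evaluate compliance: it assumes (hypothetically) that compliance is judged from the CatalogSource object alone, so it can turn true before OLM has read the new catalog and before the operator's ClusterServiceVersion (CSV) has changed. The class, function, image, and CSV names below are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class SnoState:
    """Hypothetical snapshot of one SNO during the operator-upgrade phase."""
    catalog_spec_image: str  # CatalogSource image the policy enforces
    olm_loaded_image: str    # catalog content OLM has actually read (may lag)
    installed_csv: str       # CSV the operator Subscription has installed


def compliant_on_spec_only(state: SnoState, desired_image: str) -> bool:
    # Assumed check: looks only at the CatalogSource object, so it can
    # return True before OLM has read the new catalog (the suspected race).
    return state.catalog_spec_image == desired_image


def upgrade_confirmed(state: SnoState, desired_image: str, expected_csv: str) -> bool:
    # Stricter check: also requires that OLM has read the new catalog and
    # that the operator has actually moved to the expected CSV.
    return (
        state.catalog_spec_image == desired_image
        and state.olm_loaded_image == desired_image
        and state.installed_csv == expected_csv
    )


# Illustrative values only; these image and CSV names are hypothetical.
NEW_IMAGE = "example-catalog:v4.13"
OLD_IMAGE = "example-catalog:v4.12"
NEW_CSV = "example-operator.v4.13.0"
OLD_CSV = "example-operator.v4.12.0"

# Right after the policy updates the CatalogSource, before OLM refreshes.
racing = SnoState(NEW_IMAGE, OLD_IMAGE, OLD_CSV)
print(compliant_on_spec_only(racing, NEW_IMAGE))      # True
print(upgrade_confirmed(racing, NEW_IMAGE, NEW_CSV))  # False

# After OLM reads the new catalog and the Subscription upgrades the operator.
settled = SnoState(NEW_IMAGE, NEW_IMAGE, NEW_CSV)
print(upgrade_confirmed(settled, NEW_IMAGE, NEW_CSV))  # True
```

In this model, vm00357 and vm01725 would correspond to clusters observed in the `racing` state: reported compliant, with the operator still on the old CSV.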

      Version-Release number of selected component (if applicable):

      Hub 4.13.10
      ACM - 2.9.0-DOWNSTREAM-2023-09-01-02-58-15
      TALM - 4.13.0
      Deployed SNOs 4.12.29 upgraded to 4.13.9 (with operator upgrades)

      How reproducible:

      Rarely at scale: 2 out of 3555 total upgrades, or 2 out of 21 total upgrade failures. However, these two account for all of the operator upgrade failures.
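Expressed as rates, the reproducibility figures above work out as follows. This is plain arithmetic on the counts reported in this issue; no other data is assumed.

```python
# Counts taken from the report.
total_upgrades = 3555
total_failures = 21
operator_upgrade_failures = 2  # vm00357 and vm01725

rate_of_all_upgrades = operator_upgrade_failures / total_upgrades
share_of_all_failures = operator_upgrade_failures / total_failures

print(f"{rate_of_all_upgrades:.3%} of all upgrades")   # 0.056% of all upgrades
print(f"{share_of_all_failures:.1%} of all failures")  # 9.5% of all failures
```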

      Attachments

        1. acm-must-gather-vm00357.tar.gz (4.42 MB)
        2. acm-must-gather-vm01725.tar.gz (4.21 MB)
        3. vm00357.catalogsources.yaml (1 kB)
        4. vm00357.policies.yaml (87 kB)
        5. vm01725.catalogsources.yaml (1 kB)
        6. vm01725.policies.yaml (89 kB)

      People

        Jun Chen (jche@redhat.com)
        Alex Krzos (akrzos@redhat.com)
        Yang Liu