OpenShift Bugs / OCPBUGS-6503

admin ack test nondeterministically does a check post-upgrade

    • Type: Bug
    • Resolution: Done
    • Priority: Normal
    • None
    • 4.11.z
    • None
    • Moderate
    • None
    • 1
    • OTA 231
    • 1
    • False

      Description of problem:

      While looking into OCPBUGS-5505 I discovered that some 4.10->4.11 upgrade job runs perform an Admin Ack check after the upgrade, while others do not. 4.11 has an ack-4.11-kube-1.25-api-removals-in-4.12 gate, so these upgrade jobs sometimes test that Upgradeable goes false after the upgrade, and sometimes they do not. Which of the two happens is determined solely by a race in the polling: the check is executed once every 10 minutes, and we cancel the polling after the upgrade is completed. In some runs we are lucky and manage to run one check between the upgrade completing and the cancel; in others we are not, and the checks only ever run while the cluster is still on the base version.
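
      The race can be illustrated with a small timing model. This is only a sketch, not the openshift-tests code: the function names, the sample timings, and the assumptions that the first check fires immediately and that a tick landing at or after the cancel never runs are all hypothetical. The last function shows one possible remedy (a forced final check), which is not necessarily the fix that was adopted.

```python
from typing import List

POLL_INTERVAL_S = 600  # from the report: the check runs once per 10 minutes


def poll_ticks(start_s: int, cancel_s: int, interval_s: int = POLL_INTERVAL_S) -> List[int]:
    """Times at which a fixed-interval poller fires before it is cancelled.

    Assumption: the first check fires at start_s, and a tick that would land
    at or after cancel_s never runs.
    """
    return list(range(start_s, cancel_s, interval_s))


def post_upgrade_checks(start_s: int, upgrade_done_s: int, cancel_s: int) -> List[int]:
    """Ticks that land after the upgrade completed, i.e. on the new version."""
    return [t for t in poll_ticks(start_s, cancel_s) if t >= upgrade_done_s]


def with_final_check(start_s: int, upgrade_done_s: int, cancel_s: int) -> List[int]:
    """Hypothetical remedy: always run one extra check when polling is cancelled."""
    return post_upgrade_checks(start_s, upgrade_done_s, cancel_s) + [cancel_s]


# Unlucky run: the next tick (t=1800) would land just after the cancel.
print(post_upgrade_checks(start_s=0, upgrade_done_s=1790, cancel_s=1795))
# Lucky run: the cancel arrives late enough for the t=1800 tick to fire.
print(post_upgrade_checks(start_s=0, upgrade_done_s=1746, cancel_s=1810))
# With a forced final check, even the unlucky run verifies the gate post-upgrade.
print(with_final_check(start_s=0, upgrade_done_s=1790, cancel_s=1795))
```

      In this model, whether a post-upgrade check happens depends only on where the upgrade and the cancel fall relative to the 10-minute tick grid, which matches the nondeterminism seen across job runs.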

      Example job that checked admin acks post-upgrade:
      https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/openshift-cluster-version-operator-880-ci-4.11-upgrade-from-stable-4.10-e2e-azure-upgrade/1611444032104304640

      $ curl --silent https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/openshift-cluster-version-operator-880-ci-4.11-upgrade-from-stable-4.10-e2e-azure-upgrade/1611444032104304640/artifacts/e2e-azure-upgrade/openshift-e2e-test/artifacts/e2e.log | grep 'Waiting for Upgradeable to be AdminAckRequired'
      Jan  6 21:16:40.153: INFO: Waiting for Upgradeable to be AdminAckRequired ...
      

      Example job that did not check admin acks post-upgrade:
      https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/openshift-cluster-version-operator-880-ci-4.11-upgrade-from-stable-4.10-e2e-azure-upgrade/1611444033509396480

      $ curl --silent https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/openshift-cluster-version-operator-880-ci-4.11-upgrade-from-stable-4.10-e2e-azure-upgrade/1611444033509396480/artifacts/e2e-azure-upgrade/openshift-e2e-test/artifacts/e2e.log | grep 'Waiting for Upgradeable to be AdminAckRequired'
      

      Version-Release number of selected component (if applicable):

      4.11+ openshift-tests
      

      How reproducible:

      nondeterministic, wild guess is ~30% of upgrade jobs
      

      Steps to Reproduce:

      1. Inspect the E2E test log of an upgrade job and compare the time of the update ("Completed upgrade") with the time of the last check ("Skipping admin ack", "Gate .* not applicable to current version", "Admin Ack verified") done by the admin ack test.
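
      The inspection can be automated with a short script. The sketch below is a hypothetical helper, not part of openshift-tests; it assumes e2e.log lines appear in chronological order, so it compares line positions rather than parsing timestamps. The message patterns are the ones listed in the step above, and the sample lines are the log excerpts quoted in this report.

```python
import re
from typing import Iterable, List

UPGRADE_DONE = re.compile(r"Completed upgrade")
# Messages the admin ack test logs on each polling iteration (from this report).
ADMIN_ACK_CHECK = re.compile(
    r"Skipping admin ack|Gate .* not applicable to current version|Admin Ack verified"
)


def checks_after_upgrade(lines: Iterable[str]) -> List[str]:
    """Return admin ack check lines logged after the first 'Completed upgrade'."""
    seen_upgrade = False
    after: List[str] = []
    for line in lines:
        if UPGRADE_DONE.search(line):
            seen_upgrade = True
        elif seen_upgrade and ADMIN_ACK_CHECK.search(line):
            after.append(line.rstrip("\n"))
    return after


# Excerpt from a run with no post-upgrade check.
NO_CHECK = """\
Jan 23 00:47:43.842: INFO: Admin Ack verified
Jan 23 00:57:43.836: INFO: Admin Ack verified
Jan 23 01:07:43.839: INFO: Admin Ack verified
Jan 23 01:17:33.474: INFO: Completed upgrade to registry.build01.ci.openshift.org/ci-op-z09ll8fw/release@sha256:322cf67dc00dd6fa4fdd25c3530e4e75800f6306bd86c4ad1418c92770d58ab8
""".splitlines()

# Excerpt from a run where one check landed after the upgrade.
ONE_CHECK = """\
Jan 23 00:57:37.894: INFO: Admin Ack verified
Jan 23 01:07:37.894: INFO: Admin Ack verified
Jan 23 01:16:43.618: INFO: Completed upgrade to registry.build01.ci.openshift.org/ci-op-z8h5x1c5/release@sha256:9c4c732a0b4c2ae887c73b35685e52146518e5d2b06726465d99e6a83ccfee8d
Jan 23 01:17:57.937: INFO: Admin Ack verified
""".splitlines()

print(len(checks_after_upgrade(NO_CHECK)))
print(len(checks_after_upgrade(ONE_CHECK)))
```

      To run it against a real job, the lines of a downloaded e2e.log can be passed in place of the sample excerpts; an empty result means the job never checked admin acks on the target version.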

      Actual results:

      Jan 23 00:47:43.842: INFO: Admin Ack verified
      Jan 23 00:57:43.836: INFO: Admin Ack verified
      Jan 23 01:07:43.839: INFO: Admin Ack verified
      Jan 23 01:17:33.474: INFO: Completed upgrade to registry.build01.ci.openshift.org/ci-op-z09ll8fw/release@sha256:322cf67dc00dd6fa4fdd25c3530e4e75800f6306bd86c4ad1418c92770d58ab8
      

      No check done after the upgrade

      Expected results:

      Jan 23 00:57:37.894: INFO: Admin Ack verified
      Jan 23 01:07:37.894: INFO: Admin Ack verified
      Jan 23 01:16:43.618: INFO: Completed upgrade to registry.build01.ci.openshift.org/ci-op-z8h5x1c5/release@sha256:9c4c732a0b4c2ae887c73b35685e52146518e5d2b06726465d99e6a83ccfee8d
      Jan 23 01:17:57.937: INFO: Admin Ack verified
      

      One or more checks done after upgrade

            afri@afri.cz Petr Muller
            afri@afri.cz Petr Muller
            Yang Yang