  1. OpenShift Bugs
  2. OCPBUGS-6851

admin ack test nondeterministically does a check post-upgrade


    • Bug
    • Resolution: Done
    • Normal
    • None
    • 4.11.z
    • None
    • Moderate
    • 1
    • OTA 231
    • 1
    • False



      This is a clone of issue OCPBUGS-6850. The following is the description of the original issue:

      This is a clone of issue OCPBUGS-6503. The following is the description of the original issue:

      Description of problem:

      While looking into OCPBUGS-5505 I discovered that some 4.10->4.11 upgrade job runs perform an Admin Ack check after the upgrade, while some do not. 4.11 has an ack-4.11-kube-1.25-api-removals-in-4.12 gate, so these upgrade jobs sometimes test that Upgradeable goes false after the upgrade, and sometimes they do not. The outcome is determined solely by a polling race: the check runs once every 10 minutes, and the polling is cancelled once the upgrade completes. In some runs one check happens to execute before the cancel; in others none does, and the gate is only ever checked while the cluster is still on the base version.
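
      The race can be illustrated with a small model. This is only a sketch, not the openshift-tests code: times are plain seconds, the function name is made up, and cancel_delay is a hypothetical stand-in for however long it takes the cancel to land after the upgrade completes.

```python
POLL_INTERVAL = 600  # seconds; the report states one check per 10 minutes


def checks_after_upgrade(first_check, upgrade_done, cancel_delay):
    """Return the poll ticks that fire after the upgrade completes
    but before the polling is cancelled.

    All arguments are seconds on a shared clock. cancel_delay is an
    assumption of this model, not a value taken from the test suite.
    """
    cancel_at = upgrade_done + cancel_delay
    hits = []
    tick = first_check
    while tick < cancel_at:
        if tick > upgrade_done:
            hits.append(tick)
        tick += POLL_INTERVAL
    return hits


# Upgrade finishes shortly before a tick and the cancel lands late:
# one post-upgrade check runs.
lucky = checks_after_upgrade(first_check=0, upgrade_done=1745, cancel_delay=90)

# Upgrade finishes shortly before a tick but the cancel lands first:
# no post-upgrade check runs.
unlucky = checks_after_upgrade(first_check=0, upgrade_done=1790, cancel_delay=5)
```

      In this model a post-upgrade check happens only when a 10-minute tick falls inside the window between completion and cancel, which is why the result varies from run to run.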

      Example job that checked admin acks post-upgrade:

      $ curl --silent https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/openshift-cluster-version-operator-880-ci-4.11-upgrade-from-stable-4.10-e2e-azure-upgrade/1611444032104304640/artifacts/e2e-azure-upgrade/openshift-e2e-test/artifacts/e2e.log | grep 'Waiting for Upgradeable to be AdminAckRequired'
      Jan  6 21:16:40.153: INFO: Waiting for Upgradeable to be AdminAckRequired ...

      Example job that did not check admin acks post-upgrade:

      $ curl --silent https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/openshift-cluster-version-operator-880-ci-4.11-upgrade-from-stable-4.10-e2e-azure-upgrade/1611444033509396480/artifacts/e2e-azure-upgrade/openshift-e2e-test/artifacts/e2e.log | grep 'Waiting for Upgradeable to be AdminAckRequired'

      Version-Release number of selected component (if applicable):

      4.11+ openshift-tests

      How reproducible:

      nondeterministic, wild guess is ~30% of upgrade jobs

      Steps to Reproduce:

      1. Inspect the E2E test log of an upgrade job and compare the time of the update ("Completed upgrade") with the time of the last check ("Skipping admin ack", "Gate .* not applicable to current version", "Admin Ack verified") done by the admin ack test
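
      The comparison in step 1 could be scripted roughly as follows. This is a hedged sketch: the helper names are made up, the line format is inferred from the log excerpts in this report, and because the log omits the year, an assumed year is supplied only to make timestamps comparable.

```python
import re
from datetime import datetime

# Line shape inferred from the excerpts, e.g.
# "Jan 23 00:47:43.842: INFO: Admin Ack verified"
LINE_RE = re.compile(r"^\s*(\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}\.\d+): INFO: (.*)$")

# Messages the report lists as evidence of an admin ack check.
CHECK_RE = re.compile(
    r"Skipping admin ack|Gate .* not applicable to current version|Admin Ack verified"
)


def parse_stamp(stamp, year=2023):
    # The year is an assumption; the log lines do not carry one.
    normalized = " ".join(stamp.split())  # collapse "Jan  6" to "Jan 6"
    return datetime.strptime(f"{year} {normalized}", "%Y %b %d %H:%M:%S.%f")


def checked_after_upgrade(log_text):
    """Return True if an admin ack check is logged after "Completed upgrade",
    False if the last check precedes it (or there is no check at all),
    and None if the log has no "Completed upgrade" line."""
    completed = None
    last_check = None
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue
        when, message = parse_stamp(match.group(1)), match.group(2)
        if message.startswith("Completed upgrade"):
            completed = when
        elif CHECK_RE.search(message):
            last_check = when
    if completed is None:
        return None
    return last_check is not None and last_check > completed
```

      Passing the contents of a job's e2e.log to checked_after_upgrade gives a quick yes/no answer per run, which would make it easier to estimate how often the post-upgrade check is skipped.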

      Actual results:

      Jan 23 00:47:43.842: INFO: Admin Ack verified
      Jan 23 00:57:43.836: INFO: Admin Ack verified
      Jan 23 01:07:43.839: INFO: Admin Ack verified
      Jan 23 01:17:33.474: INFO: Completed upgrade to registry.build01.ci.openshift.org/ci-op-z09ll8fw/release@sha256:322cf67dc00dd6fa4fdd25c3530e4e75800f6306bd86c4ad1418c92770d58ab8

      No check done after the upgrade

      Expected results:

      Jan 23 00:57:37.894: INFO: Admin Ack verified
      Jan 23 01:07:37.894: INFO: Admin Ack verified
      Jan 23 01:16:43.618: INFO: Completed upgrade to registry.build01.ci.openshift.org/ci-op-z8h5x1c5/release@sha256:9c4c732a0b4c2ae887c73b35685e52146518e5d2b06726465d99e6a83ccfee8d
      Jan 23 01:17:57.937: INFO: Admin Ack verified

      One or more checks done after upgrade

            afri@afri.cz Petr Muller
            openshift-crt-jira-prow OpenShift Prow Bot
            Evgeni Vakhonin Evgeni Vakhonin