-
Bug
-
Resolution: Done
-
Normal
-
4.11.z
-
None
-
Moderate
-
None
-
1
-
OTA 231
-
1
-
False
-
This is a clone of issue OCPBUGS-6503. The following is the description of the original issue:
—
Description of problem:
While looking into OCPBUGS-5505 I discovered that some 4.10->4.11 upgrade job runs perform an Admin Ack check, while some do not. 4.11 has a ack-4.11-kube-1.25-api-removals-in-4.12 gate, so these upgrade jobs sometimes test that Upgradeable goes false after the ugprade, and sometimes they do not. This is only determined by the polling race condition: the check is executed once per 10 minutes, and we cancel the polling after upgrade is completed. This means that in some cases we are lucky and manage to run one check before the cancel, and sometimes we are not and only check while still on the base version.
Example job that checked admin acks post-upgrade:
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/openshift-cluster-version-operator-880-ci-4.11-upgrade-from-stable-4.10-e2e-azure-upgrade/1611444032104304640
$ curl --silent https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/openshift-cluster-version-operator-880-ci-4.11-upgrade-from-stable-4.10-e2e-azure-upgrade/1611444032104304640/artifacts/e2e-azure-upgrade/openshift-e2e-test/artifacts/e2e.log | grep 'Waiting for Upgradeable to be AdminAckRequired' Jan 6 21:16:40.153: INFO: Waiting for Upgradeable to be AdminAckRequired ...
Example job that did not check admin acks post-upgrade:
https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/openshift-cluster-version-operator-880-ci-4.11-upgrade-from-stable-4.10-e2e-azure-upgrade/1611444033509396480
$ curl --silent https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/openshift-cluster-version-operator-880-ci-4.11-upgrade-from-stable-4.10-e2e-azure-upgrade/1611444033509396480/artifacts/e2e-azure-upgrade/openshift-e2e-test/artifacts/e2e.log | grep 'Waiting for Upgradeable to be AdminAckRequired'
Version-Release number of selected component (if applicable):
4.11+ openshift-tests
How reproducible:
nondeterministic, wild guess is ~30% of upgrade jobs
Steps to Reproduce:
1. Inspect the E2E test log of an upgrade jobs and compare the time of the update ("Completed upgrade") with the time of the last check ( "Skipping admin ack", "Gate .* not applicable to current version", "Admin Ack verified') done by the admin ack test
Actual results:
Jan 23 00:47:43.842: INFO: Admin Ack verified Jan 23 00:57:43.836: INFO: Admin Ack verified Jan 23 01:07:43.839: INFO: Admin Ack verified Jan 23 01:17:33.474: INFO: Completed upgrade to registry.build01.ci.openshift.org/ci-op-z09ll8fw/release@sha256:322cf67dc00dd6fa4fdd25c3530e4e75800f6306bd86c4ad1418c92770d58ab8
No check done after the upgrade
Expected results:
Jan 23 00:57:37.894: INFO: Admin Ack verified Jan 23 01:07:37.894: INFO: Admin Ack verified Jan 23 01:16:43.618: INFO: Completed upgrade to registry.build01.ci.openshift.org/ci-op-z8h5x1c5/release@sha256:9c4c732a0b4c2ae887c73b35685e52146518e5d2b06726465d99e6a83ccfee8d Jan 23 01:17:57.937: INFO: Admin Ack verified
One or more checks done after upgrade
- blocks
-
OCPBUGS-6851 admin ack test nondeterministically does a check post-upgrade
- Closed
- clones
-
OCPBUGS-6503 admin ack test nondeterministically does a check post-upgrade
- Closed
- is blocked by
-
OCPBUGS-6503 admin ack test nondeterministically does a check post-upgrade
- Closed
- is cloned by
-
OCPBUGS-6851 admin ack test nondeterministically does a check post-upgrade
- Closed