Resolution: Unresolved
Description of problem:
Even after the fixes for OCPBUGS-5505 and OCPBUGS-6503, the admin ack test, which waits up to 4 minutes for Upgradeable=False after a cluster upgrade completes, still fails occasionally.
Example interleaved CVO and E2E logs:
I0218 01:33:14.667456 1 upgradeable.go:122] Cluster current version=4.10.52
I0218 01:33:14.678918 1 upgradeable.go:42] Upgradeable conditions were recently checked, will try later.
I0218 01:33:29.668132 1 upgradeable.go:42] Upgradeable conditions were recently checked, will try later.
Feb 18 01:33:36.474: INFO: Completed upgrade to registry.build05.ci.openshift.org/ci-op-inzgh27t/release@sha256:b1a4d94c1c7e2ce227135b6e3abc532cd017f0aec63dbf23e4371897cf33a1a5
1881:Feb 18 01:33:37.134: INFO: Waiting for Upgradeable to be AdminAckRequired...
I0218 01:34:47.412472 1 upgradeable.go:42] Upgradeable conditions were recently checked, will try later.
1999:Feb 18 01:37:42.267: FAIL: Error while waiting for Upgradeable to complain about AdminAckRequired ...
I0218 01:38:20.818466 1 upgradeable.go:122] Cluster current version=4.11.0-0.ci-2023-02-17-233449
Version-Release number of selected component (if applicable):
Latest 4.11, but we will likely see this in 4.13 jobs after OTA-899
How reproducible:
Rare; search.ci says 3-8% impact
Steps to Reproduce:
search.ci query above
Actual results:
disruption_tests: [bz-Cluster Version Operator] Verify presence of admin ack gate blocks upgrade until acknowledged 1h2m36s
{Feb 18 01:37:42.267: Error while waiting for Upgradeable to complain about AdminAckRequired with message "Kubernetes 1.25 and therefore OpenShift 4.12 remove several APIs which require admin consideration. Please see the knowledge article https://access.redhat.com/articles/6955381 for details and instructions.": timed out waiting for the condition
Expected results:
no failures
Additional info:
- The failure shows that the assumption behind the OCPBUGS-6503 fix, that the CVO guarantees the update happens within 4 minutes at most, needs to be reinvestigated.
- We can either tighten the CVO code that performs this check (OTA-860 could be a part of this) or simply relax the criteria in the check further.
- is related to
  - OCPBUGS-7405 [CVO] Race condition causes flakes in Admin ack gate test during upgrades (New)
  - OTA-860 CVO: Optimize Upgradeable check API usage and reduce throttling on its status update (To Do)
- is triggered by
  - OTA-899 Add OCP 4.12-to-4.13 admin ack (Closed)