-
Bug
-
Resolution: Done-Errata
-
Undefined
-
None
-
4.10
-
None
-
Moderate
-
None
-
False
-
Description of problem:
After an attempt to upgrade to an unavailable payload (no upgrade happens, as expected), the CVO cannot start a new upgrade, even when given a correct payload repo.
=======================================
The CVO log shows the CVO waiting on the update job version--v5f88, which eventually fails with a timeout. After that, the CVO does not respond to the new upgrade request.
- ./oc -n openshift-cluster-version logs cluster-version-operator-68ccb8c4fd-p7x4r|grep 'quay.io/openshift-release-dev/ocp-release@sha256\:90fabdb'|head -n1
I0310 04:52:15.072040 1 cvo.go:546] Desired version from spec is v1.Update{Version:"", Image:"quay.io/openshift-release-dev/ocp-release@sha256:90fabdb570eb248f93472cc06ef28d09d5820e80b9ed578e2484f4ef526fe6d4", Force:false}
- ./oc -n openshift-cluster-version logs cluster-version-operator-68ccb8c4fd-p7x4r|grep 'registry.ci.openshift.org/ocp/release@sha256\:90fabdb'|head -n1
#
...
I0310 04:52:15.072040 1 cvo.go:546] Desired version from spec is v1.Update{Version:"", Image:"quay.io/openshift-release-dev/ocp-release@sha256:90fabdb570eb248f93472cc06ef28d09d5820e80b9ed578e2484f4ef526fe6d4", Force:false}
...
I0310 04:52:15.225739 1 batch.go:53] No active pods for job version--v5f88 in namespace openshift-cluster-version
I0310 04:52:15.225778 1 batch.go:22] Job version--v5f88 in namespace openshift-cluster-version is not ready, continuing to wait.
...
I0310 05:03:12.238308 1 batch.go:53] No active pods for job version--v5f88 in namespace openshift-cluster-version
E0310 05:03:12.238525 1 batch.go:19] deadline exceeded, reason: "DeadlineExceeded", message: "Job was active longer than specified deadline"
.....
- ./oc get all -n openshift-cluster-version
NAME READY STATUS RESTARTS AGE
pod/cluster-version-operator-68ccb8c4fd-p7x4r 1/1 Running 0 61m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/cluster-version-operator ClusterIP 172.30.220.176 <none> 9099/TCP 62m
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/cluster-version-operator 1/1 1 1 61m
NAME DESIRED CURRENT READY AGE
replicaset.apps/cluster-version-operator-68ccb8c4fd 1 1 1 61m
NAME COMPLETIONS DURATION AGE
job.batch/version--v5f88 0/1 30m 30m
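The job-related lines quoted above can be isolated from a full CVO log with a small filter. This is a minimal sketch, not part of the original report: the function name `cvo_job_lines` is hypothetical, and the pattern assumes the log messages keep the wording shown in the excerpt.

```shell
# Hypothetical helper: reads CVO log text on stdin and prints only the
# batch.go lines about the update job (waiting, no active pods, or the
# deadline error). The message wording is taken from the excerpt above.
cvo_job_lines() {
  grep -E 'batch\.go:[0-9]+\] (No active pods for job|Job .* is not ready|deadline exceeded)'
}
```

For example: `./oc -n openshift-cluster-version logs cluster-version-operator-68ccb8c4fd-p7x4r | cvo_job_lines`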
Version-Release number of the following components:
4.11.0-0.nightly-2022-03-04-063157
How reproducible:
always
Steps to Reproduce:
1. Trigger an upgrade to an unavailable image (by mistake), from 4.11.0-0.nightly-2022-03-04-063157 to 4.11.0-0.nightly-2022-03-08-191358
#./oc adm upgrade --to-image quay.io/openshift-release-dev/ocp-release@sha256:90fabdb570eb248f93472cc06ef28d09d5820e80b9ed578e2484f4ef526fe6d4 --allow-explicit-upgrade
warning: The requested upgrade image is not one of the available updates.You have used --allow-explicit-upgrade for the update to proceed anyway
Updating to release image quay.io/openshift-release-dev/ocp-release@sha256:90fabdb570eb248f93472cc06ef28d09d5820e80b9ed578e2484f4ef526fe6d4
2. Wait several minutes (>5 mins): no upgrade happens (expected), but no failure information is reported either (not expected)
- ./oc get clusterversion -ojson|jq .items[].status.conditions
{
"lastTransitionTime": "2022-03-10T04:20:12Z",
"message": "Payload loaded version=\"4.11.0-0.nightly-2022-03-04-063157\" image=\"registry.ci.openshift.org/ocp/release@sha256:cdeb8497920d9231ecc1ea7535e056b192f2ccf0fa6257d65be3bb876c1b9de6\"",
"reason": "PayloadLoaded",
"status": "True",
"type": "ReleaseAccepted"
},
...
- ./oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.11.0-0.nightly-2022-03-04-063157 True False 27m Cluster version is 4.11.0-0.nightly-2022-03-04-063157
- ./oc adm upgrade
Cluster version is 4.11.0-0.nightly-2022-03-04-063157
Upstream is unset, so the cluster will use an appropriate default.
Channel: stable-4.11
warning: Cannot display available updates:
Reason: VersionNotFound
Message: Unable to retrieve available updates: currently reconciling cluster version 4.11.0-0.nightly-2022-03-04-063157 not found in the "stable-4.11" channel
3. Retry the upgrade to the target payload using the correct repo
- ./oc adm upgrade --to-image registry.ci.openshift.org/ocp/release@sha256:90fabdb570eb248f93472cc06ef28d09d5820e80b9ed578e2484f4ef526fe6d4 --allow-explicit-upgrade
warning: The requested upgrade image is not one of the available updates.You have used --allow-explicit-upgrade for the update to proceed anyway
Updating to release image registry.ci.openshift.org/ocp/release@sha256:90fabdb570eb248f93472cc06ef28d09d5820e80b9ed578e2484f4ef526fe6d4
4. Still no upgrade happens, the same as in step 2 (not expected)
Actual results:
After an update to an unavailable payload is requested, the CVO stops working: it does not act on subsequent upgrade requests.
Expected results:
The upgrade to the correct target payload should be triggered.
Additional info:
Running `oc adm upgrade --clear` to cancel the initial invalid upgrade before triggering a new upgrade does not work. The only workaround is to delete the CVO pod so that it is redeployed; after that, the CVO works again.
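The pod-deletion workaround described above might be scripted roughly as follows. This is a sketch under assumptions: the function name `restart_cvo` and the `OC` and `CVO_NS` variables are hypothetical conveniences, and the pod name must be taken from `oc get pods -n openshift-cluster-version` on the affected cluster.

```shell
# Sketch of the workaround: delete the CVO pod so its Deployment recreates it.
# OC is an assumed override for the client binary (the report uses ./oc).
OC="${OC:-oc}"
CVO_NS="openshift-cluster-version"

# restart_cvo POD_NAME: deletes the named CVO pod in the CVO namespace.
restart_cvo() {
  if [ -z "${1:-}" ]; then
    echo "usage: restart_cvo POD_NAME" >&2
    return 2
  fi
  "$OC" -n "$CVO_NS" delete pod "$1"
}
```

With the pod from this report, the call would be `restart_cvo cluster-version-operator-68ccb8c4fd-p7x4r`, followed by re-running the `oc adm upgrade --to-image ...` command from step 3 once the new pod is running.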
- is cloned by
-
OCPBUGS-230 CVO does not trigger new upgrade again after fail to update to unavailable payload
- Closed