Loading...

Type: Bug
Resolution: Won't Do
Priority: Major
Fix Version/s: None
Affects Version/s: 4.9
Component/s: Cluster Version Operator
Labels:
- groomed

Activity Type:
Quality / Stability / Reliability
Blocked:
False
Blocked Reason:

Hide

None

Show
None
Story Points:
5
Severity:
Moderate
Regression:
None

Target Backport Versions:
None
Target Version:

4.9.z
Release Blocker:
Rejected
Sprint:
OTA 223, OTA 224
sprint_count:
2

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Release Note Status:
None
Release Note Type:
None
Release Note Text:
None

Escape Reason:
None
Escape Impact:
None
Corrective Measures:
None
SDLC stage when should've been found:
None

Description of problem:
After trying to upgrade to an unavailable payload(no upgrade happens as expected), cvo can not continue to start a new upgrade even with a correct payload repo.

=======================================
Check cvo log to find cvo struggling for the update job version--v5f88 and fail due to timeout. But it did not respond to the new upgrade requirement after that.

./oc -n openshift-cluster-version logs cluster-version-operator-68ccb8c4fd-p7x4r|grep 'quay.io/openshift-release-dev/ocp-release@sha256\:90fabdb'|head -n1
I0310 04:52:15.072040 1 cvo.go:546] Desired version from spec is v1.Update{Version:"", Image:"quay.io/openshift-release-dev/ocp-release@sha256:90fabdb570eb248f93472cc06ef28d09d5820e80b9ed578e2484f4ef526fe6d4", Force:false}

./oc -n openshift-cluster-version logs cluster-version-operator-68ccb8c4fd-p7x4r|grep 'registry.ci.openshift.org/ocp/release@sha256\:90fabdb'|head -n1
#

...
0310 04:52:15.072040 1 cvo.go:546] Desired version from spec is v1.Update{Version:"", Image:"quay.io/openshift-release-dev/ocp-release@sha256:90fabdb570eb248f93472cc06ef28d09d5820e80b9ed578e2484f4ef526fe6d4", Force:false}
...
I0310 04:52:15.225739 1 batch.go:53] No active pods for job version--v5f88 in namespace openshift-cluster-version
I0310 04:52:15.225778 1 batch.go:22] Job version--v5f88 in namespace openshift-cluster-version is not ready, continuing to wait.
...
I0310 05:03:12.238308 1 batch.go:53] No active pods for job version--v5f88 in namespace openshift-cluster-version
E0310 05:03:12.238525 1 batch.go:19] deadline exceeded, reason: "DeadlineExceeded", message: "Job was active longer than specified deadline"
.....

./oc get all -n openshift-cluster-version
NAME READY STATUS RESTARTS AGE
pod/cluster-version-operator-68ccb8c4fd-p7x4r 1/1 Running 0 61m

NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/cluster-version-operator ClusterIP 172.30.220.176 <none> 9099/TCP 62m

NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/cluster-version-operator 1/1 1 1 61m

NAME DESIRED CURRENT READY AGE
replicaset.apps/cluster-version-operator-68ccb8c4fd 1 1 1 61m

NAME COMPLETIONS DURATION AGE
job.batch/version--v5f88 0/1 30m 30m

Version-Release number of the following components:
4.11.0-0.nightly-2022-03-04-063157

How reproducible:
always

Steps to Reproduce:
1. Trigger an upgrade to an unavailable image(by mistake), from 4.11.0-0.nightly-2022-03-04-063157 to 4.11.0-0.nightly-2022-03-08-191358

#./oc adm upgrade --to-image quay.io/openshift-release-dev/ocp-release@sha256:90fabdb570eb248f93472cc06ef28d09d5820e80b9ed578e2484f4ef526fe6d4 --allow-explicit-upgrade
warning: The requested upgrade image is not one of the available updates.You have used --allow-explicit-upgrade for the update to proceed anyway
Updating to release image quay.io/openshift-release-dev/ocp-release@sha256:90fabdb570eb248f93472cc06ef28d09d5820e80b9ed578e2484f4ef526fe6d4

2. Wait for several mins(>5mins), no upgrade will happen(expected), and no any failure info(not expected)

./oc get clusterversion -ojson|jq .items[].status.conditions
{
"lastTransitionTime": "2022-03-10T04:20:12Z",
"message": "Payload loaded version=\"4.11.0-0.nightly-2022-03-04-063157\" image=\"registry.ci.openshift.org/ocp/release@sha256:cdeb8497920d9231ecc1ea7535e056b192f2ccf0fa6257d65be3bb876c1b9de6\"",
"reason": "PayloadLoaded",
"status": "True",
"type": "ReleaseAccepted"
},
...
./oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.11.0-0.nightly-2022-03-04-063157 True False 27m Cluster version is 4.11.0-0.nightly-2022-03-04-063157

./oc adm upgrade
Cluster version is 4.11.0-0.nightly-2022-03-04-063157

Upstream is unset, so the cluster will use an appropriate default.
Channel: stable-4.11
warning: Cannot display available updates:
Reason: VersionNotFound
Message: Unable to retrieve available updates: currently reconciling cluster version 4.11.0-0.nightly-2022-03-04-063157 not found in the "stable-4.11" channel

3. Continue upgrade to target payload with correct repo

./oc adm upgrade --to-image registry.ci.openshift.org/ocp/release@sha256:90fabdb570eb248f93472cc06ef28d09d5820e80b9ed578e2484f4ef526fe6d4 --allow-explicit-upgrade
warning: The requested upgrade image is not one of the available updates.You have used --allow-explicit-upgrade for the update to proceed anyway
Updating to release image registry.ci.openshift.org/ocp/release@sha256:90fabdb570eb248f93472cc06ef28d09d5820e80b9ed578e2484f4ef526fe6d4

4. Still no upgrade happen, the same with step 2(not expected)

Actual results:
An update to available payload will bring cvo does not work.

Expected results:
Upgrade to correct target payload should be triggerred.

Additional info:
`oc adm upgrade --clear` to cancel the initial invalid upgrade before triggering new upgrade does not work. Only delete cvo pod to get it re-deployed, then cvo will work again.

- - Sort By Name
  - Sort By Date
  - Ascending
  - Descending
  - Thumbnails
  - List
  - Download All

cvo.log
6.46 MB
2022/08/22 4:02 AM

clones

OCPBUGS-229 CVO does not trigger new upgrade again after fail to update to unavailable payload

Closed

links to

openshift/cluster-version-operator#823: OCPBUGS-230: lib/resourcebuilder/batch: Stop waiting on Job deadline exceeded

Details

Description

Attachments

Attachments

Issue Links

Easy Agile Planning Poker

Activity

People

Dates

Hide