OpenShift Bugs / OCPBUGS-229

CVO does not trigger a new upgrade after failing to update to an unavailable payload

    • Type: Bug
    • Resolution: Done-Errata
    • Affects Version: 4.10
    • Severity: Moderate

      Description of problem:
      After trying to upgrade to an unavailable payload (no upgrade happens, which is expected), CVO cannot start a new upgrade afterwards, even when given a correct payload repo.
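      A quick way to confirm the stuck state is to compare the update recorded in the ClusterVersion spec with the payload-retrieval jobs CVO has created for it. A minimal sketch (standard object names; run it against the affected cluster):

      # Show the update CVO has been asked to apply, and the retrieval jobs it
      # created for it (job names and hashes differ per cluster).
      ./oc get clusterversion version -o jsonpath='{.spec.desiredUpdate}{"\n"}'
      ./oc -n openshift-cluster-version get jobs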

      =======================================
      Checking the CVO log shows CVO struggling with the payload-retrieval job version--v5f88, which eventually fails with a timeout. After that, CVO no longer responds to the new upgrade request.

      # ./oc -n openshift-cluster-version logs cluster-version-operator-68ccb8c4fd-p7x4r|grep 'quay.io/openshift-release-dev/ocp-release@sha256\:90fabdb'|head -n1
      I0310 04:52:15.072040 1 cvo.go:546] Desired version from spec is v1.Update{Version:"", Image:"quay.io/openshift-release-dev/ocp-release@sha256:90fabdb570eb248f93472cc06ef28d09d5820e80b9ed578e2484f4ef526fe6d4", Force:false}
      # ./oc -n openshift-cluster-version logs cluster-version-operator-68ccb8c4fd-p7x4r|grep 'registry.ci.openshift.org/ocp/release@sha256\:90fabdb'|head -n1
      #

      ...
      I0310 04:52:15.072040 1 cvo.go:546] Desired version from spec is v1.Update{Version:"", Image:"quay.io/openshift-release-dev/ocp-release@sha256:90fabdb570eb248f93472cc06ef28d09d5820e80b9ed578e2484f4ef526fe6d4", Force:false}
      ...
      I0310 04:52:15.225739 1 batch.go:53] No active pods for job version--v5f88 in namespace openshift-cluster-version
      I0310 04:52:15.225778 1 batch.go:22] Job version--v5f88 in namespace openshift-cluster-version is not ready, continuing to wait.
      ...
      I0310 05:03:12.238308 1 batch.go:53] No active pods for job version--v5f88 in namespace openshift-cluster-version
      E0310 05:03:12.238525 1 batch.go:19] deadline exceeded, reason: "DeadlineExceeded", message: "Job was active longer than specified deadline"
      .....

      # ./oc get all -n openshift-cluster-version
      NAME                                            READY   STATUS    RESTARTS   AGE
      pod/cluster-version-operator-68ccb8c4fd-p7x4r   1/1     Running   0          61m

      NAME                               TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
      service/cluster-version-operator   ClusterIP   172.30.220.176   <none>        9099/TCP   62m

      NAME                                       READY   UP-TO-DATE   AVAILABLE   AGE
      deployment.apps/cluster-version-operator   1/1     1            1           61m

      NAME                                                  DESIRED   CURRENT   READY   AGE
      replicaset.apps/cluster-version-operator-68ccb8c4fd   1         1         1       61m

      NAME                       COMPLETIONS   DURATION   AGE
      job.batch/version--v5f88   0/1           30m        30m
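      To confirm the retrieval job really hit its deadline rather than still running, its conditions can be dumped directly. A sketch, using the job name from this cluster:

      # "version--v5f88" is the job name on this cluster; look up the current
      # "version--*" job with: oc -n openshift-cluster-version get jobs
      ./oc -n openshift-cluster-version get job version--v5f88 \
        -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.reason}{"\t"}{.message}{"\n"}{end}'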

      Version-Release number of the following components:
      4.11.0-0.nightly-2022-03-04-063157

      How reproducible:
      always

      Steps to Reproduce:
      1. Trigger an upgrade to an unavailable image (by mistake), from 4.11.0-0.nightly-2022-03-04-063157 to 4.11.0-0.nightly-2022-03-08-191358:

      # ./oc adm upgrade --to-image quay.io/openshift-release-dev/ocp-release@sha256:90fabdb570eb248f93472cc06ef28d09d5820e80b9ed578e2484f4ef526fe6d4 --allow-explicit-upgrade
      warning: The requested upgrade image is not one of the available updates. You have used --allow-explicit-upgrade for the update to proceed anyway
      Updating to release image quay.io/openshift-release-dev/ocp-release@sha256:90fabdb570eb248f93472cc06ef28d09d5820e80b9ed578e2484f4ef526fe6d4
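      For reference, the payload is unavailable because this digest is only published under registry.ci.openshift.org/ocp/release, not under the quay.io repo used above. A sketch to verify that the quay pull-spec cannot be resolved:

      # Expected to fail: the digest does not exist in the quay.io repository.
      ./oc adm release info quay.io/openshift-release-dev/ocp-release@sha256:90fabdb570eb248f93472cc06ef28d09d5820e80b9ed578e2484f4ef526fe6d4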

      2. Wait for several minutes (>5 min): no upgrade happens (expected), but no failure information is reported either (not expected).

      # ./oc get clusterversion -ojson|jq .items[].status.conditions
      {
        "lastTransitionTime": "2022-03-10T04:20:12Z",
        "message": "Payload loaded version=\"4.11.0-0.nightly-2022-03-04-063157\" image=\"registry.ci.openshift.org/ocp/release@sha256:cdeb8497920d9231ecc1ea7535e056b192f2ccf0fa6257d65be3bb876c1b9de6\"",
        "reason": "PayloadLoaded",
        "status": "True",
        "type": "ReleaseAccepted"
      },
      ...
      # ./oc get clusterversion
      NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
      version   4.11.0-0.nightly-2022-03-04-063157   True        False         27m     Cluster version is 4.11.0-0.nightly-2022-03-04-063157
      # ./oc adm upgrade
        Cluster version is 4.11.0-0.nightly-2022-03-04-063157

      Upstream is unset, so the cluster will use an appropriate default.
      Channel: stable-4.11
      warning: Cannot display available updates:
      Reason: VersionNotFound
      Message: Unable to retrieve available updates: currently reconciling cluster version 4.11.0-0.nightly-2022-03-04-063157 not found in the "stable-4.11" channel
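      A compact way to see that none of the ClusterVersion conditions reports the failed retrieval (a sketch; it only summarizes the conditions already shown above):

      # Summarize all ClusterVersion conditions; nothing here mentions the failed
      # payload retrieval on this cluster.
      ./oc get clusterversion version \
        -o jsonpath='{range .status.conditions[*]}{.type}={.status} ({.reason}){"\n"}{end}'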

      3. Trigger the upgrade to the target payload again, this time with the correct repo:

      # ./oc adm upgrade --to-image registry.ci.openshift.org/ocp/release@sha256:90fabdb570eb248f93472cc06ef28d09d5820e80b9ed578e2484f4ef526fe6d4 --allow-explicit-upgrade
      warning: The requested upgrade image is not one of the available updates. You have used --allow-explicit-upgrade for the update to proceed anyway
        Updating to release image registry.ci.openshift.org/ocp/release@sha256:90fabdb570eb248f93472cc06ef28d09d5820e80b9ed578e2484f4ef526fe6d4

      4. Still no upgrade happens; same result as in step 2 (not expected).
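      To verify that the second request actually landed in the spec even though CVO ignores it, the desired image can be read back. A sketch (digest from this report):

      # Should print the registry.ci.openshift.org image requested in step 3,
      # confirming the spec was updated while CVO takes no action on it.
      ./oc get clusterversion version -o jsonpath='{.spec.desiredUpdate.image}{"\n"}'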

      Actual results:
      After the failed attempt to fetch the unavailable payload, CVO no longer acts on update requests, even when pointed at an available payload.

      Expected results:
      Upgrade to the correct target payload should be triggered.

      Additional info:
      `oc adm upgrade --clear` to cancel the initial invalid upgrade before triggering a new one does not help. Only deleting the CVO pod so that it gets re-deployed makes CVO work again.
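      The recovery that worked here, as a minimal sketch (the label selector k8s-app=cluster-version-operator is an assumption; deleting the pod by the name shown in `oc -n openshift-cluster-version get pods` is the safer route):

      # Clearing the desired update alone did not recover CVO in this report.
      ./oc adm upgrade --clear

      # Deleting the CVO pod so that the deployment re-creates it did recover it.
      # NOTE: the label selector below is an assumption; if it does not match,
      # delete the pod by name instead.
      ./oc -n openshift-cluster-version delete pod -l k8s-app=cluster-version-operator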

              lmohanty@redhat.com Lalatendu Mohanty
              trking W. Trevor King