
CVO tries to progress an unaccepted release following scale toggle


    • Moderate
    • No
    • OTA 262
    • 1
    • False
      * Previously, restarting a CVO pod while it was initializing the sync work broke the guard of a blocked upgrade request. As a result, the blocked request was incorrectly accepted. With this release, the CVO postpones the reconciliation during the initialization step, and the issue is resolved. (link:https://issues.redhat.com/browse/OCPBUGS-43964[*OCPBUGS-43964*])
      -------
      Cause: Restarting the CVO pod while it is initializing the sync work breaks the guard of the blocked upgrade request.

      Consequence: The blocked request is unexpectedly accepted.

      Fix: The CVO postpones the reconciliation during initialization.

      Result: The guard of the blocked upgrade request survives CVO restarts.
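The cause and fix described above can be illustrated with a toy model. This is only a sketch and not CVO code: the class `ToyOperator`, its fields, and the returned strings are all hypothetical names chosen for illustration. It shows the general idea of postponing reconciliation until initialization has finished, so that a guard on unaccepted requests cannot be bypassed during startup.

```python
from dataclasses import dataclass


@dataclass
class ToyOperator:
    """Hypothetical stand-in for an operator that guards update requests."""

    current: str
    initialized: bool = False
    # True models the described fix; False models the described bug.
    postpone_until_initialized: bool = True

    def finish_initialization(self) -> None:
        self.initialized = True

    def reconcile(self, desired: str, accepted: bool) -> str:
        if not self.initialized:
            if self.postpone_until_initialized:
                # Fix: do nothing until initialization is complete.
                return "postponed"
            # Bug: the guard below is never reached during initialization.
            return f"progressing:{desired}"
        if desired == self.current:
            return "idle"
        if not accepted:
            # Guard: an unaccepted request must not be progressed.
            return "blocked"
        return f"progressing:{desired}"
```

In this model, a request for an unaccepted release that arrives right after a restart is reported as "postponed" and, once initialization finishes, as "blocked". With the flag set to False, the same request is progressed immediately, which mirrors the consequence described in this report.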
    • Bug Fix
    • In Progress

      This is a clone of issue OCPBUGS-42386. The following is the description of the original issue:

      Description of problem:

      Usually, providing a cluster with an unaccepted update, such as an unsigned payload without force, results in ReleaseAccepted=False and Progressing=False. However, after scaling the CVO deployment down and up again, Progressing=True is observed. This causes oc adm upgrade as well as oc adm upgrade status to display incorrect information, and the clusterversion object to display empty capabilities and a history item with version "".
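A minimal sketch of how the symptoms described above could be detected programmatically. The function names are hypothetical, and the input is assumed to be the ClusterVersion object already parsed into a dictionary (for example from the output of `oc get clusterversion version -o json`).

```python
from typing import Any, Dict, List, Optional


def condition_status(cluster_version: Dict[str, Any], cond_type: str) -> Optional[str]:
    """Return the status string of the named condition, or None if absent."""
    conditions = cluster_version.get("status", {}).get("conditions") or []
    for cond in conditions:
        if cond.get("type") == cond_type:
            return cond.get("status")
    return None


def unaccepted_release_symptoms(cluster_version: Dict[str, Any]) -> List[str]:
    """List which symptoms from this report are present in the object."""
    status = cluster_version.get("status", {})
    symptoms = []
    if (condition_status(cluster_version, "ReleaseAccepted") == "False"
            and condition_status(cluster_version, "Progressing") == "True"):
        symptoms.append("Progressing=True while ReleaseAccepted=False")
    if not status.get("capabilities"):
        symptoms.append("empty capabilities")
    history = status.get("history") or []
    if any(item.get("version") == "" for item in history):
        symptoms.append('history item with version ""')
    return symptoms
```

An empty list means none of the three symptoms were found. The checks only mirror what this report describes and are not an exhaustive health check.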

      Version-Release number of selected component (if applicable):

      4.16.0-rc.4, but also observed as early as 4.10.67

      How reproducible:

      100%

      Steps to Reproduce:

      1. target the cluster at an unsigned build without using force
      ❯ oc adm upgrade --allow-explicit-upgrade --to-image registry.ci.openshift.org/ocp/release@sha256:36cfa8cebb86ded6e1d51c308d31eb7b2c2e7705a0df6f698c690b6fba8b7e7a
      
      2. scale cvo down and up again
       ❯ oc scale --replicas 0 -n openshift-cluster-version deployments/cluster-version-operator
      deployment.apps/cluster-version-operator scaled
      
      ❯ oc scale --replicas 1 -n openshift-cluster-version deployments/cluster-version-operator
      deployment.apps/cluster-version-operator scaled
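The scale toggle in step 2 can be scripted. The following is a sketch only: the helper names `scale_command` and `toggle_cvo` are hypothetical, while the namespace, deployment name, and `oc scale` arguments are copied from the commands above. The runner is injectable so the command sequence can be inspected without a cluster.

```python
import subprocess
from typing import Callable, List, Optional

NAMESPACE = "openshift-cluster-version"
DEPLOYMENT = "deployments/cluster-version-operator"


def scale_command(replicas: int) -> List[str]:
    """Build the `oc scale` argument list from step 2 for a replica count."""
    return ["oc", "scale", "--replicas", str(replicas), "-n", NAMESPACE, DEPLOYMENT]


def toggle_cvo(run: Optional[Callable[[List[str]], object]] = None) -> List[List[str]]:
    """Scale the CVO deployment to 0 and back to 1, returning the commands."""
    if run is None:
        # Default runner: execute the command and raise on a non-zero exit.
        def run(argv: List[str]) -> object:
            return subprocess.run(argv, check=True)

    commands = [scale_command(0), scale_command(1)]
    for argv in commands:
        run(argv)
    return commands
```

Calling `toggle_cvo()` with no arguments requires `oc` on the PATH and a logged-in session. Passing `run=print` instead shows the two commands without executing them.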
       
      

      Actual results:

      oc adm upgrade displays "info: An upgrade is in progress. Working towards..."

      It also displays a warning of "Architecture has not been configured"

      ❯ oc adm upgrade
      info: An upgrade is in progress. Working towards registry.ci.openshift.org/ocp/release@sha256:36cfa8cebb86ded6e1d51c308d31eb7b2c2e7705a0df6f698c690b6fba8b7e7a
      
      ReleaseAccepted=False  
      
        Reason: RetrievePayload
        Message: Retrieving payload failed version="" image="registry.ci.openshift.org/ocp/release@sha256:36cfa8cebb86ded6e1d51c308d31eb7b2c2e7705a0df6f698c690b6fba8b7e7a" failure=The update cannot be verified: unable to verify sha256:36cfa8cebb86ded6e1d51c308d31eb7b2c2e7705a0df6f698c690b6fba8b7e7a against keyrings: verifier-public-key-redhat
      
      Upstream is unset, so the cluster will use an appropriate default.
      Channel: stable-4.16
      warning: Cannot display available updates:
        Reason: NoArchitecture
        Message: Architecture has not been configured.
      
      

      The clusterversion object has Progressing=True and "capabilities: {}", as well as a Partial history item with version ""

       ❯ oc get clusterversion version -oyaml                                                                                                                       
      apiVersion: config.openshift.io/v1
      kind: ClusterVersion
      metadata:
        creationTimestamp: "2024-06-10T11:36:51Z"
        generation: 3
        name: version
        resourceVersion: "70199"
        uid: 9c80848b-9f3a-4f0d-8472-a2ccce1c4023
      spec:
        channel: stable-4.16
        clusterID: e74054ac-e0fe-4cf7-a457-4887ba96cff9
        desiredUpdate:
          architecture: ""
          force: false
          image: registry.ci.openshift.org/ocp/release@sha256:36cfa8cebb86ded6e1d51c308d31eb7b2c2e7705a0df6f698c690b6fba8b7e7a
          version: ""
      status:
        availableUpdates: null
        capabilities: {}
        conditions:
        - lastTransitionTime: "2024-06-10T11:37:17Z"
          message: Architecture has not been configured.
          reason: NoArchitecture
          status: "False"
          type: RetrievedUpdates
        - lastTransitionTime: "2024-06-10T11:37:17Z"
          message: Capabilities match configured spec
          reason: AsExpected
          status: "False"
          type: ImplicitlyEnabledCapabilities
        - lastTransitionTime: "2024-06-10T14:06:42Z"
          message: 'Retrieving payload failed version="" image="registry.ci.openshift.org/ocp/release@sha256:36cfa8cebb86ded6e1d51c308d31eb7b2c2e7705a0df6f698c690b6fba8b7e7a"
            failure=The update cannot be verified: unable to verify sha256:36cfa8cebb86ded6e1d51c308d31eb7b2c2e7705a0df6f698c690b6fba8b7e7a
            against keyrings: verifier-public-key-redhat'
          reason: RetrievePayload
          status: "False"
          type: ReleaseAccepted
        - lastTransitionTime: "2024-06-10T12:06:31Z"
          message: Done applying 4.16.0-rc.4
          status: "True"
          type: Available
        - lastTransitionTime: "2024-06-10T12:06:31Z"
          status: "False"
          type: Failing
        - lastTransitionTime: "2024-06-10T14:07:30Z"
          message: Working towards registry.ci.openshift.org/ocp/release@sha256:36cfa8cebb86ded6e1d51c308d31eb7b2c2e7705a0df6f698c690b6fba8b7e7a
          status: "True"
          type: Progressing
        desired:
          image: registry.ci.openshift.org/ocp/release@sha256:36cfa8cebb86ded6e1d51c308d31eb7b2c2e7705a0df6f698c690b6fba8b7e7a
          version: ""
        history:
        - completionTime: null
          image: registry.ci.openshift.org/ocp/release@sha256:36cfa8cebb86ded6e1d51c308d31eb7b2c2e7705a0df6f698c690b6fba8b7e7a
          startedTime: "2024-06-10T14:07:30Z"
          state: Partial
          verified: false
          version: ""
        - completionTime: "2024-06-10T12:06:31Z"
          image: quay.io/openshift-release-dev/ocp-release@sha256:6c236c400d3bad9b2b54d8a3b247c508f6f13511d37666de1eecca8e43bce0f6
          startedTime: "2024-06-10T11:37:17Z"
          state: Completed
          verified: false
          version: 4.16.0-rc.4
        observedGeneration: 3
        versionHash: AjnKTa_3kbg=

      In oc adm upgrade status, the control plane is shown as Progressing to an empty target with Completion 0%

      = Control Plane =
      Assessment:      Progressing
      Target Version:   (from 4.16.0-rc.4)
      Completion:      0%
      Duration:        2m26.971091165s
      Operator Status: 33 Healthy
      

       

      Expected results:

      The clusterversion object stays the same as before the scale toggle

      apiVersion: config.openshift.io/v1
      kind: ClusterVersion
      metadata:
        creationTimestamp: "2024-06-10T11:36:51Z"
        generation: 3
        name: version
        resourceVersion: "69881"
        uid: 9c80848b-9f3a-4f0d-8472-a2ccce1c4023
      spec:
        channel: stable-4.16
        clusterID: e74054ac-e0fe-4cf7-a457-4887ba96cff9
        desiredUpdate:
          architecture: ""
          force: false
          image: registry.ci.openshift.org/ocp/release@sha256:36cfa8cebb86ded6e1d51c308d31eb7b2c2e7705a0df6f698c690b6fba8b7e7a
          version: ""
      status:
        availableUpdates: null
        capabilities:
          enabledCapabilities:
          - Build
          - CSISnapshot
          - CloudControllerManager
          - CloudCredential
          - Console
          - DeploymentConfig
          - ImageRegistry
          - Ingress
          - Insights
          - MachineAPI
          - NodeTuning
          - OperatorLifecycleManager
          - Storage
          - baremetal
          - marketplace
          - openshift-samples
          knownCapabilities:
          - Build
          - CSISnapshot
          - CloudControllerManager
          - CloudCredential
          - Console
          - DeploymentConfig
          - ImageRegistry
          - Ingress
          - Insights
          - MachineAPI
          - NodeTuning
          - OperatorLifecycleManager
          - Storage
          - baremetal
          - marketplace
          - openshift-samples
        conditions:
        - lastTransitionTime: "2024-06-10T11:37:17Z"
          message: 'Unable to retrieve available updates: currently reconciling cluster
            version 4.16.0-rc.4 not found in the "stable-4.16" channel'
          reason: VersionNotFound
          status: "False"
          type: RetrievedUpdates
        - lastTransitionTime: "2024-06-10T11:37:17Z"
          message: Capabilities match configured spec
          reason: AsExpected
          status: "False"
          type: ImplicitlyEnabledCapabilities
        - lastTransitionTime: "2024-06-10T14:06:42Z"
          message: 'Retrieving payload failed version="" image="registry.ci.openshift.org/ocp/release@sha256:36cfa8cebb86ded6e1d51c308d31eb7b2c2e7705a0df6f698c690b6fba8b7e7a"
            failure=The update cannot be verified: unable to verify sha256:36cfa8cebb86ded6e1d51c308d31eb7b2c2e7705a0df6f698c690b6fba8b7e7a
            against keyrings: verifier-public-key-redhat'
          reason: RetrievePayload
          status: "False"
          type: ReleaseAccepted
        - lastTransitionTime: "2024-06-10T12:06:31Z"
          message: Done applying 4.16.0-rc.4
          status: "True"
          type: Available
        - lastTransitionTime: "2024-06-10T12:06:31Z"
          status: "False"
          type: Failing
        - lastTransitionTime: "2024-06-10T12:06:31Z"
          message: Cluster version is 4.16.0-rc.4
          status: "False"
          type: Progressing
        desired:
          image: quay.io/openshift-release-dev/ocp-release@sha256:6c236c400d3bad9b2b54d8a3b247c508f6f13511d37666de1eecca8e43bce0f6
          url: https://access.redhat.com/errata/RHEA-2024:0041
          version: 4.16.0-rc.4
        history:
        - completionTime: "2024-06-10T12:06:31Z"
          image: quay.io/openshift-release-dev/ocp-release@sha256:6c236c400d3bad9b2b54d8a3b247c508f6f13511d37666de1eecca8e43bce0f6
          startedTime: "2024-06-10T11:37:17Z"
          state: Completed
          verified: false
          version: 4.16.0-rc.4
        observedGeneration: 2
        versionHash: AjnKTa_3kbg=
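One way to check the expectation that the object stays the same is to compare snapshots taken before and after the scale toggle. A minimal sketch, assuming both snapshots have already been parsed into dictionaries; the function name `status_changes` is hypothetical. Only the top-level fields of `status` are compared, so metadata such as `resourceVersion` is left out.

```python
from typing import Any, Dict, Tuple


def status_changes(before: Dict[str, Any], after: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Map each top-level status field that differs to its (before, after) values."""
    old = before.get("status", {})
    new = after.get("status", {})
    return {
        key: (old.get(key), new.get(key))
        for key in sorted(set(old) | set(new))
        if old.get(key) != new.get(key)
    }
```

An empty result means the status did not change. Applied to the two objects shown in this report, the differing fields would include `capabilities`, `conditions`, `desired`, `history`, and `observedGeneration`.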
      

      No "An upgrade is in progress" message is shown for a release that is not accepted

       ❯ oc adm upgrade
      Cluster version is 4.16.0-rc.4
      
      ReleaseAccepted=False
      
        Reason: RetrievePayload
        Message: Retrieving payload failed version="" image="registry.ci.openshift.org/ocp/release@sha256:36cfa8cebb86ded6e1d51c308d31eb7b2c2e7705a0df6f698c690b6fba8b7e7a" failure=The update cannot be verified: unable to verify sha256:36cfa8cebb86ded6e1d51c308d31eb7b2c2e7705a0df6f698c690b6fba8b7e7a against keyrings: verifier-public-key-redhat
      
      Upstream is unset, so the cluster will use an appropriate default.
      Channel: stable-4.16
      warning: Cannot display available updates:
        Reason: VersionNotFound
        Message: Unable to retrieve available updates: currently reconciling cluster version 4.16.0-rc.4 not found in the "stable-4.16" channel

       

      Additional info:

      It is possible to kick the cluster out of this state by applying --clear, which causes the cluster to briefly progress to its original version, followed by 3 items appearing in history

      ❯ oc adm upgrade --clear
      Cleared the update field, still at registry.ci.openshift.org/ocp/release@sha256:36cfa8cebb86ded6e1d51c308d31eb7b2c2e7705a0df6f698c690b6fba8b7e7a
      
      ❯ oc adm upgrade
      info: An upgrade is in progress. Working towards 4.16.0-rc.4: 116 of 894 done (12% complete)
      
      Upstream is unset, so the cluster will use an appropriate default.
      Channel: stable-4.16
      warning: Cannot display available updates:
        Reason: VersionNotFound
        Message: Unable to retrieve available updates: currently reconciling cluster version 4.16.0-rc.4 not found in the "stable-4.16" channel

       

      ❯ oc get clusterversion version -oyaml
      apiVersion: config.openshift.io/v1
      kind: ClusterVersion
      metadata:
        creationTimestamp: "2024-06-10T11:36:51Z"
        generation: 4
        name: version
        resourceVersion: "72594"
        uid: 9c80848b-9f3a-4f0d-8472-a2ccce1c4023
      spec:
        channel: stable-4.16
        clusterID: e74054ac-e0fe-4cf7-a457-4887ba96cff9
      status:
        availableUpdates: null
        capabilities:
          enabledCapabilities:
          - Build
          - CSISnapshot
          - CloudControllerManager
          - CloudCredential
          - Console
          - DeploymentConfig
          - ImageRegistry
          - Ingress
          - Insights
          - MachineAPI
          - NodeTuning
          - OperatorLifecycleManager
          - Storage
          - baremetal
          - marketplace
          - openshift-samples
          knownCapabilities:
          - Build
          - CSISnapshot
          - CloudControllerManager
          - CloudCredential
          - Console
          - DeploymentConfig
          - ImageRegistry
          - Ingress
          - Insights
          - MachineAPI
          - NodeTuning
          - OperatorLifecycleManager
          - Storage
          - baremetal
          - marketplace
          - openshift-samples
        conditions:
        - lastTransitionTime: "2024-06-10T11:37:17Z"
          message: 'Unable to retrieve available updates: currently reconciling cluster
            version 4.16.0-rc.4 not found in the "stable-4.16" channel'
          reason: VersionNotFound
          status: "False"
          type: RetrievedUpdates
        - lastTransitionTime: "2024-06-10T11:37:17Z"
          message: Capabilities match configured spec
          reason: AsExpected
          status: "False"
          type: ImplicitlyEnabledCapabilities
        - lastTransitionTime: "2024-06-10T14:13:07Z"
          message: Payload loaded version="4.16.0-rc.4" image="quay.io/openshift-release-dev/ocp-release@sha256:6c236c400d3bad9b2b54d8a3b247c508f6f13511d37666de1eecca8e43bce0f6"
            architecture="amd64"
          reason: PayloadLoaded
          status: "True"
          type: ReleaseAccepted
        - lastTransitionTime: "2024-06-10T12:06:31Z"
          message: Done applying 4.16.0-rc.4
          status: "True"
          type: Available
        - lastTransitionTime: "2024-06-10T12:06:31Z"
          status: "False"
          type: Failing
        - lastTransitionTime: "2024-06-10T14:14:00Z"
          message: Cluster version is 4.16.0-rc.4
          status: "False"
          type: Progressing
        desired:
          image: quay.io/openshift-release-dev/ocp-release@sha256:6c236c400d3bad9b2b54d8a3b247c508f6f13511d37666de1eecca8e43bce0f6
          url: https://access.redhat.com/errata/RHEA-2024:0041
          version: 4.16.0-rc.4
        history:
        - completionTime: "2024-06-10T14:14:00Z"
          image: quay.io/openshift-release-dev/ocp-release@sha256:6c236c400d3bad9b2b54d8a3b247c508f6f13511d37666de1eecca8e43bce0f6
          startedTime: "2024-06-10T14:13:07Z"
          state: Completed
          verified: false
          version: 4.16.0-rc.4
        - completionTime: "2024-06-10T14:13:07Z"
          image: registry.ci.openshift.org/ocp/release@sha256:36cfa8cebb86ded6e1d51c308d31eb7b2c2e7705a0df6f698c690b6fba8b7e7a
          startedTime: "2024-06-10T14:07:30Z"
          state: Partial
          verified: false
          version: ""
        - completionTime: "2024-06-10T12:06:31Z"
          image: quay.io/openshift-release-dev/ocp-release@sha256:6c236c400d3bad9b2b54d8a3b247c508f6f13511d37666de1eecca8e43bce0f6
          startedTime: "2024-06-10T11:37:17Z"
          state: Completed
          verified: false
          version: 4.16.0-rc.4
        observedGeneration: 4
        versionHash: AjnKTa_3kbg=
       

      Also, trying to apply a rollback in this state results in an invalid SemVer error

       ❯ OC_ENABLE_CMD_UPGRADE_ROLLBACK=true oc adm upgrade rollback                                                             
      error: previous version "" invalid SemVer: Version string empty
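The error above is consistent with a history entry whose version is an empty string. The following sketch flags such entries before a rollback is attempted. It is an assumption-laden illustration: the function name is hypothetical, the regular expression is a simplified SemVer pattern rather than the parser `oc` uses, and no claim is made about which history entry the rollback command actually reads.

```python
import re
from typing import Any, Dict, List, Tuple

# Simplified SemVer pattern (assumption): MAJOR.MINOR.PATCH with optional
# pre-release and build metadata suffixes.
SEMVER = re.compile(r"^\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?(\+[0-9A-Za-z.-]+)?$")


def invalid_history_versions(cluster_version: Dict[str, Any]) -> List[Tuple[int, str]]:
    """Return (index, image) for history entries whose version is not SemVer."""
    history = cluster_version.get("status", {}).get("history") or []
    return [
        (index, entry.get("image", ""))
        for index, entry in enumerate(history)
        if not SEMVER.match(entry.get("version") or "")
    ]
```

For the three-item history shown after `--clear`, this would report the Partial entry at index 1, whose version is "".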
      

              hongkliu Hongkai Liu
              openshift-crt-jira-prow OpenShift Prow Bot
              Evgeni Vakhonin Evgeni Vakhonin