  1. OpenShift Over the Air
  2. OTA-960

CVO should report progressing=true for the length of --to-multi-arch transition


    • OTA 262
    • Approved

      Description of problem:

      Running oc adm upgrade --to-multi-arch sets the CVO condition Progressing=True for less than 2 minutes, after which .status.history records the transition as "Completed". The cluster itself keeps progressing to the heterogeneous payload in the background for more than 30 minutes, mostly unnoticed.
      During this time, "oc adm upgrade" mostly shows nothing noticeable. Only continuous monitoring reveals intermittent Upgradeable=False or Failing=True messages, and Progressing=True never reappears.
      
      Meanwhile, cluster operators are progressing in the background, and some master nodes are NotReady,SchedulingDisabled.
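
The background activity described above can be extracted from the tabular output with two small filters. This is a minimal sketch, assuming the default column layout of `oc get co` (PROGRESSING in the 4th column) and `oc get nodes` (STATUS in the 2nd column). The function names are hypothetical helpers and not part of oc.

```shell
#!/bin/sh
# Hypothetical helpers; both read the default tabular output on stdin.

# Print the names of cluster operators whose PROGRESSING column (4th) is True.
progressing_operators() {
    awk 'NR > 1 && $4 == "True" { print $1 }'
}

# Print the names of nodes whose STATUS column (2nd) contains
# NotReady or SchedulingDisabled.
unsettled_nodes() {
    awk 'NR > 1 && $2 ~ /NotReady|SchedulingDisabled/ { print $1 }'
}
```

Typical use would be `oc get co | progressing_operators` and `oc get nodes | unsettled_nodes`, run repeatedly while the transition is under way.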

      Version-Release number of selected component (if applicable):

      4.13.0-ec.2

      How reproducible:

      100%

      Steps to Reproduce:

      1. Run oc adm upgrade --to-multi-arch.
      2. Monitor oc adm upgrade, the CVO status, cluster operators, and nodes.
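
The monitoring in step 2 could be scripted roughly as follows. This is a sketch rather than the exact procedure the reporter used: `snapshot` is a hypothetical helper name, and the 60-second interval in the usage note is an arbitrary assumption.

```shell
#!/bin/sh
# Hypothetical helper: print one timestamped snapshot of the state
# that step 2 asks to monitor.
snapshot() {
    date
    oc adm upgrade
    # One line per ClusterVersion condition: time, type=status, reason.
    oc get clusterversion version \
        -o jsonpath='{range .status.conditions[*]}{.lastTransitionTime} {.type}={.status} {.reason}{"\n"}{end}'
    oc get co
    oc get nodes
}
```

Running `while true; do snapshot; sleep 60; done` after issuing the --to-multi-arch request gives a continuous record in which the early Progressing=False flip can be compared against operator and node state.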
      
      

      Actual results:

      ❯ oc adm upgrade --to-multi-arch 
      Requested update to multi cluster architecture
      
      Thu 02 Mar 2023 18:10:41 IST
      oc adm upgrade
      info: An upgrade is in progress. Working towards 4.13.0-ec.2: 9 of 831 done (1% complete)
      
      History:
      2023-03-02T16:10:41Z  Partial 4.13.0-ec.2 quay.io/openshift-release-dev/ocp-release@sha256:bdc145f7f6347433f8461a1133d6354abf52268925ce7459a4294d44b9beb4ef
      2023-03-02T09:31:36Z 2023-03-02T09:32:13Z Completed 4.13.0-ec.2 quay.io/openshift-release-dev/ocp-release@sha256:01192353b3c3e536779cfa0fc910064299df15ce01be0cff7188868588d32321
      
      Conditions:
      2023-03-02T09:29:45Z RetrievedUpdates=True : 
      2023-03-02T08:38:21Z ImplicitlyEnabledCapabilities=False AsExpected: Capabilities match configured spec
      2023-03-02T08:38:21Z ReleaseAccepted=True PayloadLoaded: Payload loaded version="4.13.0-ec.2" image="quay.io/openshift-release-dev/ocp-release@sha256:bdc145f7f6347433f8461a1133d6354abf52268925ce7459a4294d44b9beb4ef" architecture="Multi"
      2023-03-02T08:59:13Z Available=True : Done applying 4.13.0-ec.2
      2023-03-02T08:59:13Z Failing=False : 
      2023-03-02T16:10:41Z Progressing=True : Working towards 4.13.0-ec.2: 9 of 831 done (1% complete)
      
      
      
      
      Thu 02 Mar 2023 18:11:46 IST
      oc adm upgrade
      Cluster version is 4.13.0-ec.2 
      
      History:
      2023-03-02T16:10:41Z 2023-03-02T16:11:46Z Completed 4.13.0-ec.2 quay.io/openshift-release-dev/ocp-release@sha256:bdc145f7f6347433f8461a1133d6354abf52268925ce7459a4294d44b9beb4ef
      2023-03-02T09:31:36Z 2023-03-02T09:32:13Z Completed 4.13.0-ec.2 quay.io/openshift-release-dev/ocp-release@sha256:01192353b3c3e536779cfa0fc910064299df15ce01be0cff7188868588d32321
      
      Conditions:
      2023-03-02T09:29:45Z RetrievedUpdates=True : 
      2023-03-02T08:38:21Z ImplicitlyEnabledCapabilities=False AsExpected: Capabilities match configured spec
      2023-03-02T08:38:21Z ReleaseAccepted=True PayloadLoaded: Payload loaded version="4.13.0-ec.2" image="quay.io/openshift-release-dev/ocp-release@sha256:bdc145f7f6347433f8461a1133d6354abf52268925ce7459a4294d44b9beb4ef" architecture="Multi"
      2023-03-02T08:59:13Z Available=True : Done applying 4.13.0-ec.2
      2023-03-02T08:59:13Z Failing=False : 
      2023-03-02T16:11:46Z Progressing=False : Cluster version is 4.13.0-ec.2
      
      
      
      
      
      
      
      # Progressing is already False at this point, and Progressing=True does not return for the rest of the transition.
      
      
      
      
      
      
      
      oc get co
      NAME                                       VERSION       AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
      authentication                             4.13.0-ec.2   True        True          False      7h14m   APIServerDeploymentProgressing: deployment/apiserver.openshift-oauth-apiserver: 2/3 pods have been updated to the latest generation...
      baremetal                                  4.13.0-ec.2   True        False         False      7h32m   
      cloud-controller-manager                   4.13.0-ec.2   True        False         False      7h33m   
      cloud-credential                           4.13.0-ec.2   True        False         False      7h32m   
      cluster-autoscaler                         4.13.0-ec.2   True        False         False      7h31m   
      config-operator                            4.13.0-ec.2   True        False         False      7h32m   
      console                                    4.13.0-ec.2   True        False         False      7h20m   
      control-plane-machine-set                  4.13.0-ec.2   True        False         False      7h30m   
      csi-snapshot-controller                    4.13.0-ec.2   True        False         False      7h32m   
      dns                                        4.13.0-ec.2   True        True          False      7h31m   DNS "default" reports Progressing=True: "Have 5 available DNS pods, want 6.\nHave 2 up-to-date DNS pods, want 6."...
      etcd                                       4.13.0-ec.2   True        True          False      7h30m   NodeInstallerProgressing: 3 nodes are at revision 8; 0 nodes have achieved new revision 9
      image-registry                             4.13.0-ec.2   True        True          False      7h26m   Progressing: The deployment has not completed...
      ingress                                    4.13.0-ec.2   True        False         False      7h26m   
      insights                                   4.13.0-ec.2   True        False         False      7h26m   
      kube-apiserver                             4.13.0-ec.2   True        True          False      7h28m   NodeInstallerProgressing: 3 nodes are at revision 6; 0 nodes have achieved new revision 7
      kube-controller-manager                    4.13.0-ec.2   True        True          False      7h29m   NodeInstallerProgressing: 3 nodes are at revision 6; 0 nodes have achieved new revision 8
      kube-scheduler                             4.13.0-ec.2   True        True          False      7h29m   NodeInstallerProgressing: 3 nodes are at revision 6; 0 nodes have achieved new revision 7
      kube-storage-version-migrator              4.13.0-ec.2   True        False         False      7h32m   
      machine-api                                4.13.0-ec.2   True        False         False      7h28m   
      machine-approver                           4.13.0-ec.2   True        False         False      7h32m   
      machine-config                             4.13.0-ec.2   True        False         False      7h30m   
      marketplace                                4.13.0-ec.2   True        False         False      7h31m   
      monitoring                                 4.13.0-ec.2   True        False         False      7h25m   
      network                                    4.13.0-ec.2   True        True          False      7h33m   DaemonSet "/openshift-sdn/sdn" update is rolling out (4 out of 6 updated)...
      node-tuning                                4.13.0-ec.2   True        False         False      7h31m   
      openshift-apiserver                        4.13.0-ec.2   True        True          False      7h26m   APIServerDeploymentProgressing: deployment/apiserver.openshift-apiserver: 1/3 pods have been updated to the latest generation
      openshift-controller-manager               4.13.0-ec.2   True        False         False      7h28m   
      openshift-samples                          4.13.0-ec.2   True        False         False      7h26m   
      operator-lifecycle-manager                 4.13.0-ec.2   True        False         False      7h32m   
      operator-lifecycle-manager-catalog         4.13.0-ec.2   True        False         False      7h32m   
      operator-lifecycle-manager-packageserver   4.13.0-ec.2   True        False         False      7h26m   
      service-ca                                 4.13.0-ec.2   True        False         False      7h32m   
      storage                                    4.13.0-ec.2   True        False         False      7h32m   
      
      
      
      Thu 02 Mar 2023 18:17:51 IST
      Cluster version is 4.13.0-ec.2
      
      Upgradeable=False  
        Reason: PoolUpdating
        Message: Cluster operator machine-config should not be upgraded between minor versions: One or more machine config pools are updating, please see `oc get mcp` for further details
      
      Conditions:
      2023-03-02T09:29:45Z RetrievedUpdates=True : 
      2023-03-02T08:38:21Z ImplicitlyEnabledCapabilities=False AsExpected: Capabilities match configured spec
      2023-03-02T08:38:21Z ReleaseAccepted=True PayloadLoaded: Payload loaded version="4.13.0-ec.2" image="quay.io/openshift-release-dev/ocp-release@sha256:bdc145f7f6347433f8461a1133d6354abf52268925ce7459a4294d44b9beb4ef" architecture="Multi"
      2023-03-02T08:59:13Z Available=True : Done applying 4.13.0-ec.2
      2023-03-02T08:59:13Z Failing=False : 
      2023-03-02T16:11:46Z Progressing=False : Cluster version is 4.13.0-ec.2
      2023-03-02T16:17:50Z Upgradeable=False PoolUpdating: Cluster operator machine-config should not be upgraded between minor versions: One or more machine config pools are updating, please see `oc get mcp` for further details
      
      Thu 02 Mar 2023 18:23:22 IST
      2023-03-02T09:29:45Z RetrievedUpdates=True : 
      2023-03-02T08:38:21Z ImplicitlyEnabledCapabilities=False AsExpected: Capabilities match configured spec
      2023-03-02T08:38:21Z ReleaseAccepted=True PayloadLoaded: Payload loaded version="4.13.0-ec.2" image="quay.io/openshift-release-dev/ocp-release@sha256:bdc145f7f6347433f8461a1133d6354abf52268925ce7459a4294d44b9beb4ef" architecture="Multi"
      2023-03-02T08:59:13Z Available=True : Done applying 4.13.0-ec.2
      2023-03-02T08:59:13Z Failing=False : 
      2023-03-02T16:11:46Z Progressing=False : Cluster version is 4.13.0-ec.2
      2023-03-02T16:17:50Z Upgradeable=False PoolUpdating: Cluster operator machine-config should not be upgraded between minor versions: One or more machine config pools are updating, please see `oc get mcp` for further details 
      
      Thu 02 Mar 2023 18:27:37 IST
      Conditions:
      2023-03-02T09:29:45Z RetrievedUpdates=True : 
      2023-03-02T08:38:21Z ImplicitlyEnabledCapabilities=False AsExpected: Capabilities match configured spec
      2023-03-02T08:38:21Z ReleaseAccepted=True PayloadLoaded: Payload loaded version="4.13.0-ec.2" image="quay.io/openshift-release-dev/ocp-release@sha256:bdc145f7f6347433f8461a1133d6354abf52268925ce7459a4294d44b9beb4ef" architecture="Multi"
      2023-03-02T08:59:13Z Available=True : Done applying 4.13.0-ec.2
      2023-03-02T16:27:36Z Failing=True ClusterOperatorDegraded: Cluster operator machine-config is degraded
      2023-03-02T16:11:46Z Progressing=False ClusterOperatorDegraded: Error while reconciling 4.13.0-ec.2: the cluster operator machine-config is degraded
      2023-03-02T16:17:50Z Upgradeable=False PoolUpdating: Cluster operator machine-config should not be upgraded between minor versions: One or more machine config pools are updating, please see `oc get mcp` for further details
      
      Thu 02 Mar 2023 18:32:37 IST
      oc adm upgrade
      Cluster version is 4.13.0-ec.2
      
      Conditions:
      2023-03-02T09:29:45Z RetrievedUpdates=True : 
      2023-03-02T08:38:21Z ImplicitlyEnabledCapabilities=False AsExpected: Capabilities match configured spec
      2023-03-02T08:38:21Z ReleaseAccepted=True PayloadLoaded: Payload loaded version="4.13.0-ec.2" image="quay.io/openshift-release-dev/ocp-release@sha256:bdc145f7f6347433f8461a1133d6354abf52268925ce7459a4294d44b9beb4ef" architecture="Multi"
      2023-03-02T08:59:13Z Available=True : Done applying 4.13.0-ec.2
      2023-03-02T16:32:36Z Failing=False : 
      2023-03-02T16:11:46Z Progressing=False : Cluster version is 4.13.0-ec.2
      
      oc get co
      kube-apiserver                             4.13.0-ec.2   True        True          False      7h48m   NodeInstallerProgressing: 1 nodes are at revision 6; 2 nodes are at revision 7
      
      Thu 02 Mar 2023 18:35:52 IST
      Failing=True:  Reason: ClusterOperatorDegraded
        Message: Cluster operator kube-apiserver is degraded
      
      2023-03-02T16:11:46Z Progressing=False ClusterOperatorDegraded: Error while reconciling 4.13.0-ec.2: the cluster operator kube-apiserver is degraded
      
      Thu 02 Mar 2023 18:39:22 IST
      Cluster version is 4.13.0-ec.2
      
      2023-03-02T09:29:45Z RetrievedUpdates=True : 
      2023-03-02T08:38:21Z ImplicitlyEnabledCapabilities=False AsExpected: Capabilities match configured spec
      2023-03-02T08:38:21Z ReleaseAccepted=True PayloadLoaded: Payload loaded version="4.13.0-ec.2" image="quay.io/openshift-release-dev/ocp-release@sha256:bdc145f7f6347433f8461a1133d6354abf52268925ce7459a4294d44b9beb4ef" architecture="Multi"
      2023-03-02T08:59:13Z Available=True : Done applying 4.13.0-ec.2
      2023-03-02T16:39:21Z Failing=False : 
      2023-03-02T16:11:46Z Progressing=False : Cluster version is 4.13.0-ec.2
      
      
      
      

      Expected results:

      The CVO reports Progressing=True for as long as the cluster is still progressing.
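
The gap between expected and actual behavior can be detected from outside the cluster with a simple comparison. The sketch below is an external heuristic only and says nothing about how the CVO computes its condition. `check_progressing_consistency` is a hypothetical name, and the default `oc get co` column layout (PROGRESSING in the 4th column) is assumed.

```shell
#!/bin/sh
# Hypothetical check. $1 is the ClusterVersion Progressing status
# ("True" or "False"); `oc get co` table output is read on stdin.
# Returns 1 and reports a mismatch when the CVO is not Progressing=True
# while at least one cluster operator still shows PROGRESSING=True.
check_progressing_consistency() {
    cvo_status=$1
    busy=$(awk 'NR > 1 && $4 == "True" { n++ } END { print n + 0 }')
    if [ "$cvo_status" != "True" ] && [ "$busy" -gt 0 ]; then
        echo "mismatch: CVO Progressing=$cvo_status, $busy operator(s) still progressing"
        return 1
    fi
    echo "consistent: CVO Progressing=$cvo_status, $busy operator(s) progressing"
}

# Usage (not executed here):
#   cvo=$(oc get clusterversion version \
#       -o jsonpath='{.status.conditions[?(@.type=="Progressing")].status}')
#   oc get co | check_progressing_consistency "$cvo"
```

Against the state captured in the actual results, where the CVO shows Progressing=False while several operators are still rolling out, a check like this would be expected to report a mismatch.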

      Additional info:

      For part of this time, "oc adm upgrade" looks as if nothing is happening.
      An upgrade started during this window, for example with --to-latest, takes unusually long (around 2 hours) to complete, with intermittent timeout messages from operators.

       

      Definition of done:

      *  CVO should report Progressing=True for the whole length of the --to-multi-arch transition.

              Unassigned
              Evgeni Vakhonin (evakhoni@redhat.com)
              Jian Li