OpenShift Bugs / OCPBUGS-10565

Upgrade for a disconnected cluster hangs on retrieving and verifying payload

Details

    Description

      This bug is a backport clone of [Bugzilla Bug 2090680](https://bugzilla.redhat.com/show_bug.cgi?id=2090680). The following is the description of the original bug:

      Description of problem:

      Version-Release number of the following components:
      4.11.0-0.nightly-2022-05-25-123329

      How reproducible:
      Always

      Steps to Reproduce:
      1. Set up a cluster in a restricted network using 4.11.0-0.nightly-2022-05-25-123329.
      2. Mirror 4.11.0-0.nightly-2022-05-25-193227 to a private registry.
      3. Upgrade the cluster to 4.11.0-0.nightly-2022-05-25-193227 without the --force option:
      $ oc adm upgrade --allow-explicit-upgrade --to-image registry.ci.openshift.org/ocp/release@sha256:83ca476a63dfafa49e35cab2ded1fbf3991cc3483875b1bf639eabda31faadfd

      Actual results:
      After waiting 3+ hours, clusterversion still shows no upgrade history entry for the target release, and the event log shows only "Retrieving and verifying payload".

      [root@preserve-jialiu-ansible ~]# oc get clusterversion
      NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
      version   4.11.0-0.nightly-2022-05-25-123329   True        False         160m    Cluster version is 4.11.0-0.nightly-2022-05-25-123329

      [root@preserve-jialiu-ansible ~]# oc get clusterversion -o yaml
      apiVersion: v1
      items:
      - apiVersion: config.openshift.io/v1
        kind: ClusterVersion
        metadata:
          creationTimestamp: "2022-05-26T03:51:28Z"
          generation: 3
          name: version
          resourceVersion: "62069"
          uid: b5674b4b-7295-4287-904c-94fe1112659b
        spec:
          channel: stable-4.11
          clusterID: 027285eb-b4ea-4127-85b6-031c1af7db72
          desiredUpdate:
            force: false
            image: registry.ci.openshift.org/ocp/release@sha256:83ca476a63dfafa49e35cab2ded1fbf3991cc3483875b1bf639eabda31faadfd
            version: ""
        status:
          availableUpdates: null
          capabilities:
            enabledCapabilities:
            - baremetal
            - marketplace
            - openshift-samples
            knownCapabilities:
            - baremetal
            - marketplace
            - openshift-samples
          conditions:
          - lastTransitionTime: "2022-05-26T03:51:31Z"
            message: Capabilities match configured spec
            reason: AsExpected
            status: "False"
            type: ImplicitlyEnabledCapabilities
          - lastTransitionTime: "2022-05-26T03:51:31Z"
            message: Payload loaded version="4.11.0-0.nightly-2022-05-25-123329" image="registry.ci.openshift.org/ocp/release@sha256:13bfc31eb4a284ce691e848c25d9120dbde3f0852d4be64be4b90953ac914bf1"
            reason: PayloadLoaded
            status: "True"
            type: ReleaseAccepted
          - lastTransitionTime: "2022-05-26T04:23:06Z"
            message: Done applying 4.11.0-0.nightly-2022-05-25-123329
            status: "True"
            type: Available
          - lastTransitionTime: "2022-05-26T04:21:21Z"
            status: "False"
            type: Failing
          - lastTransitionTime: "2022-05-26T04:23:06Z"
            message: Cluster version is 4.11.0-0.nightly-2022-05-25-123329
            status: "False"
            type: Progressing
          - lastTransitionTime: "2022-05-26T03:51:31Z"
            message: 'Unable to retrieve available updates: Get "https://api.openshift.com/api/upgrades_info/v1/graph?arch=amd64&channel=stable-4.11&id=027285eb-b4ea-4127-85b6-031c1af7db72&version=4.11.0-0.nightly-2022-05-25-123329":
              dial tcp 34.228.45.157:443: connect: connection timed out'
            reason: RemoteFailed
            status: "False"
            type: RetrievedUpdates
          desired:
            image: registry.ci.openshift.org/ocp/release@sha256:13bfc31eb4a284ce691e848c25d9120dbde3f0852d4be64be4b90953ac914bf1
            version: 4.11.0-0.nightly-2022-05-25-123329
          history:
          - completionTime: "2022-05-26T04:23:06Z"
            image: registry.ci.openshift.org/ocp/release@sha256:13bfc31eb4a284ce691e848c25d9120dbde3f0852d4be64be4b90953ac914bf1
            startedTime: "2022-05-26T03:51:31Z"
            state: Completed
            verified: false
            version: 4.11.0-0.nightly-2022-05-25-123329
          observedGeneration: 2
          versionHash: jOIXVtM5Y-g=
      kind: List
      metadata:
        resourceVersion: ""
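
The symptom in the output above is that `spec.desiredUpdate.image` names the new release while `status.desired.image` and `status.history` still only mention the old one. A minimal sketch of that comparison follows, assuming the ClusterVersion object has already been parsed into a dict (for example from JSON output of `oc get clusterversion version`). The helper name `requested_but_not_accepted` and the trimmed `SAMPLE` dict are hypothetical and only for illustration; the field values are copied from the output above.

```python
from typing import Any, Dict


def requested_but_not_accepted(cv: Dict[str, Any]) -> bool:
    """Return True when spec.desiredUpdate names an image that status has not picked up."""
    requested = (cv.get("spec", {}).get("desiredUpdate") or {}).get("image")
    if not requested:
        return False  # no explicit update was requested
    status = cv.get("status", {})
    current = (status.get("desired") or {}).get("image")
    # An accepted update would show up as a history entry for the requested image.
    in_history = any(
        entry.get("image") == requested for entry in status.get("history") or []
    )
    return requested != current and not in_history


OLD = "registry.ci.openshift.org/ocp/release@sha256:13bfc31eb4a284ce691e848c25d9120dbde3f0852d4be64be4b90953ac914bf1"
NEW = "registry.ci.openshift.org/ocp/release@sha256:83ca476a63dfafa49e35cab2ded1fbf3991cc3483875b1bf639eabda31faadfd"

# Hypothetical trimmed copy of the object above: only the fields the check reads.
SAMPLE = {
    "spec": {"desiredUpdate": {"force": False, "image": NEW, "version": ""}},
    "status": {
        "desired": {"image": OLD, "version": "4.11.0-0.nightly-2022-05-25-123329"},
        "history": [{"image": OLD, "state": "Completed", "verified": False}],
    },
}

if __name__ == "__main__":
    print(requested_but_not_accepted(SAMPLE))  # True for the state reported here
```

This check only describes the state shown in this report; it does not say why the payload retrieval is stuck.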

      [root@preserve-jialiu-ansible ~]# oc get event -n openshift-cluster-version
      LAST SEEN TYPE REASON OBJECT MESSAGE
      3h11m Warning FailedScheduling pod/cluster-version-operator-b4b6c5f9b-p7fjq no nodes available to schedule pods
      3h9m Warning FailedScheduling pod/cluster-version-operator-b4b6c5f9b-p7fjq no nodes available to schedule pods
      3h4m Normal Scheduled pod/cluster-version-operator-b4b6c5f9b-p7fjq Successfully assigned openshift-cluster-version/cluster-version-operator-b4b6c5f9b-p7fjq to jialiu411a-5nb8n-master-2 by jialiu411a-5nb8n-bootstrap
      3h2m Warning FailedMount pod/cluster-version-operator-b4b6c5f9b-p7fjq MountVolume.SetUp failed for volume "serving-cert" : secret "cluster-version-operator-serving-cert" not found
      3h1m Warning FailedMount pod/cluster-version-operator-b4b6c5f9b-p7fjq Unable to attach or mount volumes: unmounted volumes=[serving-cert], unattached volumes=[etc-ssl-certs etc-cvo-updatepayloads serving-cert service-ca kube-api-access]: timed out waiting for the condition
      3h1m Normal Pulling pod/cluster-version-operator-b4b6c5f9b-p7fjq Pulling image "registry.ci.openshift.org/ocp/release@sha256:13bfc31eb4a284ce691e848c25d9120dbde3f0852d4be64be4b90953ac914bf1"
      3h1m Normal Pulled pod/cluster-version-operator-b4b6c5f9b-p7fjq Successfully pulled image "registry.ci.openshift.org/ocp/release@sha256:13bfc31eb4a284ce691e848c25d9120dbde3f0852d4be64be4b90953ac914bf1" in 1.384468759s
      3h1m Normal Created pod/cluster-version-operator-b4b6c5f9b-p7fjq Created container cluster-version-operator
      3h1m Normal Started pod/cluster-version-operator-b4b6c5f9b-p7fjq Started container cluster-version-operator
      3h11m Normal SuccessfulCreate replicaset/cluster-version-operator-b4b6c5f9b Created pod: cluster-version-operator-b4b6c5f9b-p7fjq
      3h11m Normal ScalingReplicaSet deployment/cluster-version-operator Scaled up replica set cluster-version-operator-b4b6c5f9b to 1
      3h12m Normal LeaderElection configmap/version jialiu411a-5nb8n-bootstrap_0a3ff57f-66cf-4f93-bbe0-484effcc4383 became leader
      3h12m Normal RetrievePayload clusterversion/version Retrieving and verifying payload version="4.11.0-0.nightly-2022-05-25-123329" image="registry.ci.openshift.org/ocp/release@sha256:13bfc31eb4a284ce691e848c25d9120dbde3f0852d4be64be4b90953ac914bf1"
      3h12m Normal LoadPayload clusterversion/version Loading payload version="4.11.0-0.nightly-2022-05-25-123329" image="registry.ci.openshift.org/ocp/release@sha256:13bfc31eb4a284ce691e848c25d9120dbde3f0852d4be64be4b90953ac914bf1"
      3h12m Normal PayloadLoaded clusterversion/version Payload loaded version="4.11.0-0.nightly-2022-05-25-123329" image="registry.ci.openshift.org/ocp/release@sha256:13bfc31eb4a284ce691e848c25d9120dbde3f0852d4be64be4b90953ac914bf1"
      166m Normal LeaderElection configmap/version jialiu411a-5nb8n-master-2_83752e0b-1ef4-4c69-814f-8eeb54d50781 became leader
      166m Normal RetrievePayload clusterversion/version Retrieving and verifying payload version="4.11.0-0.nightly-2022-05-25-123329" image="registry.ci.openshift.org/ocp/release@sha256:13bfc31eb4a284ce691e848c25d9120dbde3f0852d4be64be4b90953ac914bf1"
      166m Normal LoadPayload clusterversion/version Loading payload version="4.11.0-0.nightly-2022-05-25-123329" image="registry.ci.openshift.org/ocp/release@sha256:13bfc31eb4a284ce691e848c25d9120dbde3f0852d4be64be4b90953ac914bf1"
      166m Normal PayloadLoaded clusterversion/version Payload loaded version="4.11.0-0.nightly-2022-05-25-123329" image="registry.ci.openshift.org/ocp/release@sha256:13bfc31eb4a284ce691e848c25d9120dbde3f0852d4be64be4b90953ac914bf1"
      77m Normal RetrievePayload clusterversion/version Retrieving and verifying payload version="" image="registry.ci.openshift.org/ocp/release@sha256:83ca476a63dfafa49e35cab2ded1fbf3991cc3483875b1bf639eabda31faadfd"
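
In the event list above, each earlier `RetrievePayload` event for the current release is followed by `PayloadLoaded`, but the last `RetrievePayload` (for the target image) has no follow-up. The sketch below picks that out from the plain-text event lines. It assumes the whitespace-separated column layout shown above and looks only at the two reasons `RetrievePayload` and `PayloadLoaded`; the function name `unfinished_retrievals` is hypothetical.

```python
import re
from typing import Iterable, List

IMAGE_RE = re.compile(r'image="([^"]+)"')


def unfinished_retrievals(event_lines: Iterable[str]) -> List[str]:
    """Return images that have a RetrievePayload event but no PayloadLoaded event."""
    retrieving, loaded = set(), set()
    for line in event_lines:
        # Columns: LAST SEEN, TYPE, REASON, OBJECT, MESSAGE (message may contain spaces).
        fields = line.split(None, 4)
        if len(fields) < 5:
            continue
        reason, message = fields[2], fields[4]
        match = IMAGE_RE.search(message)
        if not match:
            continue
        if reason == "RetrievePayload":
            retrieving.add(match.group(1))
        elif reason == "PayloadLoaded":
            loaded.add(match.group(1))
    return sorted(retrieving - loaded)


OLD = "registry.ci.openshift.org/ocp/release@sha256:13bfc31eb4a284ce691e848c25d9120dbde3f0852d4be64be4b90953ac914bf1"
NEW = "registry.ci.openshift.org/ocp/release@sha256:83ca476a63dfafa49e35cab2ded1fbf3991cc3483875b1bf639eabda31faadfd"

# Three of the event lines from the output above.
EVENTS = [
    '166m Normal RetrievePayload clusterversion/version Retrieving and verifying payload version="4.11.0-0.nightly-2022-05-25-123329" image="%s"' % OLD,
    '166m Normal PayloadLoaded clusterversion/version Payload loaded version="4.11.0-0.nightly-2022-05-25-123329" image="%s"' % OLD,
    '77m Normal RetrievePayload clusterversion/version Retrieving and verifying payload version="" image="%s"' % NEW,
]

if __name__ == "__main__":
    for image in unfinished_retrievals(EVENTS):
        print(image)
```

For the events in this report, the only image returned is the target release image, matching the 77m-old event with no later progress.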

      Expected results:
      The CVO and `oc adm upgrade` should clearly report what went wrong, instead of leaving the upgrade pending for a long time without any information.

      Additional info:
      The same upgrade path against a connected cluster starts promptly, with no such issue.

            People

              afri@afri.cz Petr Muller
              openshift-crt-jira-prow OpenShift Prow Bot
              Evgeni Vakhonin Evgeni Vakhonin