OpenShift Bugs / OCPBUGS-15871

Could not update clusterrolebinding "csi-snapshot-controller-runner-operator" blocks 4.12.23 arm64 upgrade to 4.13.4 arm64

    • Bug
    • Resolution: Duplicate
    • Priority: Critical
    • 4.13.z

      Description of problem:

      "periodic-ci-openshift-openshift-tests-private-release-4.13-arm64-stable-4.13-upgrade-from-stable-4.12-azure-ipi-disconnected-fullyprivate-p2-f28" prow job link

      https://qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.13-arm64-stable-4.13-upgrade-from-stable-4.12-azure-ipi-disconnected-fullyprivate-p2-f28/1676831922958897152

      The job attempted to upgrade from "quay.io/openshift-release-dev/ocp-release:4.12.23-aarch64" to "quay.io/openshift-release-dev/ocp-release:4.13.4-aarch64", but the update could not be applied:

      NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
      version   4.12.23   True        False         122m    Error while reconciling 4.12.23: the update could not be applied

      The ClusterVersion object reports 'Could not update clusterrolebinding "csi-snapshot-controller-runner-operator"':

      oc get clusterversion/version -oyaml
      apiVersion: config.openshift.io/v1
      kind: ClusterVersion
      metadata:
        creationTimestamp: "2023-07-06T06:50:01Z"
        generation: 3
        name: version
        resourceVersion: "87024"
        uid: c1503277-4cfb-4ce7-8162-03e769550650
      spec:
        clusterID: ee43f699-7833-40b9-8a54-70388fb2b395
        desiredUpdate:
          force: false
          image: registry.build02.ci.openshift.org/ci-op-rsw9220d/release@sha256:13b14f0514d24d241d40ebacac9f15f93acebc4a7849e4740df49e65e48af424
          version: ""
      status:
        availableUpdates: null
        capabilities:
          enabledCapabilities:
          - CSISnapshot
          - Console
          - Insights
          - Storage
          - baremetal
          - marketplace
          - openshift-samples
          knownCapabilities:
          - CSISnapshot
          - Console
          - Insights
          - Storage
          - baremetal
          - marketplace
          - openshift-samples
        conditions:
        - lastTransitionTime: "2023-07-06T06:50:03Z"
          message: The update channel has not been configured.
          reason: NoChannel
          status: "False"
          type: RetrievedUpdates
        - lastTransitionTime: "2023-07-06T06:50:03Z"
          message: Capabilities match configured spec
          reason: AsExpected
          status: "False"
          type: ImplicitlyEnabledCapabilities
        - lastTransitionTime: "2023-07-06T07:38:58Z"
          message: 'Retrieving payload failed version="" image="registry.build02.ci.openshift.org/ci-op-rsw9220d/release@sha256:13b14f0514d24d241d40ebacac9f15f93acebc4a7849e4740df49e65e48af424"
            failure=The update cannot be verified: unable to verify sha256:13b14f0514d24d241d40ebacac9f15f93acebc4a7849e4740df49e65e48af424
            against keyrings: verifier-public-key-redhat'
          reason: RetrievePayload
          status: "False"
          type: ReleaseAccepted
        - lastTransitionTime: "2023-07-06T07:24:30Z"
          message: Done applying 4.12.23
          status: "True"
          type: Available
        - lastTransitionTime: "2023-07-06T09:36:58Z"
          message: Could not update clusterrolebinding "csi-snapshot-controller-runner-operator"
            (322 of 831)
          reason: UpdatePayloadFailed
          status: "True"
          type: Failing
        - lastTransitionTime: "2023-07-06T07:24:30Z"
          message: 'Error while reconciling 4.12.23: the update could not be applied'
          reason: UpdatePayloadFailed
          status: "False"
          type: Progressing
        desired:
          image: registry.build02.ci.openshift.org/ci-op-rsw9220d/release@sha256:d9998b576ab98dcfd9024927572f698c0f40bf7fe4f8eff287f41ab4fa5e9c93
          url: https://access.redhat.com/errata/RHSA-2023:3925
          version: 4.12.23
        history:
        - completionTime: "2023-07-06T07:24:30Z"
          image: registry.build02.ci.openshift.org/ci-op-rsw9220d/release@sha256:d9998b576ab98dcfd9024927572f698c0f40bf7fe4f8eff287f41ab4fa5e9c93
          startedTime: "2023-07-06T06:50:03Z"
          state: Completed
          verified: false
          version: 4.12.23
        observedGeneration: 3
        versionHash: 2Ov3Hgho7pc=
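      For reference, the Failing condition can be pulled out directly with a jsonpath query (a quick sketch; the output matches the condition shown above):

      $ oc get clusterversion version -o jsonpath='{.status.conditions[?(@.type=="Failing")].message}'
      Could not update clusterrolebinding "csi-snapshot-controller-runner-operator" (322 of 831)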

      The same error appears in the CVO logs:

      namespaces/openshift-cluster-version/pods/cluster-version-operator-559546b954-brw7n/cluster-version-operator/cluster-version-operator/logs/current.log:2023-07-06T07:35:17.636736241Z I0706 07:35:17.636685       1 sync_worker.go:1013] Done syncing for clusterrolebinding "csi-snapshot-controller-runner-operator" (322 of 831)
      namespaces/openshift-cluster-version/pods/cluster-version-operator-559546b954-brw7n/cluster-version-operator/cluster-version-operator/logs/current.log:2023-07-06T09:32:58.498412536Z I0706 09:32:58.498387       1 sync_worker.go:993] Running sync for clusterrolebinding "csi-snapshot-controller-runner-operator" (322 of 831)
      namespaces/openshift-cluster-version/pods/cluster-version-operator-559546b954-brw7n/cluster-version-operator/cluster-version-operator/logs/current.log:2023-07-06T09:32:58.498562776Z I0706 09:32:58.498445       1 task_graph.go:546] Result of work: [update context deadline exceeded at 78 of 831 Could not update clusterrolebinding "csi-snapshot-controller-runner-operator" (322 of 831)]
      namespaces/openshift-cluster-version/pods/cluster-version-operator-559546b954-brw7n/cluster-version-operator/cluster-version-operator/logs/current.log:2023-07-06T09:32:58.498650896Z I0706 09:32:58.498643       1 sync_worker.go:1173] Update error 322 of 831: UpdatePayloadFailed Could not update clusterrolebinding "csi-snapshot-controller-runner-operator" (322 of 831) (context.deadlineExceededError: context deadline exceeded)
      namespaces/openshift-cluster-version/pods/cluster-version-operator-559546b954-brw7n/cluster-version-operator/cluster-version-operator/logs/current.log:2023-07-06T09:34:58.499473531Z E0706 09:34:58.499463       1 sync_worker.go:654] unable to synchronize image (waiting 2m14.424011973s): Could not update clusterrolebinding "csi-snapshot-controller-runner-operator" (322 of 831)
      namespaces/openshift-cluster-version/pods/cluster-version-operator-559546b954-brw7n/cluster-version-operator/cluster-version-operator/logs/current.log:2023-07-06T09:38
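      The lines above appear to come from the gathered must-gather logs; the same messages can be pulled from the running CVO directly (a sketch using the standard deployment name):

      $ oc -n openshift-cluster-version logs deploy/cluster-version-operator | grep 'csi-snapshot-controller-runner-operator'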

      Version-Release number of selected component (if applicable):

       

      How reproducible:

      not sure

      Steps to Reproduce:

      1. Upgrade from 4.12.23 arm64 to 4.13.4 arm64 (one way to request this update by digest is sketched below).
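      A sketch of requesting the update by digest with the standard oc client (in this CI run the pullspec was the registry.build02 mirror shown in the ClusterVersion spec above):

      $ oc adm upgrade --allow-explicit-upgrade \
          --to-image=quay.io/openshift-release-dev/ocp-release@sha256:13b14f0514d24d241d40ebacac9f15f93acebc4a7849e4740df49e65e48af424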
      

      Actual results:

      Could not update clusterrolebinding "csi-snapshot-controller-runner-operator"

      Expected results:

      no error

      Additional info:

       

            [OCPBUGS-15871] Could not update clusterrolebinding "csi-snapshot-controller-runner-operator" blocks 4.12.23 arm64 upgrade to 4.13.4 arm64

            OpenShift Jira Automation Bot added a comment -

            Per the announcement sent regarding the removal of "Blocker" as an option in the Priority field, this issue (which was already closed at the time of the bulk update) had Priority = "Blocker". It is being updated to Priority = Critical. No additional fields were changed.

            Johnny Liu added a comment -

            Thanks for the analysis; this sounds like a CI job configuration issue. Fixing it in https://github.com/openshift/release/pull/41043.


            Junqi Zhao added a comment (edited) -

            trking thanks, I will let the prow job owner check the issue. "ReleaseAccepted=False":

              - lastTransitionTime: "2023-07-06T07:38:58Z"
                message: 'Retrieving payload failed version="" image="registry.build02.ci.openshift.org/ci-op-rsw9220d/release@sha256:13b14f0514d24d241d40ebacac9f15f93acebc4a7849e4740df49e65e48af424"
                  failure=The update cannot be verified: unable to verify sha256:13b14f0514d24d241d40ebacac9f15f93acebc4a7849e4740df49e65e48af424
                  against keyrings: verifier-public-key-redhat'
                reason: RetrievePayload
                status: "False"
                type: ReleaseAccepted
              - lastTransitionTime: "2023-07-06T07:24:30Z" 


            W. Trevor King added a comment -

            $ oc adm release info quay.io/openshift-release-dev/ocp-release@sha256:13b14f0514d24d241d40ebacac9f15f93acebc4a7849e4740df49e65e48af424
            Name:           4.13.4
            Digest:         sha256:13b14f0514d24d241d40ebacac9f15f93acebc4a7849e4740df49e65e48af424
            Created:        2023-06-15T11:35:55Z
            

            That's a signed release:

            $ curl -s https://mirror.openshift.com/pub/openshift-v4/signatures/openshift/release/sha256=13b14f0514d24d241d40ebacac9f15f93acebc4a7849e4740df49e65e48af424/signature-1 | gpg --decrypt
            {"critical": {"image": {"docker-manifest-digest": "sha256:13b14f0514d24d241d40ebacac9f15f93acebc4a7849e4740df49e65e48af424"}, "type": "atomic container signature", "identity": {"docker-reference": "quay.io/openshift-release-dev/ocp-release:4.13.4-aarch64"}}, "optional": {"creator": "Red Hat OpenShift Signing Authority 0.0.1"}}gpg: Signature made Thu 15 Jun 2023 11:41:55 AM PDT
            gpg:                using RSA key 199E2F91FD431D51
            gpg: Good signature from "Red Hat, Inc. (release key 2) <security@redhat.com>" [unknown]
            gpg: WARNING: This key is not certified with a trusted signature!
            gpg:          There is no indication that the signature belongs to the owner.
            Primary key fingerprint: 567E 347A D004 4ADE 55BA  8A5F 199E 2F91 FD43 1D51
            

            But the disconnected-fullyprivate portion of the job name suggests that the CVO may not be able to reach mirror.openshift.com or storage.googleapis.com to retrieve the signature from the canonical stores listed in our firewall docs. So I expect you need to either set up the signature ConfigMaps as described here and in other mirroring docs, or use force: true to push through failed signature verification (although that can also push through other pre-update checks).
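
            A minimal sketch of the two options, assuming the standard disconnected-update flow (the ConfigMap name is illustrative and the base64 payload is a placeholder; the key follows the sha256-<digest>-<index> layout of the signature store):

            apiVersion: v1
            kind: ConfigMap
            metadata:
              name: release-image-signature-4-13-4
              namespace: openshift-config-managed
              labels:
                release.openshift.io/verification-signatures: ""
            binaryData:
              sha256-13b14f0514d24d241d40ebacac9f15f93acebc4a7849e4740df49e65e48af424-1: <base64-encoded signature blob>

            Or, to push past verification entirely (this also skips other pre-update checks):

            $ oc patch clusterversion version --type merge -p '{"spec":{"desiredUpdate":{"force":true}}}'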


            Junqi Zhao added a comment -

            trking 


            Whether you intended to force the update without a signature, or if the signature ConfigMap application somehow failed. This might be a bug in the QE test harness.


            If the build is signed, the QE prow job upgrades without --force; otherwise it upgrades with --force. This is by design for the QE prow job.
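
            For illustration, the kind of decision such a harness might make (a hypothetical sketch, not the actual step-registry code):

            #!/bin/bash
            # Probe the canonical signature store to decide whether --force is needed.
            DIGEST=13b14f0514d24d241d40ebacac9f15f93acebc4a7849e4740df49e65e48af424
            SIG_URL="https://mirror.openshift.com/pub/openshift-v4/signatures/openshift/release/sha256=${DIGEST}/signature-1"
            if curl -sfo /dev/null "${SIG_URL}"; then
              FORCE_FLAG=""          # signed release: let the CVO verify it
            else
              FORCE_FLAG="--force"   # unsigned (e.g. nightly) release: skip verification
            fi
            oc adm upgrade --allow-explicit-upgrade ${FORCE_FLAG} \
              --to-image="quay.io/openshift-release-dev/ocp-release@sha256:${DIGEST}"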


            W. Trevor King added a comment -

            Ah, in fact I see the same context deadline exceeded in OCPBUGS-7714 under similar conditions, so closing this one as a dup.


            W. Trevor King added a comment -

            Sounds similar to OCPBUGS-7714. Might be a dup.


              Lalatendu Mohanty (lmohanty@redhat.com), Junqi Zhao (juzhao@redhat.com), Wei Duan