OCPBUGS-9951: Fails to reconcile to RT kernel on interrupted updates

    Description

      This is a clone of issue OCPBUGS-9685. The following is the description of the original issue:

      The aggregated job https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/aggregated-gcp-ovn-rt-upgrade-4.14-minor-release-openshift-release-analysis-aggregator/1633554110798106624 failed. Digging into one of the underlying runs:

       

      The MCD (machine-config-daemon) log at https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.14-upgrade-from-stable-4.13-e2e-gcp-ovn-rt-upgrade/1633554106595414016/artifacts/e2e-gcp-ovn-rt-upgrade/gather-extra/artifacts/pods/openshift-machine-config-operator_machine-config-daemon-p2vf4_machine-config-daemon.log shows:

       

      Deployments:
      * ostree-unverified-registry:quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:4f28fbcd049025bab9719379492420f9eaab0426cdbbba43b395eb8421f10a17
                         Digest: sha256:4f28fbcd049025bab9719379492420f9eaab0426cdbbba43b395eb8421f10a17
                        Version: 413.86.202302230536-0 (2023-03-08T20:10:47Z)
            RemovedBasePackages: kernel-core kernel-modules kernel kernel-modules-extra 4.18.0-372.43.1.el8_6
                LayeredPackages: kernel-rt-core kernel-rt-kvm kernel-rt-modules
                                 kernel-rt-modules-extra
      ...
      E0308 22:11:21.925030 74176 writer.go:200] Marking Degraded due to: failed to update OS to quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:cd299b2bf3cc98fb70907f152b4281633064fe33527b5d6a42ddc418ff00eec1 : error running rpm-ostree rebase --experimental ostree-unverified-registry:quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:cd299b2bf3cc98fb70907f152b4281633064fe33527b5d6a42ddc418ff00eec1: error: Importing: remote error: fetching blob: received unexpected HTTP status: 500 Internal Server Error
      ... 
      I0308 22:11:36.959143   74176 update.go:2010] Running: rpm-ostree override reset kernel kernel-core kernel-modules kernel-modules-extra --uninstall kernel-rt-core --uninstall kernel-rt-kvm --uninstall kernel-rt-modules --uninstall kernel-rt-modules-extra
      ...
      E0308 22:12:35.525156   74176 writer.go:200] Marking Degraded due to: error running rpm-ostree override reset kernel kernel-core kernel-modules kernel-modules-extra --uninstall kernel-rt-core --uninstall kernel-rt-kvm --uninstall kernel-rt-modules --uninstall kernel-rt-modules-extra: error: Package/capability 'kernel-rt-core' is not currently requested
      : exit status 1
        

       

      Something is going wrong in our retry loop. I think we may not be clearing the pending deployment on failure. In other words, we need to run

      rpm-ostree cleanup -p

      before we retry.
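
The proposed fix can be sketched as a retry loop that discards the pending deployment after every failed attempt, so the next attempt starts again from the booted deployment. This is only an illustration of the idea, not the MCD's actual code (the MCD is written in Go): the names `apply_with_cleanup`, `run_command`, and `max_attempts` are hypothetical, and the only real commands assumed are `rpm-ostree cleanup -p` and whatever update steps the caller passes in.

```python
import subprocess
from typing import Callable, Optional, Sequence

# A runner executes one command line and raises on failure.
Runner = Callable[[Sequence[str]], None]


def run_command(argv: Sequence[str]) -> None:
    """Default runner: execute the command and raise CalledProcessError on non-zero exit."""
    subprocess.run(list(argv), check=True)


def apply_with_cleanup(
    steps: Sequence[Sequence[str]],
    run: Runner = run_command,
    max_attempts: int = 3,
) -> int:
    """Run all steps in order; return the number of the attempt that succeeded.

    Hypothetical helper: if any step fails, discard the pending deployment
    with `rpm-ostree cleanup -p` and restart from the first step, so a retry
    never builds on a half-finished deployment left by the failed attempt.
    """
    last_error: Optional[Exception] = None
    for attempt in range(1, max_attempts + 1):
        try:
            for argv in steps:
                run(argv)
            return attempt
        except (subprocess.CalledProcessError, OSError) as err:
            last_error = err
            # Drop whatever the failed attempt staged before retrying.
            run(["rpm-ostree", "cleanup", "-p"])
    raise RuntimeError(f"update failed after {max_attempts} attempts") from last_error
```

In this sketch the caller would pass the update commands (for example the `rpm-ostree rebase` and `rpm-ostree override reset` invocations seen in the log) as `steps`. Restarting from the first step after cleanup is a design choice made for the sketch; whether the MCD should restart the whole sequence or only the failed step is not settled by this report.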

       

      This is fallout from https://github.com/openshift/machine-config-operator/pull/3580, although I suspect it may have been an issue before that too.

       

            People

              team-mco Team MCO
              openshift-crt-jira-prow OpenShift Prow Bot
              Rio Liu Rio Liu