
fails to reconcile to RT kernel on interrupted updates

      The aggregated job https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/aggregated-gcp-ovn-rt-upgrade-4.14-minor-release-openshift-release-analysis-aggregator/1633554110798106624 failed. Digging into one of the underlying runs:

       

      The MCD log from that run, https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.14-upgrade-from-stable-4.13-e2e-gcp-ovn-rt-upgrade/1633554106595414016/artifacts/e2e-gcp-ovn-rt-upgrade/gather-extra/artifacts/pods/openshift-machine-config-operator_machine-config-daemon-p2vf4_machine-config-daemon.log, shows:

       

      Deployments:
      * ostree-unverified-registry:quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:4f28fbcd049025bab9719379492420f9eaab0426cdbbba43b395eb8421f10a17
                         Digest: sha256:4f28fbcd049025bab9719379492420f9eaab0426cdbbba43b395eb8421f10a17
                        Version: 413.86.202302230536-0 (2023-03-08T20:10:47Z)
            RemovedBasePackages: kernel-core kernel-modules kernel kernel-modules-extra 4.18.0-372.43.1.el8_6
                LayeredPackages: kernel-rt-core kernel-rt-kvm kernel-rt-modules
                                 kernel-rt-modules-extra
      ...
      E0308 22:11:21.925030 74176 writer.go:200] Marking Degraded due to: failed to update OS to quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:cd299b2bf3cc98fb70907f152b4281633064fe33527b5d6a42ddc418ff00eec1 : error running rpm-ostree rebase --experimental ostree-unverified-registry:quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:cd299b2bf3cc98fb70907f152b4281633064fe33527b5d6a42ddc418ff00eec1: error: Importing: remote error: fetching blob: received unexpected HTTP status: 500 Internal Server Error
      ... 
      I0308 22:11:36.959143   74176 update.go:2010] Running: rpm-ostree override reset kernel kernel-core kernel-modules kernel-modules-extra --uninstall kernel-rt-core --uninstall kernel-rt-kvm --uninstall kernel-rt-modules --uninstall kernel-rt-modules-extra
      ...
      E0308 22:12:35.525156   74176 writer.go:200] Marking Degraded due to: error running rpm-ostree override reset kernel kernel-core kernel-modules kernel-modules-extra --uninstall kernel-rt-core --uninstall kernel-rt-kvm --uninstall kernel-rt-modules --uninstall kernel-rt-modules-extra: error: Package/capability 'kernel-rt-core' is not currently requested
      : exit status 1
        

       

      Something is going wrong here in our retry loop. I think it might be that we don't clear the pending deployment on failure. In other words, we need to run

      rpm-ostree cleanup -p

      before we retry.

       

      This is fallout from https://github.com/openshift/machine-config-operator/pull/3580, although I suspect it may have been an issue before that too.

       

            [OCPBUGS-9685] fails to reconcile to RT kernel on interrupted updates

            Colin Walters created issue -
            Colin Walters made changes -
            Link New: This issue relates to COS-1926 [ COS-1926 ]
            Colin Walters made changes -
            QA Contact New: Rio Liu [ JIRAUSER164038 ]
            OpenShift Prow Bot made changes -
            Remote Link New: This issue links to "openshift/machine-config-operator#3599: OCPBUGS-9685: daemon: Always remove pending deployment before we do updates (Web Link)" [ 1189833 ]
            Colin Walters made changes -
            Target Version Original: 4.13.0 [ 12398350 ] New: 4.14.0 [ 12402534 ]
            Colin Walters made changes -
            Assignee Original: Team MCO [ team-mco ] New: Colin Walters [ walters@redhat.com ]
            Colin Walters made changes -
            Priority Original: Undefined [ 10300 ] New: Blocker [ 1 ]
            OpenShift Jira Bot made changes -
            Release Blocker New: Proposed [ 25756 ]
            Colin Walters made changes -
            Status Original: New [ 10016 ] New: ASSIGNED [ 14452 ]
            Colin Walters made changes -
            Status Original: ASSIGNED [ 14452 ] New: POST [ 15726 ]
            Colin Walters made changes -
            Remote Link New: This issue links to "https://bugzilla.redhat.com/show_bug.cgi?id=2177088 (Web Link)" [ 1190120 ]
            OpenShift Prow Bot made changes -
            Status Original: POST [ 15726 ] New: MODIFIED [ 14454 ]
            OpenShift Prow Bot made changes -
            Link New: This issue is cloned by OCPBUGS-9951 [ OCPBUGS-9951 ]
            OpenShift Prow Bot made changes -
            Link New: This issue blocks OCPBUGS-9951 [ OCPBUGS-9951 ]
            ART Bot made changes -
            Status Original: MODIFIED [ 14454 ] New: ON_QA [ 15723 ]
            Sergio Regidor de la Rosa made changes -
            Status Original: ON_QA [ 15723 ] New: Verified [ 10015 ]
            Sinny Kumari made changes -
            Release Blocker Original: Proposed [ 25756 ] New: Approved [ 25755 ]
            Scott Dodson made changes -
            Resolution New: Done [ 1 ]
            Status Original: Verified [ 10015 ] New: Closed [ 6 ]
            OpenShift Release-Controller Bot made changes -
            Fix Version/s New: 4.14.0 [ 12402534 ]
            OpenShift Jira Automation Bot made changes -
            Priority Original: Blocker [ 1 ] New: Critical [ 2 ]

              Assignee: Colin Walters (walters@redhat.com)
              Reporter: Colin Walters (walters@redhat.com)
              QA Contact: Rio Liu