OCPBUGS-25951 (OpenShift Bugs)

IBU with no rhcos delta results in ostree corruption

    • Important
    • No
    • False

      Description of problem:

      An image-based upgrade (IBU) from release X to Y, where X and Y share the same rhcos base image, leaves the ostree repo corrupted, and rpm-ostree status fails.
      
      As part of the IBU prep to upgrade from X to Y:
      - Create new stateroot
      - Import repo data from seed image via ostree pull-local command
      - Create deployment in new stateroot
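
The three prep steps above map roughly onto the ostree commands below. This is a minimal sketch, not the actual IBU implementation: the function name, its three arguments, and the OSTREE override are illustrative assumptions, and the real tooling may pass additional options or specific refs to pull-local.

```shell
# Sketch of the three prep steps. The function name, its arguments, and the
# OSTREE override are illustrative assumptions, not the IBU implementation.
ibu_prep_sketch() {
    seed_repo=$1       # path to an ostree repo extracted from the seed image
    new_stateroot=$2   # name of the stateroot to create for release Y
    y_rt_commit=$3     # commit ID (or ref) of Y.rt in the imported data
    ostree_cmd=${OSTREE:-ostree}   # set OSTREE=echo for a dry run

    # 1. Create the new stateroot.
    "$ostree_cmd" admin os-init "$new_stateroot" || return 1
    # 2. Import repo data from the seed image into the system repo.
    "$ostree_cmd" pull-local "$seed_repo" || return 1
    # 3. Create the deployment in the new stateroot; per this report, a prune
    #    runs automatically as part of this step.
    "$ostree_cmd" admin deploy --os="$new_stateroot" "$y_rt_commit"
}
```

Running it with OSTREE=echo prints the three commands instead of executing them, which is a safe way to review the sequence on a live node.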
      
      In this scenario, where the base rhcos image is the same and we're installing the rt-kernel, we have 4 commits in the repo:
      - X.rt (rhcos+rt-kernel)
      - X.parent (rhcos)
      - Y.rt (rhcos+rt-kernel)
      - Y.parent (rhcos)
      
      Each has a unique commit ID, as the Y commits were generated on the seed SNO. The checksums for X.parent and Y.parent, however, are identical, as it is the same base rhcos image.
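
One way to confirm that two parent commits carry the same content is to compare the ContentChecksum field printed by "ostree show". The helper names below are hypothetical, and the sketch assumes the field layout seen in the output later in this report.

```shell
# Read `ostree show` output on stdin and print the ContentChecksum value.
content_checksum() {
    awk '$1 == "ContentChecksum:" { print $2; exit }'
}

# Succeed only if both commits can be read and share a content checksum.
same_content() {
    sum_a=$(ostree show "$1" | content_checksum)
    sum_b=$(ostree show "$2" | content_checksum)
    [ -n "$sum_a" ] && [ "$sum_a" = "$sum_b" ]
}
```

For the two parent commits shown in this report (6f2aec2f... and 1f6f3297...), both print ContentChecksum d95046e9..., so "same_content" would succeed for that pair.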
      
      When we create the new deployment with commit Y.rt in the new stateroot, a "prune" is automatically run. Unfortunately, the X.parent commit is getting pruned, as both X.parent and Y.parent refer to the same checksum. However, X.rt refers to X.parent as its parent. Once pruned, this causes rpm-ostree status to fail, reporting "No such metadata object". This seems to be unrecoverable.
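
To check whether a deployed commit's parent is still in the repo, the Parent field from "ostree show" can be looked up directly. This is a diagnostic sketch with hypothetical helper names; it only detects the missing parent and does not repair anything.

```shell
# Print the Parent field of a commit (empty if the commit has no parent).
parent_of() {
    ostree show "$1" | awk '$1 == "Parent:" { print $2; exit }'
}

# Exit status: 0 = parent readable (or no parent), 1 = parent missing,
# 2 = the commit itself cannot be read.
check_parent() {
    ostree show "$1" >/dev/null 2>&1 || return 2
    parent=$(parent_of "$1")
    [ -z "$parent" ] && return 0
    ostree show "$parent" >/dev/null 2>&1 || return 1
}
```

Running "check_parent" against the X.rt commit before and after "ostree admin deploy" would show the point at which X.parent disappears.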
      
      I tried using the --no-prune option on the "ostree admin deploy" command. This kept rpm-ostree healthy until the reboot that continues the upgrade. At some point during the upgrade, however, X.parent ended up getting pruned anyway.
      
      Prior to reboot, using the --no-prune option, I see the following (using the 4.14 1221 and 1222 nightlies):
      
      [root@cnfdf01 core]# ostree admin status
      * rhcos bdea3d22314302d2e487c7c7a8557d2a6c5b542c34ef38293e195e6154ad1f62.0
          origin: <unknown origin type>
      [root@cnfdf01 core]# ostree admin status
      * rhcos bdea3d22314302d2e487c7c7a8557d2a6c5b542c34ef38293e195e6154ad1f62.0
          origin: <unknown origin type>
        rhcos_4.14.0_0.nightly_2023_12_22_053212 cbc96d9976d77f4360ddaf323abcf0c969b464cfc9569a59ef7c59bfa83d48cd.0
          origin: <unknown origin type>
      [root@cnfdf01 core]# ostree show bdea3d22314302d2e487c7c7a8557d2a6c5b542c34ef38293e195e6154ad1f62
      commit bdea3d22314302d2e487c7c7a8557d2a6c5b542c34ef38293e195e6154ad1f62
      Parent:  6f2aec2f30a84e114def89b5b348e01af85b69703c3f8a57a098d6243eeb3fe3
      ContentChecksum:  d67b438e99a5f54500e4248c4834d05301af18abf224b1f646248c9fe25b16c9
      Date:  2024-01-02 17:25:41 +0000
      (no subject)
      [root@cnfdf01 core]# ostree show 6f2aec2f30a84e114def89b5b348e01af85b69703c3f8a57a098d6243eeb3fe3
      commit 6f2aec2f30a84e114def89b5b348e01af85b69703c3f8a57a098d6243eeb3fe3
      ContentChecksum:  d95046e96d6256a924f1324dc9b0e6c4eedb534d02d0cf532bc32146894686d4
      Date:  2024-01-02 16:30:40 +0000
      (no subject)
      [root@cnfdf01 core]# ostree show cbc96d9976d77f4360ddaf323abcf0c969b464cfc9569a59ef7c59bfa83d48cd
      commit cbc96d9976d77f4360ddaf323abcf0c969b464cfc9569a59ef7c59bfa83d48cd
      Parent:  1f6f32972385956b2cf4df474b3cb098baf7d7409def563573ac802161eac2a5
      ContentChecksum:  5868c071113346ec8939b0b9378efd201f3df7dde86b70aa2d7f1e4599778c91
      Date:  2024-01-02 14:17:13 +0000
      (no subject)
      [root@cnfdf01 core]# ostree show 1f6f32972385956b2cf4df474b3cb098baf7d7409def563573ac802161eac2a5
      commit 1f6f32972385956b2cf4df474b3cb098baf7d7409def563573ac802161eac2a5
      ContentChecksum:  d95046e96d6256a924f1324dc9b0e6c4eedb534d02d0cf532bc32146894686d4
      Date:  2024-01-02 13:12:54 +0000
      (no subject)
      [root@cnfdf01 core]# rpm-ostree status
      State: idle
      Deployments:
      ● ostree-unverified-registry:quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c23737c7c479780e144b8c4a6c40a2fae33a74de189e71d46908d0d7addf62a0
                         Digest: sha256:c23737c7c479780e144b8c4a6c40a2fae33a74de189e71d46908d0d7addf62a0
                        Version: 414.92.202312191502-0 (2024-01-02T16:30:40Z)
                      StateRoot: rhcos
            RemovedBasePackages: kernel-modules-core kernel-modules-extra kernel-core kernel kernel-modules 5.14.0-284.45.1.el9_2
                LayeredPackages: kernel-rt-core kernel-rt-kvm kernel-rt-modules kernel-rt-modules-extra

        cbc96d9976d77f4360ddaf323abcf0c969b464cfc9569a59ef7c59bfa83d48cd
                      Timestamp: 2024-01-02T13:12:54Z
                      StateRoot: rhcos_4.14.0_0.nightly_2023_12_22_053212
            RemovedBasePackages: kernel-modules-core kernel-modules-extra kernel-core kernel kernel-modules 5.14.0-284.45.1.el9_2
                LayeredPackages: kernel-rt-core kernel-rt-kvm kernel-rt-modules kernel-rt-modules-extra
      
      
      After pivoting to the new deployment (with ostree admin set-default) and rebooting, the rpm-ostreed.service fails during init:
      
      [root@cnfdf01 core]# rpm-ostree status
      Job for rpm-ostreed.service failed because the control process exited with error code.
      See "systemctl status rpm-ostreed.service" and "journalctl -xeu rpm-ostreed.service" for details.
      × rpm-ostreed.service - rpm-ostree System Management Daemon
           Loaded: loaded (/usr/lib/systemd/system/rpm-ostreed.service; static)
          Drop-In: /run/systemd/system/rpm-ostreed.service.d
                   └─bug2111817.conf
                   /etc/systemd/system/rpm-ostreed.service.d
                   └─mco-controlplane-nice.conf
           Active: failed (Result: exit-code) since Tue 2024-01-02 19:44:58 UTC; 15ms ago
             Docs: man:rpm-ostree(1)
          Process: 35724 ExecStart=rpm-ostree start-daemon (code=exited, status=1/FAILURE)
         Main PID: 35724 (code=exited, status=1/FAILURE)
           Status: "error: Couldn't start daemon: Error setting up sysroot: Reading deployment 1: No such metadata object 6f2aec2f30a84e114def89b5b348e01af85b69703c3f8a57a098d6243eeb3fe3.commit"
              CPU: 39ms

      Jan 02 19:44:58 cnfdf01.telco5gran.eng.rdu2.redhat.com systemd[1]: Starting rpm-ostree System Management Daemon...
      Jan 02 19:44:58 cnfdf01.telco5gran.eng.rdu2.redhat.com rpm-ostree[35724]: Reading config file '/etc/rpm-ostreed.conf'
      Jan 02 19:44:58 cnfdf01.telco5gran.eng.rdu2.redhat.com rpm-ostree[35724]: error: Couldn't start daemon: Error setting up sysroot: Reading deployment 1: No such metadata...fe3.commit
      Jan 02 19:44:58 cnfdf01.telco5gran.eng.rdu2.redhat.com systemd[1]: rpm-ostreed.service: Main process exited, code=exited, status=1/FAILURE
      Jan 02 19:44:58 cnfdf01.telco5gran.eng.rdu2.redhat.com systemd[1]: rpm-ostreed.service: Failed with result 'exit-code'.
      Jan 02 19:44:58 cnfdf01.telco5gran.eng.rdu2.redhat.com systemd[1]: Failed to start rpm-ostree System Management Daemon.
      Hint: Some lines were ellipsized, use -l to show in full.
      error: Loading sysroot: exit status: 1
      
      
      A similar error is seen without the --no-prune option, except that it happens after running the "ostree admin deploy" command.
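
To pull the missing commit ID out of the daemon's error text, a small filter along these lines can help. The function name is hypothetical, and the pattern assumes the "No such metadata object <id>.commit" wording shown in the status output above; ellipsized journal lines will not match, so feed it the full text.

```shell
# Read log text on stdin and print each distinct commit ID named in a
# "No such metadata object <id>.commit" message.
missing_commits() {
    sed -n 's/.*No such metadata object \([0-9a-f]\{64\}\)\.commit.*/\1/p' | sort -u
}
```

For example, "systemctl status -l rpm-ostreed.service | missing_commits" on the node above would be expected to print the X.parent commit ID from the Status line.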
       

      Version-Release number of selected component (if applicable):

          4.14

      How reproducible:

          Always

      Steps to Reproduce:

          1. Create new stateroot
          2. Import repo data whose parent commit uses the same rhcos image but has a different commit ID, i.e. two parent commits with unique commit IDs but the same checksum
          3. Create deployment in new stateroot using imported commit that has rhcos commit as parent
          

      Actual results:

          Original parent rhcos commit gets pruned, causing rpm-ostree failure

      Expected results:

          Original parent rhcos commit should not get pruned, as it is in use in the original stateroot

      Additional info:

          

            Unassigned
            Don Penney (dpenney1@redhat.com)
            Michael Nguyen