RHEL-130958

system crashes if a rogue library prevents PID1 from re-executing


    • Bug
    • Resolution: Unresolved
    • Undefined
    • rhel-8.10, rhel-10.1, rhel-9.7
    • systemd
    • Important
    • rhel-systemd

      What were you trying to do that didn't work?

      A customer hit a kernel panic while updating the system because PID 1 crashed in the linker phase, before main() executed.
      This led to file system corruption: most of the updated files had not yet reached persistent storage and existed only in the file system cache, leaving the system unusable and hard to recover.

      After much digging, the crash was traced to a rogue symlink, /usr/lib64/libcrypto.so.10, pointing to an older libcrypto library. The updated systemd could not resolve its symbols and crashed in the linker phase.

      Since PID 1 is critical and a crash of PID 1 takes the whole system down, it needs to be hardened to survive as many failures as possible.
      To do so, I propose that, before re-executing, systemd spawns itself as a child and verifies that the child exits with status 0.
      If the child fails to execute, systemd should cancel the re-exec to avoid crashing the system.
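
      The idea can be illustrated with a minimal sketch. This is not the attached prototype: the helper name probe_exec() and the choice of -ENOEXEC as the failure code are assumptions made for illustration only. The helper forks, executes the candidate binary in the child, and reports success only if the child exits with status 0. A binary that the dynamic linker cannot load exits with a non-zero status (the exitcode=0x00007f00 in the panic below decodes to exit status 127), so the probe would fail instead of PID 1.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper, not taken from the attached prototype.
 * Runs `path` with `argv` in a child process.
 * Returns 0 if the child was loaded, ran, and exited with status 0.
 * Returns -ENOEXEC if the child could not be executed, exited with a
 * non-zero status, or was killed by a signal.
 * Returns another negative errno value if fork() or waitpid() failed. */
static int probe_exec(const char *path, char *const argv[]) {
        pid_t pid = fork();
        if (pid < 0)
                return -errno;

        if (pid == 0) {
                /* Child: on success execv() does not return. If the
                 * binary cannot be started at all, report 127. */
                execv(path, argv);
                _exit(127);
        }

        /* Parent: wait for the child, retrying if interrupted. */
        int status;
        while (waitpid(pid, &status, 0) < 0) {
                if (errno != EINTR)
                        return -errno;
        }

        if (WIFEXITED(status) && WEXITSTATUS(status) == 0)
                return 0;

        return -ENOEXEC;
}
```

      In the manager, such a check would run just before the re-exec, for example by calling probe_exec("/usr/lib/systemd/systemd", argv) with argv set to { "systemd", "--version", NULL }, and the re-exec would be skipped on a non-zero result. Whether "--version" is a sufficient probe is an assumption of this sketch; the patch may use a different mechanism.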

      I'm attaching a prototype, which for now has several caveats (see the comments in the patch).

      What is the impact of this issue to you?

      The customer's system breaks when updating many packages.

      Please provide the package NVR for which the bug is seen:

      All systemd releases, including upstream.

      How reproducible is this bug?

      Always, using the minimal reproducer below (it does not mimic the customer's real case and is for demonstration purposes only).

      Steps to reproduce

      1. Rename /usr/lib64/libcrypto.so.3 to /usr/lib64/libcrypto.so.3.orig
        # mv /usr/lib64/libcrypto.so.3 /usr/lib64/libcrypto.so.3.orig
      2. Send TERM to PID 1, which makes systemd re-execute itself
        # kill -TERM 1

      Expected results

      No panic

      Actual results

      Panic:

      [   20.090735] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00007f00
      [   20.091281] CPU: 3 PID: 1 Comm: systemd Kdump: loaded Not tainted 5.14.0-570.64.1.el9_6.x86_64 #1
      [   20.091874] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.17.0-7.fc42 06/10/2025
      ...
      

      Additional info

      To recover the crashed system (if the "mv" reached persistent storage, which is usually not the case), boot with init=/bin/sh on the kernel command line and restore the symlink:

      # mount -o rw,remount /
      # mv /usr/lib64/libcrypto.so.3.orig /usr/lib64/libcrypto.so.3
      # exec /usr/lib/systemd/systemd
      

              systemd maint mailing list (systemd-maint)
              Renaud Métrich (rhn-support-rmetrich)
              systemd maint mailing list
              Frantisek Sumsal