Executive summary
We need to know with strong confidence which shared libraries systemd is using, or can conceivably use, on a running safety-compliant RHIVOS 2/RHEL 10 based system. Because of the use of dlopen(), we are unsure whether we are assessing this correctly. We'd like to formulate an approach in collaboration with the systemd team that the team believes will accurately provide this information.
Long Details
The automotive functional safety team is continuing to refine our approach for defining which packages fall into the "core" safety scope. This includes both packages that have content used during the startup of a safe system and a smaller subset of these packages that remain active once the system is fully booted and potentially running safety applications. We'd like these lists to be as short as possible while remaining accurate.
We're doing this in two ways for RHEL 10/RHIVOS 2.0, both of which are implemented in the following git project:
https://gitlab.com/imcleod1/execopen
Mechanism 1 - eBPF trace of the booting system to determine the larger "core RPMs list" - With this technique, we load a simple tracing program very early in boot, using a replacement rdinit. We then start the normal system startup by launching systemd in the initramfs. The tracing program captures all process execs, all file accesses, and all fork events. After the system has completed boot, we use this information to produce an exhaustive list of all accessed files, and then map these files back to SRPMs and RPMs.
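As an illustration of the final step only (mapping files back to RPMs and SRPMs), here is a minimal Python sketch. It is not taken from the execopen project. It assumes the standard rpm CLI is available on the target and uses "rpm -qf" with a query format; the function names parse_rpm_output and owners_of are hypothetical.

```python
import subprocess

# Tab-separated binary package name and source RPM, one owner per line.
QUERY_FORMAT = "%{NAME}\t%{SOURCERPM}\n"

def parse_rpm_output(output):
    """Parse 'name<TAB>sourcerpm' lines into a set of (rpm, srpm) tuples."""
    owners = set()
    for line in output.splitlines():
        name, sep, srpm = line.partition("\t")
        if sep:  # messages such as "... is not owned by any package" have no tab
            owners.add((name, srpm))
    return owners

def owners_of(path):
    """Ask the RPM database which package(s) own `path` (hypothetical helper)."""
    proc = subprocess.run(
        ["rpm", "-qf", "--queryformat", QUERY_FORMAT, path],
        capture_output=True, text=True, check=False,
    )
    return parse_rpm_output(proc.stdout)
```

Files that no package owns produce an empty set here, so they would need to be reported separately rather than silently dropped.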
Mechanism 2 - /proc walk - With this technique, we walk the entire /proc filesystem after the system is known to have completed boot. For each user space process, we identify all files mapped into the process address space and then map those files back to RPMs and SRPMs. This gives us a confident view of all the object code that has the potential to affect system behavior.
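To make the /proc walk concrete, here is a minimal Python sketch of the idea. It is not the execopen implementation. It assumes the standard /proc/<pid>/maps line layout (address, perms, offset, dev, inode, optional pathname); the function names are hypothetical.

```python
import os

def parse_maps_text(maps_text):
    """Return the set of file paths named in the text of a /proc/<pid>/maps file."""
    paths = set()
    for line in maps_text.splitlines():
        # Fields: address perms offset dev inode [pathname]
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping, no pathname
        path = fields[5].strip()
        if not path.startswith("/"):
            continue  # pseudo-entries such as [heap], [stack], [vdso]
        # The kernel appends " (deleted)" when the backing file was unlinked.
        if path.endswith(" (deleted)"):
            path = path[: -len(" (deleted)")]
        paths.add(path)
    return paths

def mapped_files_by_pid(proc_root="/proc"):
    """Walk proc_root and return {pid: set of mapped file paths}."""
    result = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "maps")) as f:
                text = f.read()
        except OSError:
            continue  # process exited, or we lack permission to read it
        paths = parse_maps_text(text)
        if paths:  # kernel threads have an empty maps file
            result[int(entry)] = paths
    return result
```

A sketch like this only sees what is mapped at the instant of the walk, which is exactly the limitation discussed next.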
rh-ee-mstorr pointed out that mechanism 2 assumes no shared libraries are loaded after a process has been launched. We know from our systemd risk assessment that, as of RHEL 10, systemd can in fact load shared libraries at runtime, and not just at process initialization time.
We'd like to find a way for our list of "used at runtime" packages to be accurate.
For example, can we confidently know, for our safety image systemd configuration, when the systemd components are essentially "done" with any potential dlopen() activity? If we can, then the output from Mechanism 2 can be considered valid.
From what I can see so far, it's difficult to tell. The compression libraries, for example, read as if they are loaded only once the journal decides it needs to compress something. That seems like something that could require quite a bit of runtime before it happens.
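One way to at least observe late loading empirically is to compare two snapshots of mapped files taken some time apart. The Python sketch below assumes each snapshot is a dict mapping a PID to the set of file paths mapped into that process (for example, collected from /proc/<pid>/maps); the function name new_mappings is hypothetical. It can show that a late dlopen() happened, but it cannot prove that none will happen later.

```python
def new_mappings(before, after):
    """Return {pid: paths} for files mapped in `after` but not in `before`.

    Only PIDs present in both snapshots are compared; a PID that appears
    only in `after` is a new process, not a late library load.
    Caveat: PID reuse between snapshots is not detected by this sketch.
    """
    added = {}
    for pid, paths in after.items():
        if pid not in before:
            continue
        extra = paths - before[pid]
        if extra:
            added[pid] = extra
    return added
```

For instance, a journal process that maps a compression library between the two snapshots would show up with that library in its result set.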
Another baseline set of questions would be: Have we adjusted the SPEC files to make it possible to install systemd sub-RPMs without requiring that all conceivable dlopen()-sourced libraries are also included in the RPM transaction? That is, did we use the dlopen() change to also make it possible to better minimize container or system images by allowing systemd to install with a smaller dependency set? Or, put yet a third way, is it possible to have a valid RHEL 10 systemd install and have a dlopen() fail because of a missing library?