Bug
Resolution: Done-Errata
Critical
rhel-9.6
None
389-ds-base-2.6.1-2.el9
Yes
Important
rhel-idm-ds
ssg_idm
26
0
False
False
No
Red Hat Enterprise Linux
None
Pass
Automated
Unspecified Release Note Type - Unknown
x86_64
None
What were you trying to do that didn't work?
For testing Cockpit, we run the quay.io/freeipa/freeipa-server:centos-9-stream container in a VM (on a Fedora CoreOS host, but that hopefully shouldn't matter). That VM image gets refreshed every month.
That most recently happened in https://github.com/cockpit-project/bots/pull/7342 4 days ago. At the time of the refresh, all the tests passed. But two days later, all our FreeIPA-related tests started to fail, and we reverted to the previous image. That PR contains quite a bit of debugging. At the time I suspected a mis-built data directory volume, but couldn't reproduce the problem with a fresh container build.
Then in https://github.com/cockpit-project/bots/pull/7351 I attempted another VM/container refresh, and once again it worked for two days, until this morning everything started to fail again.
We get errors like "ipa: ERROR: Failed to authenticate to CA REST API" and the journal shows a crash of ns-slapd:
kernel: ns-slapd[1813]: segfault at aaaaaac2 ip 00007fa36504f367 sp 00007fa3501f45a8 error 4 in libnspr4.so[f367,7fa36504c000+25000] likely on CPU 0 (core 0, socket 0)
systemd-coredump[2282]: Process 1800 (ns-slapd) of user 389 dumped core.
Unfortunately the stack trace is useless, as the crash happens in a container.
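Even without a usable core dump, the kernel line above names the faulting library and, on recent kernels, the faulting offset inside that file (the first number in the brackets, f367 here). A minimal POSIX sh sketch that extracts both from such a line; the helper name `segfault_offset` is hypothetical, and the sed pattern assumes the `in <lib>[<offset>,<start>+<size>]` format shown above:

```sh
#!/bin/sh
# Hypothetical helper: print "<library> 0x<offset>" from one kernel segfault line.
# Assumes the kernel format "... in <lib>[<offset>,<start>+<size>] ...".
# Prints nothing if the line does not match that format.
segfault_offset() {
    printf '%s\n' "$1" |
        sed -n 's/.* in \([^[ ]*\)\[\([0-9a-f]*\),[0-9a-f]*+[0-9a-f]*\].*/\1 0x\2/p'
}

# Example: segfault_offset "$(journalctl -k | grep 'segfault at' | tail -n 1)"
```

With the matching nspr debuginfo installed in the container, something like `addr2line -f -e /usr/lib64/libnspr4.so 0xf367` might then name the crashing function. This assumes the file offset equals the ELF virtual address of the code, which is common for x86_64 shared libraries but not guaranteed.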
What is the impact of this issue to you?
FreeIPA deployment stops working after about two days.
Please provide the package NVR for which the bug is seen:
https://github.com/cockpit-project/bots/pull/7350#issuecomment-2615093896 has a complete rpm -qa diff between the previously working and the failing image. Given that it's ns-slapd that crashes, the biggest suspect is:
-389-ds-base-2.5.2-2.el9.x86_64
-389-ds-base-libs-2.5.2-2.el9.x86_64
+389-ds-base-2.6.0-2.el9.x86_64
+389-ds-base-libs-2.6.0-2.el9.x86_64
How reproducible is this bug?
Always
Steps to reproduce
This is what our infra does, minus two handfuls of port redirections (which are of course critical for actually using it, but not important for reproducing the bug). While building the VM image that hosts the FreeIPA container, it runs:
mkdir -p /var/lib/ipa-data
podman run -it --rm --name freeipa -h f0.cockpit.lan \
    -v /sys/fs/cgroup:/sys/fs/cgroup:ro -v /var/lib/ipa-data:/data:Z \
    -e IPA_SERVER_IP=10.111.112.100 \
    quay.io/freeipa/freeipa-server:centos-9-stream \
    -U -p foobarfoo -a foobarfoo -n cockpit.lan -r COCKPIT.LAN --setup-dns --no-forwarders --no-ntp
Wait about 8 minutes until "Configure IPA server upon the first start" is done. Then do some more setup: open a shell in the container with
podman exec -it freeipa bash
and run:
echo foobarfoo | kinit admin@COCKPIT.LAN
ipa pwpolicy-mod --minlife=0 --maxlife=1000
# Change password to apply new password policy
printf "foobarfoo\nfoobarfoo\n" | ipa user-mod --password admin
# Allow "admins" IPA group members to run sudo
ipa-advise enable-admins-sudo | sh -ex
ipa dnsconfig-mod --forwarder=8.8.8.8
poweroff
(I don't know how much of this is necessary to reproduce the bug).
Now restart the container with the same podman command. As the data directory/volume is already initialized, it only takes some 10 to 20 seconds until "FreeIPA server started" appears, and the container works.
Now wait for two days (you can fast-forward the system clock by 3 days, see below). After that, starting the container will soon trigger the ns-slapd crash:
# podman exec -it freeipa systemctl --failed
  UNIT                       LOAD   ACTIVE SUB    DESCRIPTION
● dirsrv@COCKPIT-LAN.service loaded failed failed 389 Directory Server COCKPIT-LAN.
# podman exec -it freeipa systemctl status dirsrv@COCKPIT-LAN.service
× dirsrv@COCKPIT-LAN.service - 389 Directory Server COCKPIT-LAN.
     Loaded: loaded (/usr/lib/systemd/system/dirsrv@.service; enabled; preset: disabled)
    Drop-In: /usr/lib/systemd/system/dirsrv@.service.d
             └─custom.conf
             /data/etc/systemd/system/dirsrv@COCKPIT-LAN.service.d
             └─ipa-env.conf
     Active: failed (Result: core-dump) since Wed 2025-01-29 07:49:28 UTC; 57s ago
   Duration: 31.966s
    Process: 148 ExecStartPre=/usr/libexec/dirsrv/ds_systemd_ask_password_acl /etc/dirsrv/slapd-COCKPIT-LAN/dse.ldif (code=exited, status=0/SUCCESS)
    Process: 153 ExecStartPre=/usr/libexec/dirsrv/ds_selinux_restorecon.sh /etc/dirsrv/slapd-COCKPIT-LAN/dse.ldif (code=exited, status=0/SUCCESS)
    Process: 158 ExecStart=/usr/sbin/ns-slapd -D /etc/dirsrv/slapd-COCKPIT-LAN -i /run/dirsrv/slapd-COCKPIT-LAN.pid (code=dumped, signal=SEGV)
   Main PID: 158 (code=dumped, signal=SEGV)
     Status: "slapd started: Ready to process requests"
        CPU: 2.170s

Jan 29 07:48:56 f0.cockpit.lan ns-slapd[158]: [29/Jan/2025:07:48:56.107940754 +0000] - INFO - slapd_daemon - slapd started. Listening on All Interfaces port 389 for LDAP requests
Jan 29 07:48:56 f0.cockpit.lan ns-slapd[158]: [29/Jan/2025:07:48:56.131173566 +0000] - INFO - slapd_daemon - Listening on All Interfaces port 636 for LDAPS requests
Jan 29 07:48:56 f0.cockpit.lan ns-slapd[158]: [29/Jan/2025:07:48:56.135474608 +0000] - INFO - slapd_daemon - Listening on /run/slapd-COCKPIT-LAN.socket for LDAPI requests
Jan 29 07:48:56 f0.cockpit.lan systemd[1]: Started 389 Directory Server COCKPIT-LAN..
Jan 29 07:49:00 f0.cockpit.lan ns-slapd[158]: [29/Jan/2025:07:49:00.939879499 +0000] - ERR - schema-compat-plugin - warning: no entries set up under cn=ng, cn=compat,dc=cockpit,dc=lan
Jan 29 07:49:00 f0.cockpit.lan ns-slapd[158]: [29/Jan/2025:07:49:00.947313349 +0000] - ERR - schema-compat-plugin - warning: no entries set up under cn=computers, cn=compat,dc=cockpit,dc=lan
Jan 29 07:49:00 f0.cockpit.lan ns-slapd[158]: [29/Jan/2025:07:49:00.948034746 +0000] - ERR - schema-compat-plugin - Finished plugin initialization.
Jan 29 07:49:28 f0.cockpit.lan systemd[1]: dirsrv@COCKPIT-LAN.service: Main process exited, code=dumped, status=11/SEGV
Jan 29 07:49:28 f0.cockpit.lan systemd[1]: dirsrv@COCKPIT-LAN.service: Failed with result 'core-dump'.
Jan 29 07:49:28 f0.cockpit.lan systemd[1]: dirsrv@COCKPIT-LAN.service: Consumed 2.170s CPU time.
The crash is also visible in the journal. The timing varies; sometimes it takes one to three minutes. ns-slapd either crashes already during container setup, or when a client tries to talk to it:
# podman exec -it freeipa sh -exc 'echo foobarfoo | kinit -f admin; ipa user-find'
+ echo foobarfoo
+ kinit -f admin
kinit: Generic error (see e-text) while getting initial credentials
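When scripting this reproducer (for example in CI), it may help to poll for the crash instead of checking by hand. A minimal POSIX sh sketch; `wait_until` is a hypothetical helper name, and the timeout and interval values in the example are arbitrary:

```sh
#!/bin/sh
# Hypothetical helper: re-run a command until it succeeds or LIMIT seconds pass.
# Usage: wait_until LIMIT INTERVAL COMMAND [ARGS...]
# Returns 0 as soon as COMMAND succeeds, 1 if it never did within LIMIT.
wait_until() {
    _limit=$1
    _interval=$2
    shift 2
    _waited=0
    while [ "$_waited" -lt "$_limit" ]; do
        if "$@"; then
            return 0                        # command succeeded
        fi
        sleep "$_interval"
        _waited=$((_waited + _interval))
    done
    return 1                                # gave up after LIMIT seconds
}

# Example: wait up to 10 minutes for the dirsrv unit to enter the failed state.
# wait_until 600 10 podman exec freeipa systemctl is-failed --quiet dirsrv@COCKPIT-LAN.service
```

`systemctl is-failed --quiet` exits with 0 only when the unit is in the failed state, so the loop ends as soon as ns-slapd has crashed.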
To avoid waiting two days or sitting through the 8 minutes of first-time initialization, I attached a tarball of /var/lib/ipa-data here. You can unpack it with
tar -C /var/lib -xvf /tmp/ipa-data.tar.xz
Links to: RHBA-2024:144130 389-ds-base bug fix and enhancement update