Type: Bug
Resolution: Done-Errata
Priority: Critical
Fix Version/s: rhel-9.6
Affects Version/s: rhel-9.6
Component/s: 389-ds-base
Labels:
None

Fixed in Build:
389-ds-base-2.6.1-2.el9
Regression:
Yes
Severity:
Important

AssignedTeam:
rhel-idm-ds
Sub-System Group:

ssg_idm

Internal Target Milestone:
26
Story Points:
0
Blocked:
False
Ready:
False
Blocked Reason:
None
Product Documentation Required:
No
Products:

Red Hat Enterprise Linux
Sprint:
None

Acceptance Criteria:
Automated tests should pass:

dirsrvtests/tests/suites/logging/log_flush_rotation_test.py::test_log_flush_and_rotation_crash
Preliminary Testing:
Pass
Errata Link:
https://errata.engineering.redhat.com/advisory/144130
Test Coverage:

Automated

Release Note Type:
Unspecified Release Note Type - Unknown

Experience:
Architecture:

x86_64

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Planning:
None

What were you trying to do that didn't work?

For testing Cockpit, we run the quay.io/freeipa/freeipa-server:centos-9-stream container (on a Fedora CoreOS host, but that hopefully shouldn't matter). That VM gets refreshed every month.

That most recently happened in https://github.com/cockpit-project/bots/pull/7342 4 days ago. At the time of the refresh, all the tests passed. But two days later, all our FreeIPA-related tests started to fail and we reverted to the previous image. That PR contains quite a bit of debugging investigation. At that time I suspected a mis-build of the data directory volume, but couldn't reproduce it using a fresh container build.

Then in https://github.com/cockpit-project/bots/pull/7351 I attempted another VM/container refresh. Once again it worked for two days, until this morning everything started to fail again.

We get errors like "ipa: ERROR: Failed to authenticate to CA REST API" and the journal shows a crash of ns-slapd:

kernel: ns-slapd[1813]: segfault at aaaaaac2 ip 00007fa36504f367 sp 00007fa3501f45a8 error 4 in libnspr4.so[f367,7fa36504c000+25000] likely on CPU 0 (core 0, socket 0)
systemd-coredump[2282]: Process 1800 (ns-slapd) of user 389 dumped core.

Unfortunately the stack trace is useless, as the crash happens in a container.
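
Even without a usable stack trace, the kernel line carries enough to compute where in the mapped library the fault occurred. The following is a minimal sketch, assuming the bracketed fields after the library name end with the mapping start address and its size; `segv_offset` is a hypothetical helper name, not part of any tool.

```shell
# Sketch: subtract the mapping start address from the instruction pointer
# in a kernel "segfault at ..." line. The field layout is an assumption.
segv_offset() {
    # $1: the full kernel log line
    ip=$(printf '%s\n' "$1" | sed -n 's/.* ip \([0-9a-f]*\) .*/\1/p')
    base=$(printf '%s\n' "$1" | sed -n 's/.*\[[0-9a-f]*,\([0-9a-f]*\)+[0-9a-f]*].*/\1/p')
    # Give up if either field could not be extracted
    [ -n "$ip" ] && [ -n "$base" ] || return 1
    printf '0x%x\n' "$(( 0x$ip - 0x$base ))"
}
```

For the line quoted above this prints 0x3367. Which offset a symbolizer expects depends on how the library is mapped, so treat the result as a starting point for locating the faulting function in libnspr4.so with matching debuginfo, not as a final answer.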

What is the impact of this issue to you?

FreeIPA deployment stops working after about two days.

Please provide the package NVR for which the bug is seen:

https://github.com/cockpit-project/bots/pull/7350#issuecomment-2615093896 has a complete rpm -qa diff between the previously working and failing image. Given that it's ns-slapd that crashes, the biggest suspect is

-389-ds-base-2.5.2-2.el9.x86_64
-389-ds-base-libs-2.5.2-2.el9.x86_64
+389-ds-base-2.6.0-2.el9.x86_64
+389-ds-base-libs-2.6.0-2.el9.x86_64

How reproducible is this bug?

Always

Steps to reproduce

This is what our infra does, minus two handfuls of port redirections (which are of course critical for actually using it, but not important for reproducing the bug). While building the VM image that hosts the FreeIPA container, it does this:

mkdir -p /var/lib/ipa-data
podman run -it --rm --name freeipa -h f0.cockpit.lan -v /sys/fs/cgroup:/sys/fs/cgroup:ro -v /var/lib/ipa-data:/data:Z -e IPA_SERVER_IP=10.111.112.100 quay.io/freeipa/freeipa-server:centos-9-stream -U -p foobarfoo -a foobarfoo -n cockpit.lan -r COCKPIT.LAN --setup-dns --no-forwarders --no-ntp

Wait about 8 minutes until "Configure IPA server upon the first start" is done. Then do some more setup in

podman exec -it freeipa bash

and in the container, run:

echo foobarfoo | kinit admin@COCKPIT.LAN
ipa pwpolicy-mod --minlife=0 --maxlife=1000
# Change password to apply new password policy
printf "foobarfoo\nfoobarfoo\n" | ipa user-mod --password admin
# Allow "admins" IPA group members to run sudo
ipa-advise enable-admins-sudo | sh -ex
ipa dnsconfig-mod --forwarder=8.8.8.8
poweroff

(I don't know how much of this is necessary to reproduce the bug.)

Now you can re-start the container using the same podman command. As the data dir/volume is initialized, it only takes some 10 to 20 seconds until "FreeIPA server started" appears, and the container works.

Now wait for two days (you can fast-forward the system clock by 3 days, see below). After that, starting the container will soon trigger the ns-slapd crash:

# podman exec -it freeipa systemctl --failed
  UNIT                       LOAD   ACTIVE SUB    DESCRIPTION                      
● dirsrv@COCKPIT-LAN.service loaded failed failed 389 Directory Server COCKPIT-LAN.

# podman exec -it freeipa systemctl status dirsrv@COCKPIT-LAN.service
× dirsrv@COCKPIT-LAN.service - 389 Directory Server COCKPIT-LAN.
     Loaded: loaded (/usr/lib/systemd/system/dirsrv@.service; enabled; preset: disabled)
    Drop-In: /usr/lib/systemd/system/dirsrv@.service.d
             └─custom.conf
             /data/etc/systemd/system/dirsrv@COCKPIT-LAN.service.d
             └─ipa-env.conf
     Active: failed (Result: core-dump) since Wed 2025-01-29 07:49:28 UTC; 57s ago
   Duration: 31.966s
    Process: 148 ExecStartPre=/usr/libexec/dirsrv/ds_systemd_ask_password_acl /etc/dirsrv/slapd-COCKPIT-LAN/dse.ldif (code=exited, status=0/SUCCESS)
    Process: 153 ExecStartPre=/usr/libexec/dirsrv/ds_selinux_restorecon.sh /etc/dirsrv/slapd-COCKPIT-LAN/dse.ldif (code=exited, status=0/SUCCESS)
    Process: 158 ExecStart=/usr/sbin/ns-slapd -D /etc/dirsrv/slapd-COCKPIT-LAN -i /run/dirsrv/slapd-COCKPIT-LAN.pid (code=dumped, signal=SEGV)
   Main PID: 158 (code=dumped, signal=SEGV)
     Status: "slapd started: Ready to process requests"
        CPU: 2.170s

Jan 29 07:48:56 f0.cockpit.lan ns-slapd[158]: [29/Jan/2025:07:48:56.107940754 +0000] - INFO - slapd_daemon - slapd started.  Listening on All Interfaces port 389 for LDAP requests
Jan 29 07:48:56 f0.cockpit.lan ns-slapd[158]: [29/Jan/2025:07:48:56.131173566 +0000] - INFO - slapd_daemon - Listening on All Interfaces port 636 for LDAPS requests
Jan 29 07:48:56 f0.cockpit.lan ns-slapd[158]: [29/Jan/2025:07:48:56.135474608 +0000] - INFO - slapd_daemon - Listening on /run/slapd-COCKPIT-LAN.socket for LDAPI requests
Jan 29 07:48:56 f0.cockpit.lan systemd[1]: Started 389 Directory Server COCKPIT-LAN..
Jan 29 07:49:00 f0.cockpit.lan ns-slapd[158]: [29/Jan/2025:07:49:00.939879499 +0000] - ERR - schema-compat-plugin - warning: no entries set up under cn=ng, cn=compat,dc=cockpit,dc=lan
Jan 29 07:49:00 f0.cockpit.lan ns-slapd[158]: [29/Jan/2025:07:49:00.947313349 +0000] - ERR - schema-compat-plugin - warning: no entries set up under cn=computers, cn=compat,dc=cockpit,dc=lan
Jan 29 07:49:00 f0.cockpit.lan ns-slapd[158]: [29/Jan/2025:07:49:00.948034746 +0000] - ERR - schema-compat-plugin - Finished plugin initialization.
Jan 29 07:49:28 f0.cockpit.lan systemd[1]: dirsrv@COCKPIT-LAN.service: Main process exited, code=dumped, status=11/SEGV
Jan 29 07:49:28 f0.cockpit.lan systemd[1]: dirsrv@COCKPIT-LAN.service: Failed with result 'core-dump'.
Jan 29 07:49:28 f0.cockpit.lan systemd[1]: dirsrv@COCKPIT-LAN.service: Consumed 2.170s CPU time.

and the crash is visible in the journal. The timing varies; sometimes it takes a minute or three. It crashes either during container setup or when something tries to talk to it:

# podman exec -it freeipa sh -exc 'echo foobarfoo | kinit -f admin; ipa user-find'
+ echo foobarfoo
+ kinit -f admin
kinit: Generic error (see e-text) while getting initial credentials
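
Because the crash does not happen at a fixed time, a reproducer may need to poll for it. The following is a minimal sketch, assuming the container name `freeipa` and the unit name from the output above; `slapd_crashed` and `wait_for_crash` are hypothetical helper names.

```shell
# Succeeds if the text on stdin reports ns-slapd dying from SIGSEGV,
# in either the "systemctl status" or the journal wording shown above.
slapd_crashed() {
    grep -Eq 'code=dumped, (signal=SEGV|status=11/SEGV)'
}

# Poll the container every 10 seconds, up to $1 times, until the
# directory server unit reports a core dump. Returns 1 if it never does.
wait_for_crash() {
    tries=$1
    while [ "$tries" -gt 0 ]; do
        if podman exec freeipa systemctl status 'dirsrv@COCKPIT-LAN.service' 2>&1 | slapd_crashed; then
            return 0
        fi
        tries=$((tries - 1))
        sleep 10
    done
    return 1
}
```

Calling `wait_for_crash 30` would then watch for about five minutes. `podman exec` is used without `-it` here because the output is piped instead of shown on a terminal.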

To avoid having to wait for two days, or having to suffer through the 8 mins of first-time initialization, I attached a tarball of /var/lib/ipa-data here. You can unpack it with

tar -C /var/lib -xvf /tmp/ipa-data.tar.xz
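
The steps above mention fast-forwarding the system clock by 3 days without showing how. One way this might look on the host is sketched below, assuming a systemd host with GNU date and root privileges; both function names are hypothetical, and the container is assumed to follow the host clock.

```shell
# Print the epoch timestamp that lies N days after a given epoch timestamp.
epoch_plus_days() {
    # $1: start time in seconds since the epoch, $2: number of days
    echo "$(( $1 + $2 * 86400 ))"
}

# Hypothetical helper: jump the host clock forward by $1 days (run as root).
fast_forward_clock() {
    # Disable time synchronization first so it does not undo the jump
    timedatectl set-ntp false
    date -s "@$(epoch_plus_days "$(date +%s)" "$1")"
}
```

After `fast_forward_clock 3`, restart the container with the same podman command as before. Re-enable synchronization afterwards with `timedatectl set-ntp true`.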


ipa-data.tar.xz
8.09 MB
2025/01/29 8:22 AM

links to

389-ds-base/issues/6489

RHBA-2024:144130 389-ds-base bug fix and enhancement update

Assignee: Mark Reynolds
Reporter: Martin Pitt
Developer: IdM DS Dev
QA Contact: Viktor Ashirov
Doc Contact: Evgenia Martyniuk
Votes: 0
Watchers: 7
Created: 2025/01/29 8:25 AM
Updated: 2025/05/13 12:47 PM
Resolved: 2025/05/13 12:47 PM
Target end: 2025/02/17
Next Planned Release Date: 2025/05/13
Release Date: 2025/05/13
