-
Bug
-
Resolution: Unresolved
-
Normal
-
None
-
rhel-8.6.0, rhel-8.8.0, rhel-8.10
-
None
-
No
-
Moderate
-
rhel-systemd
-
ssg_core_services
-
5
-
False
-
False
-
-
None
-
Red Hat Enterprise Linux
-
None
-
None
-
None
-
Unspecified
-
Unspecified
-
Unspecified
-
None
Some customers using the persistent journal see rsyslogd returning the following message just after the journals rotated:
Mar 26 14:12:32 ... rsyslogd[917]: imjournal: sd_journal_next() failed: Bad message [v8.2102.0-15.el8_10.1]
This is due to a race when reading/rotating the journal, the race occurs randomly, usually after 100-500 rotations.
This leads rsyslogd to re-read all the journal from the beginning in order to not lose messages, leading to having duplicates in /var/log/messages and such.
I could reproduce the issue on 8.6, 8.8, 8.9 and 8.10, including latest systemd systemd-239-82.el8_10.4.x86_64 and latest rsyslog rsyslog-8.2102.0-15.el8_10.1.x86_64.
Reproducer
- Configure a QEMU/KVM with 4GB memory and 4 CPUs
- Configure the persistent journal and make it rotate quickly and rsyslog to not drop messages
# cat /etc/systemd/journald.conf [Journal] RateLimitIntervalSec=0 RateLimitBurst=0 SystemMaxFiles=30 SystemMaxUse=200M # cat /etc/rsyslog.conf [...] module(load="imjournal" ratelimit.interval="0" ratelimit.burst="0" StateFile="imjournal.state") [...]
- Create 20 users
# for i in $(seq 1000 1019); do useradd user$i; echo redhat | passwd --stdin user$i; done
- Make sure root can ssh to the users password-less
- Make the users generate logs through ssh'ing locally in loop
# for i in $(seq 1000 1019); do while :; do ssh user$i@localhost true </dev/null; done &
- Monitor the journal for rsyslogd
# journalctl --follow -u rsyslog [...] Mar 26 14:12:32 ... rsyslogd[917]: imjournal: journal files changed, reloading... [v8.2102.0-15.el8_10.1 try https://www.rsyslog.com/e/0 ] Mar 26 14:12:32 ... rsyslogd[917]: imjournal: journal files changed, reloading... [v8.2102.0-15.el8_10.1 try https://www.rsyslog.com/e/0 ] Mar 26 14:12:32 ... rsyslogd[917]: imjournal: sd_journal_next() failed: Bad message [v8.2102.0-15.el8_10.1] Mar 26 14:12:32 ... rsyslogd[917]: imjournal: sd_journal_next() failed: Bad message [v8.2102.0-15.el8_10.1] Mar 26 14:12:32 ... rsyslogd[917]: imjournal: trying to recover from journal error [v8.2102.0-15.el8_10.1 try https://www.rsyslog.com/e/0 ]