Bug
Resolution: Unresolved
Priority: Major
Fix versions: rhel-10.1, rhel-9.7
Team: rhel-sst-cee-supportability
Architecture: x86_64
What were you trying to do that didn't work?
Sporadically, `sos clean` can hit a live-lock in its child cleaner processes.
What is the impact of this issue to you?
`sos clean` is stuck forever, spinning CPUs at ~100%.
Please provide the package NVR for which the bug is seen:
sos-4.10.2
How reproducible is this bug?:
rarely
Steps to reproduce
- have a big default_mapping and/or cleaner_cache for the cleaner
- run `sos clean --batch <sosreport-dir>`
Expected results
sos clean finishes.
Actual results
stuck for many hours, like:
```
# ps aux | grep sos
root  31407  8.6  0.1 198400 51016 ?  Sl  Feb03  108:24 python3 bin/sos clean --batch sosreport-HOSTNAME-2025-12-09-azukuff
root  31641 98.8  0.2 151592 67688 ?  R   00:48 1131:38 python3 bin/sos clean --batch sosreport-HOSTNAME-2025-12-09-azukuff
root  31642 98.8  0.1 134252 55052 ?  R   00:48 1131:33 python3 bin/sos clean --batch sosreport-HOSTNAME-2025-12-09-azukuff
root  31643 98.8  0.1 136256 57072 ?  R   00:48 1131:33 python3 bin/sos clean --batch sosreport-HOSTNAME-2025-12-09-azukuff
root  31644 12.0  0.1 138024 57696 ?  S   00:48  137:55 python3 bin/sos clean --batch sosreport-HOSTNAME-2025-12-09-azukuff
#
```
while `gdb` shows backtraces like:
```
Core was generated by `/usr/bin/python3 bin/sos clean --batch sosreport-nz11rsat001v-2025-12-09-azukuff'.
#0  __GI___fstatat64 (fd=-100, file=0x7f7e1e375aa0 "/etc/sos/cleaner/cleaner_cache/sosipv6map/532",
    buf=0x7ffd1b986900, flag=0) at ../sysdeps/unix/sysv/linux/fstatat64.c:162
162       return INTERNAL_SYSCALL_ERROR_P (r)
(gdb) py-bt
Traceback (most recent call first):
  <built-in method stat of module object at remote 0x7f7e212c21d0>
  File "/usr/lib64/python3.9/genericpath.py", line 30, in isfile
    st = os.stat(path)
  File "/root/sos-main-ORIG/sos/cleaner/mappings/__init__.py", line 90, in load_new_entries_from_dir
    while os.path.isfile(fname):
  File "/root/sos-main-ORIG/sos/cleaner/mappings/__init__.py", line 126, in add
    self.load_new_entries_from_dir(counter)
  File "/root/sos-main-ORIG/sos/cleaner/mappings/__init__.py", line 190, in get
    return self.add(item)
  File "/root/sos-main-ORIG/sos/cleaner/parsers/__init__.py", line 139, in _parse_line
    new_match = self.mapping.get(match)
  File "/root/sos-main-ORIG/sos/cleaner/parsers/__init__.py", line 98, in parse_line
    line, _count = self._parse_line(line)
  File "/root/sos-main-ORIG/sos/cleaner/archives/__init__.py", line 149, in obfuscate_line
    line, _count = parser.parse_line(line)
  File "/root/sos-main-ORIG/sos/cleaner/archives/__init__.py", line 212, in obfuscate_arc_files
    line, cnt = self.obfuscate_line(line, _parsers)
  File "/root/sos-main-ORIG/sos/cleaner/__init__.py", line 43, in obfuscate_arc_files
    return arc.obfuscate_arc_files(flist)
  File "/usr/lib64/python3.9/concurrent/futures/process.py", line 205, in <listcomp>
    return [fn(*args) for args in chunk]
  File "/usr/lib64/python3.9/concurrent/futures/process.py", line 205, in _process_chunk
    return [fn(*args) for args in chunk]
  File "/usr/lib64/python3.9/concurrent/futures/process.py", line 246, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/usr/lib64/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib64/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/lib64/python3.9/multiprocessing/popen_fork.py", line 71, in _launch
    code = process_obj._bootstrap(parent_sentinel=child_r)
  File "/usr/lib64/python3.9/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/usr/lib64/python3.9/multiprocessing/context.py", line 277, in _Popen
```
Weird observation:
The process tries to stat `/etc/sos/cleaner/cleaner_cache/sosipv6map/532`, but the directory already contained IDs all the way up to 531, so 532 was the first free one:
```
# ll -tr /etc/sos/cleaner/cleaner_cache/sosipv6map/* | tail
-rw-------. 1 root root 27 Feb  3 22:59 521
-rw-------. 1 root root  5 Feb  3 22:59 /etc/sos/cleaner/cleaner_cache/sosipv6map/522
-rw-------. 1 root root 24 Feb  3 22:59 /etc/sos/cleaner/cleaner_cache/sosipv6map/523
-rw-------. 1 root root 17 Feb  3 22:59 /etc/sos/cleaner/cleaner_cache/sosipv6map/524
-rw-------. 1 root root  9 Feb  3 22:59 /etc/sos/cleaner/cleaner_cache/sosipv6map/525
-rw-------. 1 root root 39 Feb  3 22:59 /etc/sos/cleaner/cleaner_cache/sosipv6map/526
-rw-------. 1 root root 27 Feb  3 22:59 /etc/sos/cleaner/cleaner_cache/sosipv6map/527
-rw-------. 1 root root 24 Feb  3 22:59 /etc/sos/cleaner/cleaner_cache/sosipv6map/528
-rw-------. 1 root root 17 Feb  3 22:59 /etc/sos/cleaner/cleaner_cache/sosipv6map/529
-rw-------. 1 root root 12 Feb  3 22:59 /etc/sos/cleaner/cleaner_cache/sosipv6map/530
-rw-------. 1 root root 11 Feb  3 22:59 /etc/sos/cleaner/cleaner_cache/sosipv6map/531
#
```
So it seems three child processes race for a free mapper ID under which to record the IPv6 address they want to obfuscate (by creating a symlink there), and they are live-locked searching for such a free ID.
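The suspected cycle can be reproduced in miniature. The sketch below is not the actual sos code: the helper names are simplified and `os.open` with `O_CREAT | O_EXCL` stands in for the atomic symlink creation, but it shows how two on-disk map files carrying the same ident collapse into a single dataset entry, so `len(self.dataset) + 1` keeps pointing at an already-taken ID:

```python
import os
import tempfile

def load_new_entries_from_dir(dataset, mapdir):
    """Simplified stand-in for sos's load_new_entries_from_dir():
    scan numbered files starting right after our known entries."""
    counter = len(dataset) + 1            # restart point derived from dataset size
    fname = os.path.join(mapdir, str(counter))
    while os.path.isfile(fname):
        with open(fname) as f:
            ident, value = f.read().split(None, 1)
        dataset[ident] = value.strip()    # duplicate ident: len(dataset) does not grow
        counter += 1
        fname = os.path.join(mapdir, str(counter))

def try_claim_id(dataset, mapdir, max_retries=5):
    """Simplified stand-in for add(): claim the next free numbered file;
    on collision, reload entries written by other workers and retry."""
    for _ in range(max_retries):
        counter = len(dataset) + 1
        try:
            # O_EXCL plays the role of the atomic symlink creation in sos
            fd = os.open(os.path.join(mapdir, str(counter)),
                         os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            os.close(fd)
            return counter
        except FileExistsError:
            load_new_entries_from_dir(dataset, mapdir)
    return None  # gave up: the counter never advanced past the collision

mapdir = tempfile.mkdtemp()
# Another worker already wrote files 1 and 2, both carrying the same
# ident, so reading them collapses into one dataset entry.
for n in ("1", "2"):
    with open(os.path.join(mapdir, n), "w") as f:
        f.write("255 ::1/128\n")

print(try_claim_id({}, mapdir))  # counter stays stuck at 2: prints None
```

In real sos there is no retry cap, so the equivalent loop spins forever, which would match the worker processes burning ~99% CPU above.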
How it happened:
- have some bigger cleaner_cache
- run cleaner on some report with new hostnames/IPs/etc. to clean
- `sos/cleaner/mappings/__init__.py`, line 126, in `add` hits "I failed to create the symlink for my new entry, as file 531 exists"
- so `load_new_entries_from_dir` (`sos/cleaner/mappings/__init__.py`, lines 80-100) is executed, trying to load the new on-disk entries into the local dataset
- that adds just the content of file 531 (newer files don't exist); the counter should then be set via `counter = len(self.dataset) + 1`, but apparently that still evaluates to 531, as if the content of file 531 had already been in the dataset
- file 531 has content `255 ::1/128`, which is a weird IPv6 address, but probably it was already in the dataset..?
- potential fix: change `counter = len(self.dataset) + 1` to `counter = max(len(self.dataset), counter) + 1` (with `counter = 0` set at the beginning of the method)
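To illustrate the intent of that change, here is a hedged sketch (not the actual sos code: simplified helper names, and `os.open` with `O_CREAT | O_EXCL` standing in for the symlink creation). With the `max()` guard, the reload never rescans from below the ID that just collided, so the claim loop advances past duplicate on-disk entries instead of spinning:

```python
import os
import tempfile

def load_new_entries_from_dir(dataset, mapdir, counter=0):
    """Patched stand-in: never restart the scan below the id the caller
    just collided on, even if len(dataset) lags behind (duplicates)."""
    counter = max(len(dataset), counter) + 1   # the proposed change
    fname = os.path.join(mapdir, str(counter))
    while os.path.isfile(fname):
        with open(fname) as f:
            ident, value = f.read().split(None, 1)
        dataset[ident] = value.strip()
        counter += 1
        fname = os.path.join(mapdir, str(counter))
    return counter                             # first id with no file behind it

def try_claim_id(dataset, mapdir, max_retries=5):
    """Stand-in for add(): claim the next free numbered file, carrying
    the counter returned by the reload across retries."""
    counter = len(dataset) + 1
    for _ in range(max_retries):
        try:
            # O_EXCL plays the role of the atomic symlink creation in sos
            fd = os.open(os.path.join(mapdir, str(counter)),
                         os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            os.close(fd)
            return counter
        except FileExistsError:
            counter = load_new_entries_from_dir(dataset, mapdir, counter)
    return None

mapdir = tempfile.mkdtemp()
# Two pre-existing files with the same ident: the pattern that live-locks
# the unpatched code because len(dataset) stays pinned at 1.
for n in ("1", "2"):
    with open(os.path.join(mapdir, n), "w") as f:
        f.write("255 ::1/128\n")

print(try_claim_id({}, mapdir))  # claims the first truly free id: prints 3
```

The key property is that the counter is now monotonically non-decreasing across retries, so a duplicate map entry can no longer pin it on an already-taken ID.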