In RHEL-111165 we found that setfiles would run out of memory when asked to relabel a very large number of files (spread across many directories in the filesystem), because it builds a large in-memory hash table referencing each filename. That was fixed by adding the setfiles -A option.
However, while testing that fix, we found a different bug in setfiles / glibc: it also runs out of memory if a single directory in the filesystem contains a very large number of files (more than ~ 6 million).
I tracked this down to the glibc fts functions:
This is a bug, of sorts, in the glibc FTS functions. The attached program (ftstest.c) demonstrates it easily (a minimal sketch of such a walker follows the steps below):
- Compile the program.
- Create a directory that has ~ 6 million files, all in the same directory. The contents of the files do not matter.
- Create a second directory somewhere else that also has ~ 6 million files, but spread across 6 subdirectories (so each subdirectory holds about 1 million files).
- Run the program twice, once on each of the above test directories.
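
I'm not reproducing the attached ftstest.c inline; the sketch below is only a rough approximation of what such a walker does (fts_open + fts_read over the tree, counting regular files), so details such as the FTS_NOSTAT flag are my assumptions, not necessarily what ftstest.c actually uses.

    /* Minimal sketch of an fts-based walker (the real test program,
     * ftstest.c, is attached to the bug).  It walks the tree with the
     * glibc fts functions and counts regular files, so peak RSS comes
     * from what fts itself allocates. */
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <fts.h>

    int
    main (int argc, char *argv[])
    {
      FTS *fts;
      FTSENT *ent;
      size_t count = 0;
      char *paths[2];

      if (argc != 2) {
        fprintf (stderr, "usage: %s DIR\n", argv[0]);
        exit (EXIT_FAILURE);
      }
      paths[0] = argv[1];
      paths[1] = NULL;

      /* FTS_NOSTAT skips stat(2) on every entry, so the test measures the
       * memory behaviour of fts rather than stat. */
      fts = fts_open (paths, FTS_PHYSICAL | FTS_NOSTAT, NULL);
      if (fts == NULL) {
        perror ("fts_open");
        exit (EXIT_FAILURE);
      }

      while ((ent = fts_read (fts)) != NULL) {
        if (ent->fts_info == FTS_F)
          count++;
      }

      fts_close (fts);
      printf ("regular files: %zu\n", count);
      return 0;
    }

Peak memory can be compared with something like /usr/bin/time -v ./ftstest DIR and looking at the "Maximum resident set size" line.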
In the first case, where all the files are in a single directory, peak memory usage is about 1.9 GB.
In the second case, where no subdirectory contains more than about a million files, peak memory usage is about 300 MB.
Edit: The problem seems to be that fts_read keeps the FTSENT entries for a whole directory in memory while that directory is being processed, when it could free them earlier.
I looked further into the code, and the only way to fix this would be to completely change how fts_read and fts_build work so that they operate incrementally, instead of reading a whole directory at a time.
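
Purely for comparison (this is not a proposed fix for fts, and not code from this bug): a plain opendir/readdir loop over a single directory is already incremental, since entries are consumed one at a time rather than accumulated, so its memory use stays roughly flat regardless of how many files the directory holds. Something like:

    /* Comparison only: an incremental scan of one directory with
     * readdir(3).  Memory stays roughly constant because each entry is
     * handled and discarded, instead of being kept in a per-directory
     * FTSENT list the way fts does. */
    #define _DEFAULT_SOURCE
    #include <stdio.h>
    #include <stdlib.h>
    #include <dirent.h>

    int
    main (int argc, char *argv[])
    {
      DIR *dir;
      struct dirent *d;
      size_t count = 0;

      if (argc != 2) {
        fprintf (stderr, "usage: %s DIR\n", argv[0]);
        exit (EXIT_FAILURE);
      }

      dir = opendir (argv[1]);
      if (dir == NULL) {
        perror ("opendir");
        exit (EXIT_FAILURE);
      }

      while ((d = readdir (dir)) != NULL) {
        if (d->d_type == DT_REG)  /* d_type may be DT_UNKNOWN on some filesystems */
          count++;
      }

      closedir (dir);
      printf ("regular files: %zu\n", count);
      return 0;
    }

Of course fts also provides sorting, pre- and post-order directory visits, fts_children, stat information and so on, so it cannot simply be swapped for a readdir loop, which is part of why the change would be so invasive.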
Since glibc FTS functions are used all over the place, and the proposed change would be very invasive, it's not realistic to fix this.
This bug is to track a documentation-only change in virt-v2v.