What were you trying to do that didn't work?
The sg3_utils package includes a udev rule that creates "/dev/disk/by-id" symlinks. The rule depends on the backing storage returning a unique identifier. With some storage this value is not unique, which leads to high resource utilization by udev (looping) at boot time and during device discovery.
Please provide the package NVR for which bug is seen:
sg3_utils
How reproducible:
Always reproducible with specific hardware and a large number of LUNs. In this particular case the server has ~6000 3PAR LUNs, all returning the same value for SCSI_IDENT_SERIAL.
Steps to reproduce
- Attach a large number of LUNs that all return the same SCSI_IDENT_SERIAL value in response to VPD inquiries to a Red Hat Enterprise Linux 8 server.
- Boot the server.
Expected results
We expect that if the SCSI_IDENT_SERIAL value is not unique, another value will be used, or the rule will be skipped.
Actual results
Below is from David Jeffery:
Problematic ID values from the 3PAR storage look to be creating a costly feedback loop with one of the udev rules. The log file JournalSystemd-udevd.log is flooded with messages like:
hostname systemd-udevd[88447]: found 'b69:80' claiming '/run/udev/links/\x2fdisk\x2fby-id\x2fscsi-S3PARdata_VV_123456789'
This is because all 5500 3PAR disk LUNs are reporting the same serial ID, 123456789. As a consequence, one of the by-id links for every 3PAR disk resolves to the same name. This causes a feedback loop in the symlink handling, where every previously processed device adds to the cost of processing each later device with the same link name. Each other device claiming that link causes additional filesystem accesses and udevd work, as udevd allocates udev device structures and compares values to resolve link priority. Instead of comparing against at most a handful of devices, as would normally be expected, each udev sequence may compare against thousands of devices.
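To give a rough sense of scale for the feedback loop described above, here is a minimal sketch in Python. It assumes a simplified model in which each newly processed device is compared once against every earlier device claiming the same link. This is not taken from the udev source, and the function name pairwise_link_checks is made up for illustration.

```python
def pairwise_link_checks(num_devices: int) -> int:
    """Total comparisons under the assumed model.

    Assumption: device k is checked against the k-1 devices that
    already claim the same by-id link, so the total is the sum
    0 + 1 + ... + (n-1) = n * (n - 1) / 2.
    """
    return num_devices * (num_devices - 1) // 2


if __name__ == "__main__":
    # A handful of devices sharing a link versus the 5500 LUNs in this report.
    for n in (5, 5500):
        print(n, pairwise_link_checks(n))
```

Under this assumption, 5 devices sharing a link cost 10 comparisons in total, while 5500 devices cost about 15 million, and that is for a single pass; repeated udev events would multiply it.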
The data in JournalSystemd-udevd.log is highly incomplete because millions of messages were suppressed, but the "claiming" messages dominate the logs in the non-suppressed time periods and after the initial discovery. During the non-suppressed windows, many udev tasks appear to log only "claiming" messages, and never enough of them to complete a cycle, because of the heavy cost of the comparisons and the feedback loop. The problematic link appears to come from lib/udev/rules.d/63-scsi-sg3_symlink.rules:
# Select which identifier to use per default
# 0: vpd page 0x80 identifier
ENV{SCSI_IDENT_SERIAL}=="?*", ENV{DEVTYPE}=="disk", SYMLINK+="disk/by-id/scsi-S$env{SCSI_VENDOR}_$env{SCSI_MODEL}_$env{SCSI_IDENT_SERIAL}"
The rule expects the SCSI_IDENT_SERIAL value to be mostly unique, but instead it is identical for all the 3PAR disk LUNs. With the data in JournalSystemd-udevd.log so fractured and incomplete, it is hard to say exactly how high the cost is, but it does look to be significant. It would be worth disabling this udev rule and seeing how much faster udev processing completes.
A significant improvement was noted after disabling the rule.
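To check whether a host is likely to hit this, the following Python sketch groups devices by the link name the rule quoted above would generate and reports any name claimed by more than one device. It is an illustration only: the function names are hypothetical, and the input is assumed to be a mapping from device name to its udev properties (for example, as one might collect from "udevadm info --query=property" output). It also ignores any character replacement udev applies to property values.

```python
from collections import defaultdict


def by_id_serial_link(props):
    """Return the link name the serial-based rule would add, or None.

    Mirrors the rule's match: SCSI_IDENT_SERIAL must be non-empty and
    DEVTYPE must be "disk".
    """
    serial = props.get("SCSI_IDENT_SERIAL")
    if not serial or props.get("DEVTYPE") != "disk":
        return None
    return "disk/by-id/scsi-S{}_{}_{}".format(
        props.get("SCSI_VENDOR", ""), props.get("SCSI_MODEL", ""), serial
    )


def colliding_links(devices):
    """Map each link name claimed by 2+ devices to the sorted device names."""
    claims = defaultdict(list)
    for name, props in devices.items():
        link = by_id_serial_link(props)
        if link is not None:
            claims[link].append(name)
    return {link: sorted(names) for link, names in claims.items() if len(names) > 1}


if __name__ == "__main__":
    # Hypothetical sample data modelled on the values in this report.
    sample = {
        "sda": {"DEVTYPE": "disk", "SCSI_VENDOR": "3PARdata",
                "SCSI_MODEL": "VV", "SCSI_IDENT_SERIAL": "123456789"},
        "sdb": {"DEVTYPE": "disk", "SCSI_VENDOR": "3PARdata",
                "SCSI_MODEL": "VV", "SCSI_IDENT_SERIAL": "123456789"},
        "sdc": {"DEVTYPE": "disk", "SCSI_VENDOR": "ACME",
                "SCSI_MODEL": "X", "SCSI_IDENT_SERIAL": "42"},
    }
    for link, names in colliding_links(sample).items():
        print(link, names)
```

Any link name that appears in the result is shared by multiple devices and would trigger the repeated "claiming" comparisons described in this report.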