-
Bug
-
Resolution: Done
-
Major
-
None
-
4.13
-
Important
-
No
-
3
-
Sprint 234 - Team OSInt, Sprint 235
-
2
-
Approved
-
False
-
Description of problem:
I suspect this is an underlying RHEL kernel bug or change, but so far I have only tested on OCP/RHCOS. I will update if I can confirm.
Upgrading to 9.2 using 4.13-rc.2 / 4.13.0-0.nightly-2023-03-29-235439 on a baremetal cluster got stuck on my second worker node. I discovered that OSD PDBs were preventing further progress because booting into the new kernel caused a disk rename:
8.6 was: /dev/disk/by-id/nvme-Dell_Express_Flash_PM1725a_3.2TB_AIC__S3B1NA0JC00067
9.2 now: /dev/disk/by-id/nvme-Dell_Express_Flash_PM1725a_3.2TB_AIC_______S3B1NA0JC00067
I am using LSO autodiscovery, so I did not hardcode my disk by-ids at install time. I have other HDDs in the system that are not used by LSO/ODF, and I do not see by-id renames for those (only changes to the symbolic sdX naming links), so the problem may be specific to NVMe by-id naming.
Version-Release number of selected component (if applicable):
OCP 4.13.0-rc.2, kernel 5.14.0-284.4.1.el9_2.x86_64
How reproducible:
The by-id renaming happened on all 4 of my workers with NVMe drives
Steps to Reproduce:
1. Install 8.6-based OCP + LSO + ODF
2. Upgrade to 9.2-based OCP
3. Check OSD pods stuck in Init:
Warning FailedMapVolume <invalid> (x6 over 0s) kubelet MapVolume.EvalHostSymlinks failed for volume "local-pv-4a847404" : lstat /dev/disk/by-id/nvme-Dell_Express_Flash_PM1725a_3.2TB_AIC__S3B1NA0JC00067: no such file or directory
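The failure in step 3 is an lstat on a by-id link that no longer exists. A minimal Python sketch of the same check, run on the affected node, might look like the following. The function name `missing_device_links` and the hardcoded path list are illustrative assumptions; in practice the paths would come from the local PV definitions.

```python
import os


def missing_device_links(paths):
    """Return the paths whose lstat fails because the link does not exist."""
    missing = []
    for path in paths:
        try:
            # lstat inspects the symlink itself rather than the device it
            # points to, so a dangling link still counts as present.
            os.lstat(path)
        except FileNotFoundError:
            missing.append(path)
    return missing


if __name__ == "__main__":
    # Hypothetical input: the by-id path recorded in the PV from this report.
    pv_paths = [
        "/dev/disk/by-id/nvme-Dell_Express_Flash_PM1725a_3.2TB_AIC__S3B1NA0JC00067",
    ]
    for path in missing_device_links(pv_paths):
        print("missing:", path)
```

Running this before and after the upgrade on each worker would show which PV paths stop resolving under the new kernel.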
Actual results:
The upgrade stalled. I could recover by manually deleting the storage PDBs, but LSO and the StorageCluster need to be reinstalled.
Expected results:
Upgrade to new kernel does not disrupt storage
Additional info:
Example vim diff output, 8.6 on the left, 9.2 on the right:
lrwxrwxrwx. 1 root root nvme-Dell_Express_Flash_PM1725a_3.2TB_AIC__S3B1NA0JC00084 -> ../../nvme0n1 | lrwxrwxrwx. 1 root root nvme-Dell_Express_Flash_PM1725a_3.2TB_AIC_______S3B1NA0JC00084 -> ../../nvme0n1
lrwxrwxrwx. 1 root root nvme-eui.334231304ac000840025384100000002 -> ../../nvme0n1 | lrwxrwxrwx. 1 root root nvme-eui.334231304ac000840025384100000002 -> ../../nvme0n1
lrwxrwxrwx. 1 root root scsi-36d09466073c253002300be27de2fb838 -> ../../sda | lrwxrwxrwx. 1 root root scsi-36d09466073c253002300be27de2fb838 -> ../../sdc
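In the listing above, the old and new NVMe names differ only in the number of underscores before the serial number. Assuming that is the only change (this report does not confirm the cause), a short Python sketch can pair old and new link names by collapsing underscore runs. The helper names `normalize_by_id` and `find_renamed` are hypothetical, and the dictionaries simply restate the listing.

```python
from __future__ import annotations

import re


def normalize_by_id(name: str) -> str:
    """Collapse every run of underscores to one so padding differences vanish."""
    return re.sub(r"_+", "_", name)


def find_renamed(old: dict[str, str], new: dict[str, str]) -> list[tuple[str, str]]:
    """Pair old and new link names that differ only in underscore padding."""
    # Assumption: no two links on the same host normalize to the same key.
    new_by_key = {normalize_by_id(name): name for name in new}
    renamed = []
    for old_name in old:
        candidate = new_by_key.get(normalize_by_id(old_name))
        if candidate is not None and candidate != old_name:
            renamed.append((old_name, candidate))
    return renamed


# Link name -> target, copied from the 8.6 and 9.2 listings in this report.
old_links = {
    "nvme-Dell_Express_Flash_PM1725a_3.2TB_AIC__S3B1NA0JC00084": "nvme0n1",
    "nvme-eui.334231304ac000840025384100000002": "nvme0n1",
    "scsi-36d09466073c253002300be27de2fb838": "sda",
}
new_links = {
    "nvme-Dell_Express_Flash_PM1725a_3.2TB_AIC_______S3B1NA0JC00084": "nvme0n1",
    "nvme-eui.334231304ac000840025384100000002": "nvme0n1",
    "scsi-36d09466073c253002300be27de2fb838": "sdc",
}

for old_name, new_name in find_renamed(old_links, new_links):
    print(old_name, "->", new_name)
```

The nvme-eui and scsi names are unchanged between the two listings, so only the model-and-serial NVMe link is reported. The sda to sdc target change is ignored on purpose, since the by-id name itself stayed the same.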
- blocks
-
OCPBUGS-11485 [4.13] NVMe disk by-id rename breaks LSO/ODF
- Closed
- is cloned by
-
OCPBUGS-11485 [4.13] NVMe disk by-id rename breaks LSO/ODF
- Closed