Bug
Resolution: Unresolved
Critical
odf-4.12
None
Description of problem (please be as detailed as possible and provide log snippets):
After a cluster-wide reboot for cert auth, the ODF node reboot removed the DASD partition and all 3 OSDs were lost.
The customer followed this IBM documentation to partition the DASD:
> https://www.ibm.com/docs/en/linux-on-systems?topic=architecture-storage
> See Section “4.1.2 Steps specific for DASD devices”
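For reference, the partitioning in that section amounts to roughly the following (a sketch only, run on the node; the device name /dev/dasde is assumed from the lsblk output below, and the exact flags should be checked against the IBM doc):
# Low-level format the DASD with the compatible disk layout (cdl) and 4096-byte blocks
dasdfmt -b 4096 -d cdl -yp /dev/dasde
# Create a single partition spanning the whole device (auto mode)
fdasd -a /dev/dasde
# Verify the new partition shows up as dasde1
lsblk /dev/dasde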
ODF deployed successfully with LSO and the OSDs mapped to dasde1.
To use host binaries, run `chroot /host`
NAME      MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
loop1       7:1    0 811.6G  0 loop
dasda      94:0    0 103.2G  0 disk
|-dasda1   94:1    0   384M  0 part /host/boot
`-dasda2   94:2    0 102.8G  0 part /host/sysroot
dasde      94:16   0 811.6G  0 disk
`-dasde1   94:17   0 811.6G  0 part
After the cluster-wide reboot for cert auth, the OSD pods failed to start:
MapVolume.EvalHostSymlinks failed for volume "local-pv-ef04e88d" : lstat /dev/disk/by-id/ccw-IBM.750000000KHF61.baee.40-part1: no such file or directory
Events log:
3m27s Warning FailedMapVolume pod/rook-ceph-osd-0-59c9db848-5rp9f MapVolume.EvalHostSymlinks failed for volume "local-pv-ef04e88d" : lstat /dev/disk/by-id/ccw-IBM.750000000KHF61.baee.40-part1: no such file or directory
23m Warning FailedMount pod/rook-ceph-osd-0-59c9db848-5rp9f (combined from similar events): Unable to attach or mount volumes: unmounted volumes=[ocs-deviceset-odf-cluster-storage-0-data-1wgxd7], unattached volumes=[ocs-deviceset-odf-cluster-storage-0-data-1wgxd7 ocs-deviceset-odf-cluster-storage-0-data-1wgxd7-bridge kube-api-access-25kpz rook-data rook-config-override rook-ceph-log rook-ceph-crash run-udev]: timed out waiting for the condition
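As a sanity check (a sketch; the PV name is taken from the events above), the path the OSD pod is waiting on can be read from the LSO-provisioned PV:
# Print the device path the local PV points at
# (expected: /dev/disk/by-id/ccw-IBM.750000000KHF61.baee.40-part1)
oc get pv local-pv-ef04e88d -o jsonpath='{.spec.local.path}{"\n"}'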
lsblk on the node after the reboot (dasde1 is gone):
NAME      MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
dasda      94:0    0 103.2G  0 disk
|-dasda1   94:1    0   384M  0 part /boot
`-dasda2   94:2    0 102.8G  0 part /sysroot
dasde      94:16   0 811.6G  0 disk
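To confirm that the partition itself is gone (and not just a stale udev symlink), something like the following can be run on the affected node (the node name is a placeholder; the commands are a sketch):
# Debug shell on the node, then switch to the host namespace
oc debug node/<node-name>
chroot /host
# The ccw by-id symlink for the partition should now be missing
ls -l /dev/disk/by-id/ | grep ccw
# Print the DASD volume label and partition table; no partition is listed for dasde
fdasd -p /dev/dasde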
Version of all relevant components (if applicable):
OCP/ODF 4.12
Does this issue impact your ability to continue to work with the product (please explain in detail what the user impact is)?
Yes. A node reboot destroys the OSD path to dasde1 and all 3 OSDs go down.
Is there any workaround available to the best of your knowledge?
No
Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
5
Is this issue reproducible?
Yes, on reboot of the node.
Can this issue be reproduced from the UI?
If this is a regression, please provide more details to justify this:
Steps to Reproduce:
1. Partition the DASD following the IBM documentation referenced above (section 4.1.2) and deploy ODF with LSO using the dasde1 partition.
2. Reboot the ODF node (e.g., as part of a cluster-wide reboot for cert auth).
3. Check the OSD pods and the DASD partition after the node comes back.
Actual results:
The dasde1 partition no longer exists after the reboot and the OSD pods fail with FailedMapVolume (lstat /dev/disk/by-id/ccw-IBM.750000000KHF61.baee.40-part1: no such file or directory).
Expected results:
The DASD partition (dasde1) persists across node reboots and the OSDs come back online.
Additional info: