- Bug
- Resolution: Done-Errata
- Critical
- odf-4.18
- None
- False
- -
- False
- Committed
- ?
- ?
- 4.18.7-3
- Committed
- Important
- Proposed
- None
Description of problem -
Upgrade from 4.17 to 4.18 fails for OpenShift/ODF clusters where RDR is enabled and multipath devices are in use.
[client.admin]
keyring = /var/lib/rook/openshift-storage/client.admin.keyring
2025-03-07 14:01:38.402172 I | cephcmd: destroying osd.0 and cleaning its backing device
2025-03-07 14:01:38.402508 D | exec: Running command: stdbuf -oL ceph-volume --log-path /tmp/ceph-log lvm list --format json
2025-03-07 14:01:39.578802 D | cephosd: {}
2025-03-07 14:01:39.578963 I | cephosd: 0 ceph-volume lvm osd devices configured on this node
2025-03-07 14:01:39.579037 D | exec: Running command: stdbuf -oL ceph-volume --log-path /tmp/ceph-log raw list --format json
2025-03-07 14:01:53.916338 D | cephosd: { "5858fb3b-278d-4332-be6d-1bcdada54327": { "ceph_fsid": "8206fd72-4080-4bf2-9ada-fa209686e101", "device": "/dev/sdh", "osd_id": 0, "osd_uuid": "5858fb3b-278d-4332-be6d-1bcdada54327", "type": "bluestore-rdr" } }
2025-03-07 14:01:53.916771 I | cephosd: 1 ceph-volume raw osd devices configured on this node
2025-03-07 14:01:53.916820 I | cephosd: destroying osd.0
2025-03-07 14:01:53.916873 D | exec: Running command: ceph osd destroy osd.0 --yes-i-really-mean-it --connect-timeout=15 --cluster=openshift-storage --conf=/var/lib/rook/openshift-storage/openshift-storage.config --name=client.admin --keyring=/var/lib/rook/openshift-storage/client.admin.keyring --format json
2025-03-07 14:01:54.678150 I | cephosd: successfully destroyed osd.0
2025-03-07 14:01:54.678295 I | cephosd: zap OSD.0 path "/dev/sdh"
2025-03-07 14:01:54.678336 D | exec: Running command: stdbuf -oL ceph-volume lvm zap /dev/sdh --destroy
2025-03-07 14:01:56.700741 C | rookcmd: failed to destroy OSD 0.: fa
Here the device should be "/dev/mapper/mpathb" because the disk is multipath-enabled. Since ceph-volume reports the underlying path "/dev/sdh" instead, zapping of the device fails and, in addition, the OSD migration also fails.
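For context, a minimal sketch of how a raw device such as /dev/sdh can be resolved to its device-mapper/multipath holder (the path zapping and migration should operate on). Assumptions: standard Linux sysfs layout; the device and mapper names are taken from this bug's logs, not verified on a live system.

#!/usr/bin/env python3
"""Sketch: resolve a raw block device (e.g. /dev/sdh) to the
device-mapper/multipath node built on top of it (e.g. /dev/mapper/mpathb)."""
import os


def multipath_holder(raw_dev):
    """Return the /dev/mapper/<name> path holding raw_dev, or None."""
    dev = os.path.basename(raw_dev)                   # "/dev/sdh" -> "sdh"
    holders_dir = f"/sys/block/{dev}/holders"         # dm devices stacked on sdh
    if not os.path.isdir(holders_dir):
        return None
    for holder in os.listdir(holders_dir):            # e.g. "dm-3"
        name_file = f"/sys/block/{holder}/dm/name"    # mapper name of dm-3
        if os.path.isfile(name_file):
            with open(name_file) as f:
                return "/dev/mapper/" + f.read().strip()
    return None


if __name__ == "__main__":
    print(multipath_holder("/dev/sdh"))               # expected: /dev/mapper/mpathb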
The OCP platform infrastructure and deployment type (AWS, Bare Metal, VMware, etc. Please clarify if it is platform agnostic deployment), (IPI/UPI):
Baremetal (IBM Z)
The ODF deployment type (Internal, External, Internal-Attached (LSO), Multicluster, DR, Provider, etc):
Internal (LSO), RDR environment
The version of all relevant components (OCP, ODF, RHCS, ACM whichever is applicable):
OCP: 4.18.3
ODF: 4.18.0-rhodf
ACM: 2.12.2
Is there any workaround available to the best of your knowledge?
No
Can this issue be reproduced? If so, please provide the hit rate
Yes, always
Can this issue be reproduced from the UI?
Yes
If this is a regression, please provide more details to justify this:
Steps to Reproduce:
1. Deploy ODF 4.17.5-rhodf with multipath devices on an IBM Z or x86 environment
2. Upgrade to ODF 4.18.0-rhodf
3. Observe the OSD migration status: the OSD for which migration was attempted is in CrashLoopBackOff state (see the check sketched below)
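A small helper for step 3, to list OSD pods stuck in CrashLoopBackOff after the upgrade. Assumptions (not taken from this report): the OSD pods carry the usual Rook label app=rook-ceph-osd and live in the openshift-storage namespace, and `oc` is logged in to the cluster.

#!/usr/bin/env python3
"""Sketch: report OSD pods in CrashLoopBackOff using `oc` JSON output."""
import json
import subprocess


def crashlooping_osds(namespace="openshift-storage"):
    out = subprocess.run(
        ["oc", "-n", namespace, "get", "pods",
         "-l", "app=rook-ceph-osd", "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    bad = []
    for pod in json.loads(out)["items"]:
        for cs in pod["status"].get("containerStatuses", []):
            waiting = cs.get("state", {}).get("waiting") or {}
            if waiting.get("reason") == "CrashLoopBackOff":
                bad.append(pod["metadata"]["name"])
    return bad


if __name__ == "__main__":
    print(crashlooping_osds())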
The exact date and time when the issue was observed, including timezone details:
Actual results:
OSD migration fails and the OSD pod is in CrashLoopBackOff state
Expected results:
OSD migration should succeed and the OSD store type should change from bluestore-rdr to bluestore
Logs collected and log location:
Additional info:
# oc logs rook-ceph-osd-prepare-243d93cc209f61f44545ac4620752c4b-gsmxb
Defaulted container "provision" out of: provision, copy-bins (init), blkdevmapper (init)
mon_data_avail_warn = 15
mon_warn_on_pool_no_redundancy = false
bluestore_prefer_deferred_size_hdd = 0
[osd]
osd_memory_target_cgroup_limit_ratio = 0.8
[client.admin]
keyring = /var/lib/rook/openshift-storage/client.admin.keyring
2025-03-07 14:01:38.402172 I | cephcmd: destroying osd.0 and cleaning its backing device
2025-03-07 14:01:38.402508 D | exec: Running command: stdbuf -oL ceph-volume --log-path /tmp/ceph-log lvm list --format json
2025-03-07 14:01:39.578802 D | cephosd: {}
2025-03-07 14:01:39.578963 I | cephosd: 0 ceph-volume lvm osd devices configured on this node
2025-03-07 14:01:39.579037 D | exec: Running command: stdbuf -oL ceph-volume --log-path /tmp/ceph-log raw list --format json
2025-03-07 14:01:53.916338 D | cephosd: { "5858fb3b-278d-4332-be6d-1bcdada54327": { "ceph_fsid": "8206fd72-4080-4bf2-9ada-fa209686e101", "device": "/dev/sdh", "osd_id": 0, "osd_uuid": "5858fb3b-278d-4332-be6d-1bcdada54327", "type": "bluestore-rdr" } }
2025-03-07 14:01:53.916771 I | cephosd: 1 ceph-volume raw osd devices configured on this node
2025-03-07 14:01:53.916820 I | cephosd: destroying osd.0
2025-03-07 14:01:53.916873 D | exec: Running command: ceph osd destroy osd.0 --yes-i-really-mean-it --connect-timeout=15 --cluster=openshift-storage --conf=/var/lib/rook/openshift-storage/openshift-storage.config --name=client.admin --keyring=/var/lib/rook/openshift-storage/client.admin.keyring --format json
2025-03-07 14:01:54.678150 I | cephosd: successfully destroyed osd.0
2025-03-07 14:01:54.678295 I | cephosd: zap OSD.0 path "/dev/sdh"
2025-03-07 14:01:54.678336 D | exec: Running command: stdbuf -oL ceph-volume lvm zap /dev/sdh --destroy
2025-03-07 14:01:56.700741 C | rookcmd: failed to destroy OSD 0.: fa
Traceback (most recent call last):
  File "/usr/sbin/ceph-volume", line 33, in <module>
    sys.exit(load_entry_point('ceph-volume==1.0.0', 'console_scripts', 'ceph-volume')())
  File "/usr/lib/python3.9/site-packages/ceph_volume/main.py", line 54, in __init__
    self.main(self.argv)
  File "/usr/lib/python3.9/site-packages/ceph_volume/decorators.py", line 59, in newfunc
    return f(*a, **kw)
  File "/usr/lib/python3.9/site-packages/ceph_volume/main.py", line 166, in main
    terminal.dispatch(self.mapper, subcommand_args)
  File "/usr/lib/python3.9/site-packages/ceph_volume/terminal.py", line 194, in dispatch
    instance.main()
  File "/usr/lib/python3.9/site-packages/ceph_volume/devices/lvm/main.py", line 46, in main
    terminal.dispatch(self.mapper, self.argv)
  File "/usr/lib/python3.9/site-packages/ceph_volume/terminal.py", line 194, in dispatch
    instance.main()
  File "/usr/lib/python3.9/site-packages/ceph_volume/devices/lvm/zap.py", line 431, in main
    self.zap()
  File "/usr/lib/python3.9/site-packages/ceph_volume/decorators.py", line 16, in is_root
    return func(*a, **kw)
  File "/usr/lib/python3.9/site-packages/ceph_volume/devices/lvm/zap.py", line 307, in zap
    self.zap_raw_device(device)
  File "/usr/lib/python3.9/site-packages/ceph_volume/devices/lvm/zap.py", line 289, in zap_raw_device
    zap_device(device.path)
  File "/usr/lib/python3.9/site-packages/ceph_volume/devices/lvm/zap.py", line 25, in zap_device
    zap_bluestore(path)
  File "/usr/lib/python3.9/site-packages/ceph_volume/devices/lvm/zap.py", line 36, in zap_bluestore
    process.run([
  File "/usr/lib/python3.9/site-packages/ceph_volume/process.py", line 147, in run
    raise RuntimeError(msg)
RuntimeError: command returned non-zero exit status: 1.: exit status 1
sh-5.1# ceph-volume raw list --format json
{
  "230473aa-4d5a-406e-89ca-02e47bf7a98f": {
    "ceph_fsid": "6fdb7424-efc1-4916-a9d5-5597b9be8d87",
    "device": "/dev/loop1",
    "osd_id": 1,
    "osd_uuid": "230473aa-4d5a-406e-89ca-02e47bf7a98f",
    "type": "bluestore-rdr"
  },
  "643a3711-8d53-457d-8c25-f633d35026a5": {
    "ceph_fsid": "4db013b4-272a-4fc6-beeb-d0fdac6cbbc5",
    "device": "/dev/sdh",
    "osd_id": 2,
    "osd_uuid": "643a3711-8d53-457d-8c25-f633d35026a5",
    "type": "bluestore-rdr"
  }
}
2025-03-10 10:48:46.421588 D | exec: Running command: stdbuf -oL ceph-volume --log-path /tmp/ceph-log raw list /mnt/ocs-deviceset-localblock-0-data-16rtbs --format json
2025-03-10 10:48:46.827620 D | cephosd: { "230473aa-4d5a-406e-89ca-02e47bf7a98f": { "ceph_fsid": "6fdb7424-efc1-4916-a9d5-5597b9be8d87", "device": "/dev/mapper/mpathb", "osd_id": 1, "osd_uuid": "230473aa-4d5a-406e-89ca-02e47bf7a98f", "type": "bluestore-rdr" } }
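For completeness, a small helper (hypothetical, not product tooling) that parses the same `ceph-volume raw list --format json` output shown above and reports which OSDs still carry type bluestore-rdr and which device path ceph-volume resolved for them, which is where the /dev/sdh vs /dev/mapper/mpathb mismatch shows up. Assumption: it is run where ceph-volume is available, e.g. inside the OSD prepare or tools container.

#!/usr/bin/env python3
"""Sketch: list OSDs that still report type bluestore-rdr and the device
path ceph-volume returned for each of them."""
import json
import subprocess


def pending_rdr_osds():
    out = subprocess.run(
        ["ceph-volume", "raw", "list", "--format", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    for osd_uuid, osd in json.loads(out).items():
        if osd.get("type") == "bluestore-rdr":
            yield osd["osd_id"], osd["device"], osd_uuid


if __name__ == "__main__":
    for osd_id, device, uuid in pending_rdr_osds():
        # a multipath-backed OSD should report /dev/mapper/..., not /dev/sdX
        print(f"osd.{osd_id} device={device} uuid={uuid} (type=bluestore-rdr)")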
- clones DFBUGS-1801 ODF with RDR and multipath fails during upgrade from 4.17 to 4.18 - OSD migration fails as ceph-volume fails to fetch the multipath device (Closed)