- Bug
- Resolution: Unresolved
- Major
- None
- 2.10.z
Description of problem:
The vSphere XCOPY volume populator (RDM + COLD + SSH method) fails during the target LUN discovery phase. While the LUN is successfully created and mapped on the NetApp ONTAP storage system, the ESXi host cannot discover the device even after multiple storage rescans.
Version-Release number of selected component (if applicable):
2.10.X
How reproducible:
Intermittent
Steps to Reproduce:
From automation:
1. Fork the mtv-api-tests repo and run the RDM test (the issue currently reproduces there):
$ uv run pytest -m copyoffload -k test_copyoffload_rdm_virtual_disk_migration -s --tc=cluster_host:https://api.domain.lab.eng.tlv2.redhat.com:6443 --tc=cluster_username:kubeadmin --tc=cluster_password:'password' --tc=source_provider_type:vsphere --tc=source_provider_version:8.0.3.00400 --tc=storage_class:rhosqe-ontap-san-block --tc=target_ocp_version:4.18
Manually:
- Configure copy-offload with NetApp ONTAP backend
- Set Provider to use SSH method (esxiCloneMethod: "ssh")
- Create migration plan with copy-offload enabled StorageMap
- Start migration of VM with disk on eco-iscsi-ds3 datastore
- Observe populator logs during LUN discovery phase
Actual results:
When the migration starts, it fails at 20%, on the DiskAllocation step.
Expected results:
The migration completes successfully.
Additional info:
The populator logs show the following errors:
I0106 11:37:02.724647 1 host_lease.go:82] This populator's identity is: populate-833c59ef-a7ef-4557-800d-d06b394f66aa
I0106 11:37:02.737588 1 host_lease.go:153] Lease esxi-lock-hostsystem-host-12642-slot-0 (slot 0) is expired, attempting to take it over
I0106 11:37:02.741603 1 host_lease.go:162] Acquired expired lease slot 0 for host hostsystem-host-12642
I0106 11:37:02.741622 1 host_lease.go:207] Successfully acquired lock slot 0 for host hostsystem-host-12642
I0106 11:37:02.747864 1 client.go:68] about to run esxcli command [storage core device list -d naa.600a0980383139544924583130325351]
E0106 11:37:02.777177 1 client.go:71] Failed to run esxcli command [storage core device list -d naa.600a0980383139544924583130325351]: EsxCLI.CLIFault.summary
E0106 11:37:02.777353 1 client.go:74] ESX CLI Fault - Type: VimEsxCLICLIFault, Messages: [Unable to find device with name naa.600a0980383139544924583130325351]
I0106 11:37:02.783895 1 client.go:68] about to run esxcli command [storage core adapter rescan -t add -a 1]
I0106 11:37:07.887700 1 client.go:68] about to run esxcli command [storage core device list -d naa.600a0980383139544924583130325351]
E0106 11:37:07.914346 1 client.go:71] Failed to run esxcli command [storage core device list -d naa.600a0980383139544924583130325351]: EsxCLI.CLIFault.summary
E0106 11:37:07.914430 1 client.go:74] ESX CLI Fault - Type: VimEsxCLICLIFault, Messages: [Unable to find device with name naa.600a0980383139544924583130325351]
I0106 11:37:07.923566 1 client.go:68] about to run esxcli command [storage core adapter rescan -t add -a 1]
I0106 11:37:13.034918 1 client.go:68] about to run esxcli command [storage core device list -d naa.600a0980383139544924583130325351]
E0106 11:37:13.066047 1 client.go:71] Failed to run esxcli command [storage core device list -d naa.600a0980383139544924583130325351]: EsxCLI.CLIFault.summary
E0106 11:37:13.066202 1 client.go:74] ESX CLI Fault - Type: VimEsxCLICLIFault, Messages: [Unable to find device with name naa.600a0980383139544924583130325351]
I0106 11:37:13.073141 1 client.go:68] about to run esxcli command [storage core adapter rescan -t add -a 1]
I0106 11:37:18.182712 1 client.go:68] about to run esxcli command [storage core device list -d naa.600a0980383139544924583130325351]
E0106 11:37:18.210885 1 client.go:71] Failed to run esxcli command [storage core device list -d naa.600a0980383139544924583130325351]: EsxCLI.CLIFault.summary
E0106 11:37:18.211010 1 client.go:74] ESX CLI Fault - Type: VimEsxCLICLIFault, Messages: [Unable to find device with name naa.600a0980383139544924583130325351]
I0106 11:37:18.219118 1 client.go:68] about to run esxcli command [storage core adapter rescan -t add -a 1]
I0106 11:37:23.322309 1 client.go:68] about to run esxcli command [storage core device list -d naa.600a0980383139544924583130325351]
E0106 11:37:23.352844 1 client.go:71] Failed to run esxcli command [storage core device list -d naa.600a0980383139544924583130325351]: EsxCLI.CLIFault.summary
E0106 11:37:23.352934 1 client.go:74] ESX CLI Fault - Type: VimEsxCLICLIFault, Messages: [Unable to find device with name naa.600a0980383139544924583130325351]
I0106 11:37:23.360753 1 client.go:68] about to run esxcli command [storage core adapter rescan -t add -a 1]
I0106 11:37:28.466526 1 client.go:68] about to run esxcli command [storage core device list -d naa.600a0980383139544924583130325351]
E0106 11:37:28.494550 1 client.go:71] Failed to run esxcli command [storage core device list -d naa.600a0980383139544924583130325351]: EsxCLI.CLIFault.summary
E0106 11:37:28.494624 1 client.go:74] ESX CLI Fault - Type: VimEsxCLICLIFault, Messages: [Unable to find device with name naa.600a0980383139544924583130325351]
I0106 11:37:28.494646 1 host_lease.go:277] Work complete for slot 0 host hostsystem-host-12642
I0106 11:37:28.668784 1 vsphere-xcopy-volume-populator.go:219] channel quit failed to find the device /vmfs/devices/disks/naa.600a0980383139544924583130325351 after scanning: failed to find device naa.600a0980383139544924583130325351: EsxCLI.CLIFault.summary
F0106 11:37:28.668854 1 vsphere-xcopy-volume-populator.go:221] failed to find the device /vmfs/devices/disks/naa.600a0980383139544924583130325351 after scanning: failed to find device naa.600a0980383139544924583130325351: EsxCLI.CLIFault.summary
Following this thread, it seems the device is not found even though it exists.
There is an existing environment with this issue, if anyone needs to take a look.