Migration Toolkit for Virtualization / MTV-4237

[automation tests] flakiness in automation with copyoffload tests


      Description of problem:

      The vSphere XCOPY volume populator (RDM + COLD + SSH method) fails during the target LUN discovery phase. While the LUN is successfully created and mapped on the NetApp ONTAP storage system, the ESXi host cannot discover the device even after multiple storage rescans.

      Version-Release number of selected component (if applicable):

      2.10.X

      How reproducible:

      Intermittent 

      Steps to Reproduce:

      From automation:

      1. Fork the mtv-api-tests repo and run the RDM test (the failure currently reproduces there):

      $ uv run pytest -m copyoffload -k test_copyoffload_rdm_virtual_disk_migration -s --tc=cluster_host:https://api.domain.lab.eng.tlv2.redhat.com:6443 --tc=cluster_username:kubeadmin --tc=cluster_password:'password' --tc=source_provider_type:vsphere --tc=source_provider_version:8.0.3.00400 --tc=storage_class:rhosqe-ontap-san-block --tc=target_ocp_version:4.18

      Manually:

      1. Configure copy-offload with a NetApp ONTAP backend
      2. Set the Provider to use the SSH method (esxiCloneMethod: "ssh")
      3. Create a migration plan with a copy-offload-enabled StorageMap
      4. Start the migration of a VM with a disk on the eco-iscsi-ds3 datastore
      5. Observe the populator logs during the LUN discovery phase

       

      Actual results:

      When the migration starts, it fails at 20% progress, during the DiskAllocation step.

      Expected results:

      The migration completes successfully.

      Additional info:

      The populator logs show the following errors:

      I0106 11:37:02.724647    1 host_lease.go:82] This populator's identity is: populate-833c59ef-a7ef-4557-800d-d06b394f66aa
      I0106 11:37:02.737588    1 host_lease.go:153] Lease esxi-lock-hostsystem-host-12642-slot-0 (slot 0) is expired, attempting to take it over
      I0106 11:37:02.741603    1 host_lease.go:162] Acquired expired lease slot 0 for host hostsystem-host-12642
      I0106 11:37:02.741622    1 host_lease.go:207] Successfully acquired lock slot 0 for host hostsystem-host-12642
      I0106 11:37:02.747864    1 client.go:68] about to run esxcli command [storage core device list -d naa.600a0980383139544924583130325351]
      E0106 11:37:02.777177    1 client.go:71] Failed to run esxcli command [storage core device list -d naa.600a0980383139544924583130325351]: EsxCLI.CLIFault.summary
      E0106 11:37:02.777353    1 client.go:74] ESX CLI Fault - Type: VimEsxCLICLIFault, Messages: [Unable to find device with name naa.600a0980383139544924583130325351]
      I0106 11:37:02.783895    1 client.go:68] about to run esxcli command [storage core adapter rescan -t add -a 1]
      I0106 11:37:07.887700    1 client.go:68] about to run esxcli command [storage core device list -d naa.600a0980383139544924583130325351]
      E0106 11:37:07.914346    1 client.go:71] Failed to run esxcli command [storage core device list -d naa.600a0980383139544924583130325351]: EsxCLI.CLIFault.summary
      E0106 11:37:07.914430    1 client.go:74] ESX CLI Fault - Type: VimEsxCLICLIFault, Messages: [Unable to find device with name naa.600a0980383139544924583130325351]
      I0106 11:37:07.923566    1 client.go:68] about to run esxcli command [storage core adapter rescan -t add -a 1]
      I0106 11:37:13.034918    1 client.go:68] about to run esxcli command [storage core device list -d naa.600a0980383139544924583130325351]
      E0106 11:37:13.066047    1 client.go:71] Failed to run esxcli command [storage core device list -d naa.600a0980383139544924583130325351]: EsxCLI.CLIFault.summary
      E0106 11:37:13.066202    1 client.go:74] ESX CLI Fault - Type: VimEsxCLICLIFault, Messages: [Unable to find device with name naa.600a0980383139544924583130325351]
      I0106 11:37:13.073141    1 client.go:68] about to run esxcli command [storage core adapter rescan -t add -a 1]
      I0106 11:37:18.182712    1 client.go:68] about to run esxcli command [storage core device list -d naa.600a0980383139544924583130325351]
      E0106 11:37:18.210885    1 client.go:71] Failed to run esxcli command [storage core device list -d naa.600a0980383139544924583130325351]: EsxCLI.CLIFault.summary
      E0106 11:37:18.211010    1 client.go:74] ESX CLI Fault - Type: VimEsxCLICLIFault, Messages: [Unable to find device with name naa.600a0980383139544924583130325351]
      I0106 11:37:18.219118    1 client.go:68] about to run esxcli command [storage core adapter rescan -t add -a 1]
      I0106 11:37:23.322309    1 client.go:68] about to run esxcli command [storage core device list -d naa.600a0980383139544924583130325351]
      E0106 11:37:23.352844    1 client.go:71] Failed to run esxcli command [storage core device list -d naa.600a0980383139544924583130325351]: EsxCLI.CLIFault.summary
      E0106 11:37:23.352934    1 client.go:74] ESX CLI Fault - Type: VimEsxCLICLIFault, Messages: [Unable to find device with name naa.600a0980383139544924583130325351]
      I0106 11:37:23.360753    1 client.go:68] about to run esxcli command [storage core adapter rescan -t add -a 1]
      I0106 11:37:28.466526    1 client.go:68] about to run esxcli command [storage core device list -d naa.600a0980383139544924583130325351]
      E0106 11:37:28.494550    1 client.go:71] Failed to run esxcli command [storage core device list -d naa.600a0980383139544924583130325351]: EsxCLI.CLIFault.summary
      E0106 11:37:28.494624    1 client.go:74] ESX CLI Fault - Type: VimEsxCLICLIFault, Messages: [Unable to find device with name naa.600a0980383139544924583130325351]
      I0106 11:37:28.494646    1 host_lease.go:277] Work complete for slot 0 host hostsystem-host-12642
      I0106 11:37:28.668784    1 vsphere-xcopy-volume-populator.go:219] channel quit failed to find the device /vmfs/devices/disks/naa.600a0980383139544924583130325351 after scanning: failed to find device naa.600a0980383139544924583130325351: EsxCLI.CLIFault.summary
      F0106 11:37:28.668854    1 vsphere-xcopy-volume-populator.go:221] failed to find the device /vmfs/devices/disks/naa.600a0980383139544924583130325351 after scanning: failed to find device naa.600a0980383139544924583130325351: EsxCLI.CLIFault.summary
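
When comparing a failing run against a passing one, it can help to reduce a populator log to a few numbers: how many device lookups and rescans were issued, how many lookups reported the device as missing, and how long the retry window lasted. The following is a minimal sketch, assuming the klog line format shown above; the names `summarize`, `KLOG` and `RUN` are illustrative and are not part of the populator or the test suite.

```python
import re
from datetime import datetime

# klog header: severity letter, MMDD, time, thread id, "file:line]", message.
KLOG = re.compile(
    r"^\s*(?P<sev>[IWEF])(?P<date>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+"
    r"\d+ (?P<src>[\w.\-]+:\d+)\] (?P<msg>.*)$"
)
# Matches only the "about to run" lines, not the "Failed to run" lines.
RUN = re.compile(r"about to run esxcli command \[(?P<cmd>[^\]]*)\]")


def summarize(log_text):
    """Count esxcli lookups and rescans and measure the time they span.

    Assumption: all lines come from the same day (klog omits the year, and
    this sketch compares times of day only).
    """
    lookups = rescans = not_found = 0
    stamps = []
    for line in log_text.splitlines():
        m = KLOG.match(line)
        if not m:
            continue  # skip anything that is not a klog line
        msg = m["msg"]
        run = RUN.search(msg)
        if run:
            stamps.append(datetime.strptime(m["time"], "%H:%M:%S.%f"))
            cmd = run["cmd"]
            if cmd.startswith("storage core adapter rescan"):
                rescans += 1
            elif cmd.startswith("storage core device list"):
                lookups += 1
        elif "Unable to find device with name" in msg:
            not_found += 1
    # Window between the first and the last esxcli command that was issued.
    window = (stamps[-1] - stamps[0]).total_seconds() if stamps else 0.0
    return {
        "lookups": lookups,
        "rescans": rescans,
        "not_found": not_found,
        "window_seconds": window,
    }
```

Calling `summarize(open("populator.log").read())` on a saved log (the file name is a placeholder) returns a small dictionary that can be compared between runs.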

      Following this thread, it seems the ESXi host does not find the device for some reason, although the device exists.
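
For reference, the discovery loop visible in the logs (look up the device, rescan on a miss, wait about five seconds, look up again, give up after the sixth lookup) can be sketched as follows. This is only an illustration of the observed pattern, not the populator's actual Go code: `wait_for_device`, `device_exists` and `rescan` are hypothetical names, and the default attempt count and delay are read off the log timestamps rather than taken from any configuration.

```python
import time


def wait_for_device(device_exists, rescan, attempts=6, delay=5.0, sleep=time.sleep):
    """Look up the device; on a miss, rescan, wait, and look again.

    device_exists and rescan are caller-supplied callables (hypothetical
    stand-ins for the esxcli 'device list' and 'adapter rescan' commands).
    Returns the 1-based attempt on which the device appeared, or None if it
    never appeared within the allowed attempts.
    """
    for attempt in range(1, attempts + 1):
        if device_exists():
            return attempt
        if attempt < attempts:  # the log shows no rescan after the final lookup
            rescan()
            sleep(delay)
    return None
```

With these defaults the whole window is roughly 25 seconds, so a device that becomes visible on the host later than that would produce the same failure as a device that never appears.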

      There is an existing environment that reproduces the issue, if anyone needs to take a look.

              rgolan1@redhat.com Roy Golan
              smiron@redhat.com Shelly Miron