Migration Toolkit for Virtualization / MTV-1029

Controller crashes in a loop, product unusable if vSphere provides disk with null datastore


    • Type: Bug
    • Resolution: Done-Errata
    • Priority: Major
    • Fix Version/s: 2.6.1
    • Affects Version/s: 2.5.6
    • Component/s: Controller
    • Severity: Important

The controller crashes in a loop after being connected to a vSphere provider.

       

      [signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x193da35]
      goroutine 725 [running]:
      github.com/konveyor/forklift-controller/pkg/controller/provider/container/vsphere.(*VmAdapter).updateDisks(0xc00227a200, 0xc0006b5360?)
              /remote-source/app/pkg/controller/provider/container/vsphere/model.go:705 +0x355
      github.com/konveyor/forklift-controller/pkg/controller/provider/container/vsphere.(*VmAdapter).Apply(0xc00227a200, {{}, {0xc002aefae0, 0x5}, {{0xc002aefb10, 0xe}, {0xc002aefb30, 0x7}}, {0xc002afc600, 0x1b, ...}, ...})
              /remote-source/app/pkg/controller/provider/container/vsphere/model.go:669 +0x20b5
      github.com/konveyor/forklift-controller/pkg/controller/provider/container/vsphere.Collector.applyEnter({{0xc00086cac8, 0x16}, 0xc0005fe400, 0xc00217c140, {0x312a368, 0xc0021469a0}, {0x3120fb8, 0xc00283a120}, 0xc0002448e0, 0xc0028367f0, ...}, ...)
              /remote-source/app/pkg/controller/provider/container/vsphere/collector.go:851 +0x9e
      github.com/konveyor/forklift-controller/pkg/controller/provider/container/vsphere.(*Collector).apply(0xc0028295c0, {0x0?, 0x0?}, 0x0?, {0xc002eae000?, 0x64, 0xa?})
              /remote-source/app/pkg/controller/provider/container/vsphere/collector.go:734 +0x15b
      github.com/konveyor/forklift-controller/pkg/controller/provider/container/vsphere.(*Collector).getUpdates(0xc0028295c0, {0x311de68, 0xc00280fe50})
              /remote-source/app/pkg/controller/provider/container/vsphere/collector.go:396 +0xab4
      github.com/konveyor/forklift-controller/pkg/controller/provider/container/vsphere.(*Collector).Start.func1()
              /remote-source/app/pkg/controller/provider/container/vsphere/collector.go:301 +0xfd
      created by github.com/konveyor/forklift-controller/pkg/controller/provider/container/vsphere.(*Collector).Start
              /remote-source/app/pkg/controller/provider/container/vsphere/collector.go:316 +0xb9
      

       

      The segfault is here:

       

      676  // Update virtual disk devices.
      677  func (v *VmAdapter) updateDisks(devArray *types.ArrayOfVirtualDevice) {
      678          disks := []model.Disk{}
      679          for _, dev := range devArray.VirtualDevice {
      680                  switch dev.(type) {
      681                  case *types.VirtualDisk:
      682                          disk := dev.(*types.VirtualDisk)
      683                          switch disk.Backing.(type) {
      684                          case *types.VirtualDiskFlatVer1BackingInfo:
      685                                  backing := disk.Backing.(*types.VirtualDiskFlatVer1BackingInfo)
      686                                  md := model.Disk{
      687                                          Key:      disk.Key,
      688                                          File:     backing.FileName,
      689                                          Capacity: disk.CapacityInBytes,
      690                                          Datastore: model.Ref{
      691                                                  Kind: model.DsKind,
      692                                                  ID:   backing.Datastore.Value,
      693                                          },
      694                                  }
      695                                  disks = append(disks, md)
      696                          case *types.VirtualDiskFlatVer2BackingInfo:
      697                                  backing := disk.Backing.(*types.VirtualDiskFlatVer2BackingInfo)
      698                                  md := model.Disk{
      699                                          Key:      disk.Key,
      700                                          File:     backing.FileName,
      701                                          Capacity: disk.CapacityInBytes,
      702                                          Shared:   backing.Sharing != "sharingNone",
      703                                          Datastore: model.Ref{
      704                                                  Kind: model.DsKind,
      705                                                  ID:   backing.Datastore.Value,        <--------
      706                                          },
      707                                  }
      708                                  disks = append(disks, md)
      

The crash is caused by this malformed/broken disk returned by vSphere:

       

      {
        "level": "info",
        "ts": "2024-04-03 08:13:39.876",
        "logger": "debug",
        "msg": "backing-debug",
        "disk": {
          "Key": 2001,
          "DeviceInfo": {
            "Label": "Hard disk 2",
            "Summary": "0 KB"
          },
          "Backing": {
            "FileName": "[] ...vmdk",
            "Datastore": null,                <---------- here is our SIGSEGV when trying to dereference backing.Datastore.Value
            "BackingObjectId": "",
            "DiskMode": "persistent",
            "Split": false,
            .... 
      

Upon further investigation, the VM does not have a "Hard disk 2"; the volume 5b9e1ae2-851ecad0-faa4-6805ca242b07 does not exist, nor does that VMDK. Note that the reported disk size is also zero.
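
For context: in govmomi (github.com/vmware/govmomi), the backing's Datastore field is a *types.ManagedObjectReference, so the "Datastore": null in the response deserializes to a nil pointer. A minimal standalone sketch of the resulting panic (not the controller code itself):

      package main

      import "github.com/vmware/govmomi/vim25/types"

      func main() {
              // Datastore is left nil here, mirroring the "Datastore": null
              // returned by vSphere in the payload above.
              backing := &types.VirtualDiskFlatVer2BackingInfo{}
              _ = backing.Datastore.Value // nil pointer dereference -> SIGSEGV
      }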

This appears to be a bug or a problematic VM on the VMware side, but our controller should not crash this way, because the crash loop prevents migrating any VMs to OCP. This particular VM may well not migrate cleanly, but it should not bring down the controller and make the product completely unusable.

Please investigate how to make the controller more resilient: log an error for such a disk/VM and allow the customer to migrate the other VMs that are fine.
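
One possible shape for such a guard, as a minimal sketch against the updateDisks snippet above (the logger call is illustrative, not the actual fix):

      case *types.VirtualDiskFlatVer2BackingInfo:
              backing := disk.Backing.(*types.VirtualDiskFlatVer2BackingInfo)
              md := model.Disk{
                      Key:      disk.Key,
                      File:     backing.FileName,
                      Capacity: disk.CapacityInBytes,
                      Shared:   backing.Sharing != "sharingNone",
              }
              // Guard: vSphere can hand back a broken disk with a null
              // datastore (see the payload above). Record the disk without
              // a datastore ref and log, rather than dereferencing nil.
              if backing.Datastore != nil {
                      md.Datastore = model.Ref{
                              Kind: model.DsKind,
                              ID:   backing.Datastore.Value,
                      }
              } else {
                      log.Info("disk has null datastore; skipping ref", "disk", disk.Key)
              }
              disks = append(disks, md)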

        Assignee: ahadas@redhat.com Arik Hadas
        Reporter: rhn-support-gveitmic Germano Veit Michel
        Votes: 0
        Watchers: 3
