OpenShift Virtualization / CNV-81643

[GCNV] CDI DataVolume import from VMExport is too slow and times out (progress stays N/A)

      Description of problem:

      When creating a VM from VMExport snapshot manifests, the DataVolume import is extremely slow and does not complete within 30 minutes. The DV stays in the ImportInProgress phase with progress: N/A throughout; progress is never reported. The target PVC remains in Pending state until the import eventually times out.

      The import transfers a 36 GiB sparse image (actual data ~2 GB) at approximately 0.5–1 GB/min. After 28 minutes only 25 GB of the 36.3 GB virtual size had been transferred, and the 30-minute timeout expired before completion.
      
      

      Observed Transfer Timeline

      Elapsed Time   Scratch File (Sparse Size)   Actual Disk Usage
      2 min          689 MB                       480 MB
      5 min          3.1 GB                       844 MB
      11 min         11 GB                        1.7 GB
      18 min         15 GB                        1.7 GB
      28 min         25 GB                        -
      ~31 min        Timeout                      -
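
      The timeline above can be turned into a rough throughput estimate. The following is a minimal sketch, not part of the original report: it takes the sparse scratch file size as the measure of progress, treats the GB and GiB figures in this report as interchangeable, and assumes the average rate would have held for the rest of the import.

```python
# Samples copied from the timeline above:
# (elapsed minutes, scratch file sparse size in GB).
SAMPLES = [(2, 0.689), (5, 3.1), (11, 11.0), (18, 15.0), (28, 25.0)]
VIRTUAL_SIZE_GB = 36.3  # virtual size quoted in the description
TIMEOUT_MIN = 30        # import timeout quoted in the description


def average_rate(samples):
    """Average rate in GB/min from the start of the import to the last sample."""
    minutes, size_gb = samples[-1]
    return size_gb / minutes


def interval_rates(samples):
    """Rate in GB/min between each pair of consecutive samples."""
    return [
        (s1 - s0) / (t1 - t0)
        for (t0, s0), (t1, s1) in zip(samples, samples[1:])
    ]


def projected_total_minutes(samples, total_gb):
    """Minutes needed to transfer total_gb, assuming the average rate holds."""
    return total_gb / average_rate(samples)


if __name__ == "__main__":
    rate = average_rate(SAMPLES)
    eta = projected_total_minutes(SAMPLES, VIRTUAL_SIZE_GB)
    print(f"average rate: {rate:.2f} GB/min")
    print(f"projected total: {eta:.1f} min (timeout is {TIMEOUT_MIN} min)")
```

      Under these assumptions the average rate is about 0.89 GB/min and the full virtual size would need roughly 41 minutes, so the 30-minute timeout cannot be met at the observed speed.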

      DataVolume Status During Import

      Condition   Status   Reason
      Bound       False    Pending
      Ready       False    TransferRunning
      Running     True     Pod is running
      Progress    N/A      Never updates
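
      The state in the table above can be detected programmatically. This is a hedged sketch only: it assumes the DataVolume status has been loaded into a plain dict (for example from JSON output) with `phase`, `progress`, and a `conditions` list of `type`/`status`/`reason` entries. The sample dict is reconstructed from the table for illustration and is not captured output.

```python
def parse_progress(progress):
    """Return progress as a float percentage, or None when it is not a
    percentage string (for example "N/A")."""
    if not progress or not progress.endswith("%"):
        return None
    try:
        return float(progress[:-1])
    except ValueError:
        return None


def condition_status(status, condition_type):
    """Return the status string of a named condition, or "Unknown"."""
    for condition in status.get("conditions", []):
        if condition.get("type") == condition_type:
            return condition.get("status", "Unknown")
    return "Unknown"


def running_without_progress(status):
    """True when the import is running but reports no usable progress."""
    return (
        status.get("phase") == "ImportInProgress"
        and condition_status(status, "Running") == "True"
        and parse_progress(status.get("progress")) is None
    )


# Illustrative status reconstructed from the table above (assumed layout).
SAMPLE_STATUS = {
    "phase": "ImportInProgress",
    "progress": "N/A",
    "conditions": [
        {"type": "Bound", "status": "False", "reason": "Pending"},
        {"type": "Ready", "status": "False", "reason": "TransferRunning"},
        {"type": "Running", "status": "True", "reason": "Pod is running"},
    ],
}

if __name__ == "__main__":
    print(running_without_progress(SAMPLE_STATUS))
```

      A check like this could be called in a polling loop to flag the problem early instead of waiting for the 30-minute timeout.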

       

      Version-Release number of selected component (if applicable):

      OCP: 4.21.5
      CNV/CDI: 4.21.1
      Storage Class: gcnv-flex

      How reproducible:

      80%

      Steps to Reproduce:

      1. Create a RHEL VM with a DataVolume using storage class gcnv-flex (NetApp Trident CSI)
      2. Write some content to the VM's filesystem
      3. Create a VirtualMachineSnapshot of the VM
      4. Create a VirtualMachineExport from the snapshot; wait for it to become Ready
      5. Fetch the VM manifest from the VMExport external URL (via curl with export token and CA cert)
      6. Create a new VM in a different namespace using the fetched manifest
      7. Start the VM and wait for the DataVolume to succeed
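
      The manifest fetch in the steps above (the one done with curl) can be sketched in Python as follows. This is an illustration under assumptions, not the command used in the report: the header name `x-kubevirt-export-token` and the `Accept: application/yaml` value follow my reading of the KubeVirt export documentation and should be verified, and the URL, token, and CA path are placeholders supplied by the caller.

```python
import ssl
import urllib.request


def build_manifest_request(manifest_url, export_token):
    """Build the HTTPS request for the exported VM manifest.

    The token header name is an assumption taken from the KubeVirt export
    documentation; confirm it against the version in use.
    """
    return urllib.request.Request(
        manifest_url,
        headers={
            "x-kubevirt-export-token": export_token,
            "Accept": "application/yaml",
        },
    )


def fetch_manifest(manifest_url, export_token, ca_cert_path, timeout=30):
    """Fetch the manifest, verifying the server against the export CA cert."""
    context = ssl.create_default_context(cafile=ca_cert_path)
    request = build_manifest_request(manifest_url, export_token)
    with urllib.request.urlopen(request, context=context, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

      Calling `fetch_manifest(url, token, "ca.pem")` with the external manifest URL from the VirtualMachineExport status would return the YAML text that is then applied in the target namespace.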

      Actual results:

      Import is too slow, progress stays N/A, and the import times out after 30 minutes.

      Expected results:

      DataVolume import completes in a reasonable time with progress reported.

      Additional info:

      The importer pod logs an HTTP 400 error at startup:

      E0310 19:40:25.936734  1 http-datasource.go:451] http: expected status code 200, got 400

      Importer pod logs:

      I0310 19:40:25.824691       1 importer.go:107] Starting importer
      I0310 19:40:25.825913       1 importer.go:182] begin import process
      I0310 19:40:25.837530       1 http-datasource.go:279] Attempting to get certs from /certs/ca.pem
      E0310 19:40:25.936734       1 http-datasource.go:451] http: expected status code 200, got 400
      I0310 19:40:25.968025       1 data-processor.go:361] Calculating available size
      I0310 19:40:25.969476       1 data-processor.go:373] Checking out file system volume size.
      I0310 19:40:25.969811       1 data-processor.go:380] Request image size not empty.
      I0310 19:40:25.969822       1 data-processor.go:386] Target size 38976828212.
      I0310 19:40:25.970474       1 nbdkit.go:371] Waiting for nbdkit PID.
      I0310 19:40:26.470675       1 nbdkit.go:392] nbdkit ready.
      I0310 19:40:26.470688       1 data-processor.go:260] New phase: TransferScratch
      I0310 19:40:26.476653       1 file.go:230] copyWithSparseCheck to /scratch/tmpimage
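
      The importer log lines above use the klog header layout (severity letter, MMDD date, time, thread id, file:line, message). The sketch below is a hypothetical helper, not part of CDI, that pulls out error entries and the reported target size. The two sample lines are copied from the log above.

```python
import re

# Severity, MMDD, time with microseconds, thread id, source file, line, message.
KLOG_LINE = re.compile(
    r"^([IWEF])(\d{4}) (\d{2}:\d{2}:\d{2}\.\d{6})\s+(\d+) ([^:\s]+):(\d+)\] (.*)$"
)
TARGET_SIZE = re.compile(r"Target size (\d+)")


def parse_line(line):
    """Parse one klog-formatted line into a dict, or return None if it does not match."""
    match = KLOG_LINE.match(line.strip())
    if match is None:
        return None
    severity, date, time, thread, source, lineno, message = match.groups()
    return {
        "severity": severity,
        "date": date,
        "time": time,
        "thread": int(thread),
        "file": source,
        "line": int(lineno),
        "message": message,
    }


def error_entries(lines):
    """Return parsed entries whose severity is E (error) or F (fatal)."""
    parsed = (parse_line(line) for line in lines)
    return [entry for entry in parsed if entry and entry["severity"] in "EF"]


def target_size_bytes(lines):
    """Return the first 'Target size N' value in bytes, or None if absent."""
    for line in lines:
        match = TARGET_SIZE.search(line)
        if match:
            return int(match.group(1))
    return None


LOG_LINES = [
    "E0310 19:40:25.936734       1 http-datasource.go:451] http: expected status code 200, got 400",
    "I0310 19:40:25.969822       1 data-processor.go:386] Target size 38976828212.",
]

if __name__ == "__main__":
    for entry in error_entries(LOG_LINES):
        print(entry["file"], entry["line"], entry["message"])
    size = target_size_bytes(LOG_LINES)
    print(f"target size: {size / 2**30:.1f} GiB")
```

      Converting the logged target size of 38976828212 bytes gives 36.3 GiB, which matches the virtual size quoted in the description.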
      
      
      Events:

      Normal    ImportInProgress       datavolume/rhel10-1773171321-3676066              Import into rhel10-1773171321-3676066 in progress
      Normal    Pending                datavolume/rhel10-1773171321-3676066              target PVC rhel10-1773171321-3676066 Pending
      Warning   Unschedulable          datavolume/rhel10-1773171321-3676066              Importer pod cannot be scheduled
      Normal    ExternalProvisioning   persistentvolumeclaim/rhel10-1773171321-3676066   Waiting for a volume to be created either by the external provisioner 'csi.trident.netapp.io'

              ngavrilo@redhat.com Natalie Gavrielov
              rh-ee-ahafe Ahmad Hafi
              Natalie Gavrielov Natalie Gavrielov