OpenShift Virtualization / CNV-32676

[2236223] Importer very slow to pull images, possibly memory-throttled


Details

    • Urgent

    Description

      Description of problem:
      On recent OpenShift nightlies, simple image imports (e.g. Fedora) do not converge
      unless the memory limit on CDI pods is raised to an unreasonably high value (1600M).
      This suggests that memory throttling may be taking place on the importer pod.

      Version-Release number of selected component (if applicable):
      OCP 4.14.0-0.nightly-2023-08-28-154013
      CNV v4.14.0.rhel9-1796

      How reproducible:
      100%

      Steps to Reproduce:
      1. Create a DataVolume (DV), e.g. the one under Additional info.

      Actual results:
      The import essentially never converges.

      Expected results:
      The import succeeds in a timely manner.

      Additional info:
      apiVersion: cdi.kubevirt.io/v1beta1
      kind: DataVolume
      metadata:
        annotations:
          cdi.kubevirt.io/storage.bind.immediate.requested: "true"
        name: test-dv-node-import-needs-convert
      spec:
        source:
          http:
            url: http://.../Fedora-Cloud-Base-35-1.2.x86_64.qcow2
        pvc:
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 12Gi

      To observe how the issue is alleviated, edit HCO.spec with:

      resourceRequirements:
        storageWorkloads:
          limits:
            cpu: 750m
            memory: 1600M
          requests:
            cpu: 100m
            memory: 60M
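      Kubernetes resource quantities distinguish decimal suffixes (M = 10^6) from binary ones (Mi = 2^20), so the 1600M limit above is 1.6 GB, roughly 1526 MiB, and the 60M request is 60 MB. The sketch below is a simplified, hypothetical helper (`parse_quantity` is not part of CDI, HCO, or the Kubernetes client libraries) that handles only plain suffixed values, not the full Kubernetes quantity grammar such as exponent notation.

```python
from decimal import Decimal

# Hypothetical, simplified subset of Kubernetes quantity suffixes.
# Not the real k8s.io/apimachinery parser: exponent forms like "1e3" are not handled.
_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40, "Pi": 2**50, "Ei": 2**60,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12, "P": 10**15, "E": 10**18,
    "m": Decimal("0.001"),  # milli, used for CPU values such as "750m"
}


def parse_quantity(q: str) -> Decimal:
    """Convert a quantity string such as '1600M', '12Gi' or '750m' to a Decimal."""
    # Check two-character (binary) suffixes before one-character ones,
    # so that "12Gi" is not mistaken for a different suffix.
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if q.endswith(suffix):
            return Decimal(q[: -len(suffix)]) * _SUFFIXES[suffix]
    return Decimal(q)  # no suffix: a plain number


# The storage-workload memory limit from the HCO snippet, expressed in MiB.
limit_mib = parse_quantity("1600M") / 2**20  # Decimal('1525.87890625')
```

      The same helper shows that the limit is still far below the 12Gi PVC size, so a process that keeps the whole image in the page cache would hit it long before the download completes.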

      Inspecting the same issue on GCP clusters importing a Windows image
      showed high memory usage (though below the limit); the measurements are attached to the bug.
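      On a cgroup v2 node, the importer container's memory usage relative to its limit can be read directly from its cgroup files: memory.current (bytes in use, including page cache) and memory.max (the limit, or the literal "max" when unlimited). A minimal sketch, assuming the container's own cgroup is mounted at /sys/fs/cgroup (the usual layout with a private cgroup namespace); the helper names are hypothetical, not part of CDI:

```python
from pathlib import Path
from typing import Optional


def parse_limit(text: str) -> Optional[int]:
    """Parse memory.max or memory.high; 'max' means no limit (returns None)."""
    value = text.strip()
    return None if value == "max" else int(value)


def usage_ratio(current_text: str, max_text: str) -> Optional[float]:
    """Fraction of memory.max currently charged to the cgroup (None if unlimited)."""
    limit = parse_limit(max_text)
    if limit is None:
        return None
    return int(current_text.strip()) / limit


def read_usage(cgroup_dir: str = "/sys/fs/cgroup") -> Optional[float]:
    """Read memory.current and memory.max from a cgroup v2 directory (assumed path)."""
    d = Path(cgroup_dir)
    return usage_ratio((d / "memory.current").read_text(),
                       (d / "memory.max").read_text())
```

      For example, calling read_usage() inside the importer container returns the fraction of the limit in use; a value that stays close to 1.0 during the download would be consistent with the pod running at its memory limit.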

      Some notes:

      • Is it possible that the entire image stays in the page cache?
      • Note that this happens before the qemu-img convert step.
      • Why did OOMs/throttling not happen before, e.g. in 4.14.0-ec.3?
      • For some images, doubling the CDI pod limits unclogs the import;
        large images (Windows) need much higher limits to work, though.
      • cgroups v2 is now the default (throttles instead of OOM - https://kubernetes.io/blog/2021/11/26/qos-memory-resources/)
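      The page-cache question and the throttle-versus-OOM question can both be checked from the importer container's cgroup v2 files. In memory.stat, "file" is the page cache charged to the cgroup and "anon" is anonymous (process) memory; in memory.events, "high" counts throttling past memory.high, "max" counts times usage was about to exceed memory.max, and "oom_kill" counts OOM-killed processes. A minimal sketch, assuming the files are read from /sys/fs/cgroup inside the container; parse_flat_keyed and summarize are hypothetical helpers:

```python
from typing import Dict


def parse_flat_keyed(text: str) -> Dict[str, int]:
    """Parse cgroup v2 'flat keyed' files such as memory.stat and memory.events."""
    result: Dict[str, int] = {}
    for line in text.splitlines():
        if line.strip():
            key, value = line.split()
            result[key] = int(value)
    return result


def summarize(stat_text: str, events_text: str) -> Dict[str, int]:
    """Pick out the counters relevant to the page-cache and throttling hypotheses."""
    stat = parse_flat_keyed(stat_text)
    events = parse_flat_keyed(events_text)
    return {
        "anon_bytes": stat.get("anon", 0),       # anonymous (process) memory
        "file_bytes": stat.get("file", 0),       # page cache charged to the cgroup
        "high_events": events.get("high", 0),    # throttled past memory.high
        "max_events": events.get("max", 0),      # usage about to exceed memory.max
        "oom_kills": events.get("oom_kill", 0),  # processes killed by the OOM killer
    }
```

      A large file_bytes relative to anon_bytes would support the page-cache theory. Note that unless the Kubernetes MemoryQoS feature gate is enabled, the kubelet does not set memory.high for containers, so a climbing max_events counter with zero oom_kills would point at reclaim pressure at the limit rather than memory.high throttling.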


    People
      Alex Kalenyuk (akalenyu)
      Natalie Gavrielov (Inactive)