Uploaded image for project: 'OpenShift Virtualization'
  1. OpenShift Virtualization
  2. CNV-44479

[4.14] virt-cdi-import often killed by oom-killer during VM provisioning

XMLWordPrintable

    • Storage Core Sprint 257
    • High
    • No

      Description of problem:

      Cluster often has to attempt to provision a VM multiple times before it actually works. It sometimes fails 2-3 times in a row before it works.
      It always works after some retries, but quite often also fails on 1st and 2nd attempts.

      I've seen this rarely before, now after upgrading the cluster to 4.13-rc7 (CNV still on 4.12.2) it seems to hit more frequently.

      I noticed its writing with "-t writeback", and page cache is most of the cgroup memory limit consumed (537M out of 585M).

      This does not seem to be a good option for converting images under a cgroup limit.

      107 127318 127223 54 01:15 ? 00:00:10 qemu-img convert -S 0 -t writeback -p -O raw nbd+unix:///?socket=/tmp/nbdkit.sock /dev/cdi-block-volume

      May 08 01:05:17 yellow.toca.local kernel: qemu-img invoked oom-killer: gfp_mask=0x101c00(GFP_NOIO|_GFP_HARDWALL|_GFP_WRITE), order=0, oom_score_adj=999
      May 08 01:05:17 yellow.toca.local kernel: CPU: 2 PID: 115058 Comm: qemu-img Kdump: loaded Not tainted 5.14.0-284.13.1.el9_2.x86_64 #1

      May 08 01:05:17 yellow.toca.local kernel: Call Trace:
      May 08 01:05:17 yellow.toca.local kernel: <TASK>
      May 08 01:05:17 yellow.toca.local kernel: dump_stack_lvl+0x34/0x48
      May 08 01:05:17 yellow.toca.local kernel: dump_header+0x4a/0x201
      May 08 01:05:17 yellow.toca.local kernel: oom_kill_process.cold+0xb/0x10
      May 08 01:05:17 yellow.toca.local kernel: out_of_memory+0xed/0x2e0
      May 08 01:05:17 yellow.toca.local kernel: ? __wake_up_common+0x7d/0x190
      May 08 01:05:17 yellow.toca.local kernel: mem_cgroup_out_of_memory+0x13a/0x150
      May 08 01:05:17 yellow.toca.local kernel: try_charge_memcg+0x6df/0x7a0
      May 08 01:05:17 yellow.toca.local kernel: charge_memcg+0x9f/0x130
      May 08 01:05:17 yellow.toca.local kernel: __mem_cgroup_charge+0x29/0x80
      May 08 01:05:17 yellow.toca.local kernel: __filemap_add_folio+0x224/0x5a0
      May 08 01:05:17 yellow.toca.local kernel: ? scan_shadow_nodes+0x30/0x30
      May 08 01:05:17 yellow.toca.local kernel: filemap_add_folio+0x37/0xa0
      May 08 01:05:17 yellow.toca.local kernel: __filemap_get_folio+0x1e6/0x330
      May 08 01:05:17 yellow.toca.local kernel: ? blkdev_write_begin+0x20/0x20
      May 08 01:05:17 yellow.toca.local kernel: pagecache_get_page+0x15/0x90
      May 08 01:05:17 yellow.toca.local kernel: block_write_begin+0x24/0xf0
      May 08 01:05:17 yellow.toca.local kernel: generic_perform_write+0xbe/0x200
      May 08 01:05:17 yellow.toca.local kernel: __generic_file_write_iter+0xe5/0x1a0
      May 08 01:05:17 yellow.toca.local kernel: blkdev_write_iter+0xe4/0x160
      May 08 01:05:17 yellow.toca.local kernel: new_sync_write+0xfc/0x190
      May 08 01:05:17 yellow.toca.local kernel: vfs_write+0x1ef/0x280
      May 08 01:05:17 yellow.toca.local kernel: __x64_sys_pwrite64+0x90/0xc0
      May 08 01:05:17 yellow.toca.local kernel: do_syscall_64+0x59/0x90
      May 08 01:05:17 yellow.toca.local kernel: ? do_futex+0x15b/0x200
      May 08 01:05:17 yellow.toca.local kernel: ? __x64_sys_futex+0x73/0x1d0
      May 08 01:05:17 yellow.toca.local kernel: ? switch_fpu_return+0x49/0xd0
      May 08 01:05:17 yellow.toca.local kernel: ? exit_to_user_mode_prepare+0xec/0x100
      May 08 01:05:17 yellow.toca.local kernel: ? syscall_exit_to_user_mode+0x12/0x30
      May 08 01:05:17 yellow.toca.local kernel: ? do_syscall_64+0x69/0x90
      May 08 01:05:17 yellow.toca.local kernel: ? do_syscall_64+0x69/0x90
      May 08 01:05:17 yellow.toca.local kernel: ? sysvec_apic_timer_interrupt+0x3c/0x90
      May 08 01:05:17 yellow.toca.local kernel: entry_SYSCALL_64_after_hwframe+0x63/0xcd

      May 08 01:05:17 yellow.toca.local kernel: memory: usage 585936kB, limit 585936kB, failcnt 131
      May 08 01:05:17 yellow.toca.local kernel: memory+swap: usage 585936kB, limit 9007199254740988kB, failcnt 0
      May 08 01:05:17 yellow.toca.local kernel: kmem: usage 17716kB, limit 9007199254740988kB, failcnt 0
      May 08 01:05:17 yellow.toca.local kernel: Memory cgroup stats for /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-podb18ee349_565f_4cb6_9f2b_f4e67a7a7927.slice:
      May 08 01:05:17 yellow.toca.local kernel: anon 44654592
      file 537202688
      kernel 18141184
      kernel_stack 262144
      pagetables 1024000
      percpu 20856
      sock 0
      vmalloc 28672
      shmem 53248
      zswap 0
      zswapped 0
      file_mapped 0
      file_dirty 0
      file_writeback 537149440
      swapcached 0
      anon_thp 14680064
      file_thp 0
      shmem_thp 0
      inactive_anon 44646400
      active_anon 61440
      inactive_file 268509184
      active_file 268574720
      unevictable 0
      slab_reclaimable 16203816
      slab_unreclaimable 497488
      slab 16701304
      workingset_refault_anon 0
      workingset_refault_file 0
      workingset_activate_anon 0
      workingset_activate_file 0
      workingset_restore_anon 0
      workingset_restore_file 0
      workingset_nodereclaim 0
      pgscan 4950026
      pgsteal 4
      pgscan_kswapd 0
      pgscan_direct 4950026
      pgsteal_kswapd 0
      pgsteal_direct 4
      pgfault 13780
      pgmajfault 0
      pgrefill 4884435
      pgactivate 4950019
      pgdeactivate 4884435
      pglazyfree 0
      pglazyfreed 0
      zswpin 0
      zswpout 0
      thp_fault_alloc 13
      thp_collapse_alloc 0
      May 08 01:05:17 yellow.toca.local kernel: Tasks state (memory values in pages):
      May 08 01:05:17 yellow.toca.local kernel: [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
      May 08 01:05:17 yellow.toca.local kernel: [ 114988] 0 114988 2076 485 49152 0 -1000 conmon
      May 08 01:05:17 yellow.toca.local kernel: [ 115000] 107 115000 335956 13160 401408 0 999 virt-cdi-import
      May 08 01:05:17 yellow.toca.local kernel: [ 115032] 107 115032 95298 2604 262144 0 999 nbdkit
      May 08 01:05:17 yellow.toca.local kernel: [ 115053] 107 115053 110130 6764 360448 0 999 qemu-img
      May 08 01:05:17 yellow.toca.local kernel: oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=crio-3ec0af77f3bab257e52e5a33cfde0e88861bae125913f97b356105b5efd7cc18.scope,mems_allowed=0,oom_memcg=/kubepods.slice/kubepods-burstable.slice>
      May 08 01:05:17 yellow.toca.local kernel: Memory cgroup out of memory: Killed process 115000 (virt-cdi-import) total-vm:1343824kB, anon-rss:19060kB, file-rss:33580kB, shmem-rss:0kB, UID:107 pgtables:392kB oom_score_adj:999

      The image is sourced from an httpd server, it a qcow2. And written to a topolvm volume.

      spec:
      dataVolumeTemplates:

      • metadata:
        creationTimestamp: null
        name: rhvh-1-rootdisk
        namespace: homelab
        spec:
        preallocation: false
        source:
        http:
        url: http://pi.toca.local:8080/images/rhel-8.6.qcow2
        storage:
        resources:
        requests:
        storage: 100Gi
        storageClassName: lvms-ssd-lvmcluster

      Version-Release number of selected component (if applicable):
      4.12.2

      How reproducible:
      Always

      Steps to Reproduce:
      1. Provision a VM
      2. Watch the logs on the node and it may fail 1-2 times before a retry succeeds

              akalenyu Alex Kalenyuk
              rhn-support-gveitmic Germano Veit Michel
              Dalia Frank Dalia Frank
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

                Created:
                Updated:
                Resolved: