Bug
Resolution: Done-Errata
Critical
CLOSED
Urgent
Description of problem:
On recent OpenShift nightlies, simple image pulls (Fedora) do not converge
unless the memory limit on CDI pods is raised to an unusually high value (1600M),
suggesting that memory throttling may be taking place on the importer pod.
Version-Release number of selected component (if applicable):
OCP 4.14.0-0.nightly-2023-08-28-154013
CNV v4.14.0.rhel9-1796
How reproducible:
100%
Steps to Reproduce:
1. Create the DataVolume (DV) listed under "Additional info"
Actual results:
The import essentially never converges
Expected results:
The import succeeds in a timely manner
Additional info:
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  annotations:
    cdi.kubevirt.io/storage.bind.immediate.requested: "true"
  name: test-dv-node-import-needs-convert
spec:
  source:
    http:
      url: http://.../Fedora-Cloud-Base-35-1.2.x86_64.qcow2
  pvc:
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: 12Gi
To observe how the issue is alleviated, edit HCO.spec with:
resourceRequirements:
  storageWorkloads:
    limits:
      cpu: 750m
      memory: 1600M
    requests:
      cpu: 100m
      memory: 60M
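For a sense of scale, the quantities in this report can be converted to bytes. The following is a minimal sketch, not part of CDI or HCO: the helper name `quantity_to_bytes` is hypothetical, and it handles only whole-number quantities with a small subset of the Kubernetes suffixes (decimal `k`/`M`/`G`/`T`, binary `Ki`/`Mi`/`Gi`/`Ti`).

```python
# Hypothetical helper: converts a subset of Kubernetes resource quantities to bytes.
# Decimal suffixes are powers of 1000, binary suffixes are powers of 1024.
_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def quantity_to_bytes(quantity: str) -> int:
    """Return the byte count for quantities such as '1600M' or '12Gi'."""
    # Check two-character suffixes first so the longest match wins.
    for suffix in sorted(_SUFFIXES, key=len, reverse=True):
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * _SUFFIXES[suffix]
    return int(quantity)  # plain number of bytes, no suffix


limit = quantity_to_bytes("1600M")    # workaround memory limit from this report
request = quantity_to_bytes("60M")    # memory request from this report
pvc_size = quantity_to_bytes("12Gi")  # requested PVC size from the DataVolume

print(f"limit    = {limit} bytes ({limit / 2**20:.1f} MiB)")
print(f"request  = {request} bytes")
print(f"pvc size = {pvc_size} bytes")
print(f"limit / request = {limit / request:.1f}x")
```

Note that `M` is a decimal suffix, so 1600M is somewhat less than 1600Mi.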
Some inspection of the same issue on GCP clusters importing a Windows image
showed high memory usage values (though not as high as the limit); the data is attached to the bug.
Some notes:
- Is it possible the entire image stays on the page cache?
- Note this is before qemu-img convert
- Why did OOMs/throttles not happen before, say, in 4.14.0-ec.3?
- For some images, doubling the CDI pod limits unclogs the import;
the limits have to go a lot higher for large images (Windows) to work, though
- cgroups v2 is the default now (throttles instead of OOM, see https://kubernetes.io/blog/2021/11/26/qos-memory-resources/)
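One way to look into the page cache and throttling questions above is to read the cgroup v2 memory files of the importer container. The sketch below is illustrative only: it assumes the container's cgroup is visible at `/sys/fs/cgroup` (the usual location inside a container on a cgroup v2 host), the function names are hypothetical, and nothing here comes from CDI itself. It relies on the `anon` and `file` keys of `memory.stat` and the `high`, `max` and `oom_kill` counters of `memory.events`, as documented for cgroup v2.

```python
from pathlib import Path


def parse_flat_keyed(text: str) -> dict[str, int]:
    """Parse a cgroup v2 'flat keyed' file: one 'key value' pair per line."""
    values = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            values[parts[0]] = int(parts[1])
    return values


def summarize(stat_text: str, events_text: str) -> dict[str, float]:
    """Split usage into anonymous memory vs. page cache and report limit events."""
    stat = parse_flat_keyed(stat_text)
    events = parse_flat_keyed(events_text)
    anon = stat.get("anon", 0)        # anonymous (heap/stack) memory, in bytes
    file_cache = stat.get("file", 0)  # page cache charged to this cgroup, in bytes
    total = anon + file_cache
    return {
        "anon_bytes": anon,
        "file_bytes": file_cache,
        "page_cache_share": file_cache / total if total else 0.0,
        # Times the cgroup was throttled for exceeding memory.high.
        "high_events": events.get("high", 0),
        # Times usage was about to exceed memory.max (reclaim was forced).
        "max_events": events.get("max", 0),
        # Processes in this cgroup killed by the OOM killer.
        "oom_kill_events": events.get("oom_kill", 0),
    }


def read_cgroup(cgroup_dir: str = "/sys/fs/cgroup") -> dict[str, float]:
    """Read memory.stat and memory.events from cgroup_dir (path is an assumption)."""
    base = Path(cgroup_dir)
    return summarize(
        (base / "memory.stat").read_text(),
        (base / "memory.events").read_text(),
    )
```

Calling `read_cgroup()` periodically during an import would show whether `file_bytes` grows toward the limit while `anon_bytes` stays small, and whether `high_events` or `max_events` increase without any `oom_kill_events`.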
Blocks: CNV-30277 Comments in "feat: add tekton-tasks-operator" PR #532 (Closed)