-
Bug
-
Resolution: Cannot Reproduce
-
Major
-
None
-
Quality / Stability / Reliability
-
False
-
-
False
-
CLOSED
-
Storage Core Sprint 223, Storage Core Sprint 225, Storage Core Sprint 228, Storage Core Sprint 229, Storage Core Sprint 230, Storage Core Sprint 232, Storage Core Sprint 233, Storage Core Sprint 234
-
Important
-
None
Description of problem:
I recently encountered a mind-boggling performance issue on our CI when comparing a VM IO workload
writing to a PVC configured with volumeMode: Filesystem versus Block, like so:
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  annotations:
    cdi.kubevirt.io/storage.preallocation: "true"
  name: vdbench-pvc-claim
  namespace: benchmark-runner
spec:
  storageClassName: ocs-storagecluster-ceph-rbd
  accessModes: [ "ReadWriteOnce" ]
  volumeMode: Filesystem # or set to Block
  resources:
    requests:
      storage: 64Gi
---
After some investigation, we found that this happens because we automatically set io=native
for block devices, but not for filesystems. If we use a filesystem within a DataVolume like so:
---
dataVolumeTemplates:
- apiVersion: cdi.kubevirt.io/v1
  kind: DataVolume
  metadata:
    annotations:
      kubevirt.io/provisionOnNode: worker-0
    name: workload-disk
  spec:
    pvc:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 65Gi
      storageClassName: ocs-storagecluster-ceph-rbd
      volumeMode: Filesystem
    source:
      blank: {}
---
we will still experience a 95% degradation compared to block. But if we add "preallocation: true" like so:
---
dataVolumeTemplates:
- apiVersion: cdi.kubevirt.io/v1
  kind: DataVolume
  metadata:
    annotations:
      kubevirt.io/provisionOnNode: worker-0
    name: workload-disk
  spec:
    preallocation: true
    pvc:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 65Gi
      storageClassName: ocs-storagecluster-ceph-rbd
      volumeMode: Filesystem
    source:
      blank: {}
---
then it turns out that using "preallocation" (which was created as a tool to improve performance on thin-provisioned devices)
causes KubeVirt to set io=native for the filesystem-backed disk (https://github.com/kubevirt/kubevirt/blob/main/pkg/virt-launcher/virtwrap/converter/converter.go#L480).
That workaround is only applicable to DataVolumes.
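For context, the io mode surfaces on the disk's driver element in the libvirt domain XML that virt-launcher generates. A minimal sketch of the two variants (the surrounding disk definition is abbreviated, and the cache attribute is illustrative, not taken from this report):

---
<!-- filesystem PVC without preallocation: io is omitted, so QEMU falls back to its default (threads) -->
<driver name='qemu' type='raw' cache='none'/>
<!-- block PVC, or preallocated filesystem: -->
<driver name='qemu' type='raw' cache='none' io='native'/>
---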
As for the PVC scenario, that is a little more complicated. The workaround for that case is
to manually create a fully preallocated disk.img in the root directory of the PVC; CNV correctly detects that it was preallocated and attaches it to the VM with io=native.
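The manual PVC workaround above can be sketched as follows. The mount path and image size are placeholders for illustration, and whether fallocate alone satisfies CNV's preallocation detection is an assumption; the real file must be named disk.img, sit at the root of the PVC mount, and match the intended disk size:

```shell
# PVC_MOUNT stands in for the PVC's mount path inside the pod;
# a real disk would use the PVC's full size instead of 16M.
PVC_MOUNT="$(mktemp -d)"

# Allocate all blocks up-front so the image has no sparse holes.
fallocate -l 16M "$PVC_MOUNT/disk.img"

# The apparent size now matches the requested allocation (16 MiB).
stat -c %s "$PVC_MOUNT/disk.img"
```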
However, both of the above workarounds are far from user-friendly. Only a few people know that using a filesystem causes such severe performance degradation, and even fewer know how to address it, which is why I suggest the following:
1. For DataVolumes - preallocation should be set to true by default.
2. For PVCs - we should implement a way to set io=native.
- external trackers