-
Bug
-
Resolution: Unresolved
-
Major
-
CNV v4.16.0
-
None
-
5
-
False
-
-
False
-
CNV v4.99.0.rhel9-1121, CNV v4.16.3.rhel9-133
-
---
-
---
-
-
Storage Core Sprint 256, Storage Core Sprint 257, Storage Core Sprint 258
-
No
Description of problem:
Concurrent host-assisted clone from one source PVC causing the source pod restarts (both same-namespace and cross-namespace). In my case, it was HPP to HPP clone on different environments - BM, PSI.
Version-Release number of selected component (if applicable):
4.16, 4.15, 4.14, haven't tried earlier versions
How reproducible:
Always for concurrent clone
Steps to Reproduce:
1. Cretae several (2-5 will work) DVs that clone one PVC [cloud-user@ocp-psi-executor-xl scripts]$ cat multiple-dv-clones.sh #!/bin/bash set -ex for i in {1..5} do echo "Clone $i times" cat > dv-target-num.yaml << 'EOL' apiVersion: cdi.kubevirt.io/v1beta1 kind: DataVolume metadata: name: mydv annotations: cdi.kubevirt.io/storage.bind.immediate.requested: 'true' spec: sourceRef: kind: DataSource name: rhel8 namespace: openshift-virtualization-os-images storage: resources: requests: storage: 30Gi storageClassName: hostpath-csi-basic EOL sed -i "s/mydv/dv$i/g" dv-target-num.yaml oc create -f dv-target-num.yaml done oc get dv
Actual results:
Target DV has restarts: $ oc get dv NAME PHASE PROGRESS RESTARTS AGE dv1 Succeeded 100.0% 1 12m dv2 Succeeded 100.0% 3 12m dv3 Succeeded 100.0% 4 12m dv4 Succeeded 100.0% 2 12m dv5 Succeeded 100.0% 5 12m
The source pod log says " /usr/bin/tar: disk.img: file changed as we read it "
$ oc logs -n openshift-virtualization-os-images d70713d3-cd74-43ba-a1dd-24cd43feb9d8-source-pod -p VOLUME_MODE=filesystem MOUNT_POINT=/var/run/cdi/clone/source /var/run/cdi/clone/source / UPLOAD_BYTES=32212254742 I0617 12:14:33.426324 10 clone-source.go:220] content-type is "filesystem-clone" I0617 12:14:33.426423 10 clone-source.go:221] mount is "/var/run/cdi/clone/source" I0617 12:14:33.426427 10 clone-source.go:222] upload-bytes is 32212254742 I0617 12:14:33.426444 10 clone-source.go:239] Starting cloner target I0617 12:14:33.426493 10 clone-source.go:177] Executing [/usr/bin/tar cv -S disk.img] I0617 12:14:34.251916 10 clone-source.go:251] Set header to filesystem-clone I0617 12:14:34.435639 10 prometheus.go:75] 0.01 I0617 12:14:35.439591 10 prometheus.go:75] 0.27 I0617 12:14:36.439882 10 prometheus.go:75] 0.48 I0617 12:14:37.441110 10 prometheus.go:75] 0.69 I0617 12:14:38.441187 10 prometheus.go:75] 0.94 I0617 12:14:39.442607 10 prometheus.go:75] 1.18 I0617 12:14:40.442678 10 prometheus.go:75] 1.45 I0617 12:14:41.443432 10 prometheus.go:75] 1.68 I0617 12:14:42.443516 10 prometheus.go:75] 1.93 I0617 12:14:43.445281 10 prometheus.go:75] 2.10 I0617 12:14:44.445685 10 prometheus.go:75] 2.29 I0617 12:14:45.447510 10 prometheus.go:75] 2.49 I0617 12:14:46.447596 10 prometheus.go:75] 2.73 I0617 12:14:47.447764 10 prometheus.go:75] 2.97 I0617 12:14:48.447835 10 prometheus.go:75] 3.24 I0617 12:14:49.450823 10 prometheus.go:75] 3.50 I0617 12:14:50.451068 10 prometheus.go:75] 3.76 I0617 12:14:51.451145 10 prometheus.go:75] 3.94 I0617 12:14:52.454302 10 prometheus.go:75] 4.12 I0617 12:14:53.454610 10 prometheus.go:75] 4.38 I0617 12:14:54.458346 10 prometheus.go:75] 4.60 I0617 12:14:55.459023 10 prometheus.go:75] 4.87 I0617 12:14:56.459297 10 prometheus.go:75] 5.09 I0617 12:14:57.459990 10 prometheus.go:75] 5.35 I0617 12:14:58.460545 10 prometheus.go:75] 5.56 I0617 12:14:59.460626 10 prometheus.go:75] 5.77 I0617 12:15:00.476080 10 prometheus.go:75] 5.89 I0617 12:15:01.476492 10 prometheus.go:75] 6.09 I0617 12:15:02.476930 10 prometheus.go:75] 6.30 I0617 12:15:03.484323 10 prometheus.go:75] 6.54 I0617 12:15:04.484413 10 prometheus.go:75] 6.76 I0617 12:15:05.488516 10 prometheus.go:75] 6.95 F0617 12:15:05.777666 10 clone-source.go:41] Subprocess did not execute successfully, result is: '\x01' disk.img /usr/bin/tar: disk.img: file changed as we read it
Expected results:
No restarts for concurrent clone
Additional info:
- depends on
-
RHEL-50158 RHEL9 stream tar 1.34 is failing on concurrent file read ctime update; should be updated to 1.35 where it's fixed
- Release Pending
-
OCPBUGS-37412 RHEL9 stream tar 1.34 is failing on concurrent file read ctime update; should be updated to 1.35 where it's fixed
- Closed
- is cloned by
-
CNV-48778 [4.17] Concurrent host-assisted clone may have many restarts, log says "file changed as we read it "
- ON_QA
- links to
-
RHEA-2024:139317 OpenShift Virtualization 4.16.4 Images
- mentioned on