Bug
Resolution: Not a Bug
Blocker
OADP 1.3.1
Description of problem:
During Kopia or Restic restore operations there is an issue with PodVolumeRestore (PVR) resources: the restore CR ends as "PartiallyFailed".
When checking the restore CR, some PVRs are marked as Completed, some as Failed, and those that appear as New never started at all.
This issue has been reproduced with both Restic and Kopia restores.
PVR:
39 - "Completed"
3 - "Failed"
58 - "New", without any progress: shown as New in the restore CR describe output; the PVR resources themselves have no "Progress:" field at all
[kni@f07-h27-000-r640 benchmark-runner-assistant]$ oc get podvolumerestore -nopenshift-adp | grep restic | grep Completed | wc -l
39
[kni@f07-h27-000-r640 benchmark-runner-assistant]$ oc get podvolumerestore -nopenshift-adp | grep restic | grep Failed | wc -l
3
[kni@f07-h27-000-r640 benchmark-runner-assistant]$ oc get podvolumerestore -nopenshift-adp | grep restic | grep -v "Failed\|Completed" | wc -l
58
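The same breakdown can be read straight from the PVR status fields; a minimal sketch (assumes jq is installed and that PVRs which never started report no status.phase):

# Count PodVolumeRestores per phase; PVRs that never started usually carry no
# status.phase, so they are bucketed as "New" here.
oc get podvolumerestore -n openshift-adp -o json \
  | jq -r '.items[].status.phase // "New"' \
  | sort | uniq -c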
PVB:
[kni@f07-h27-000-r640 benchmark-runner-assistant]$ oc get Podvolumebackup -A | grep data | grep Completed | wc -l
100
[kni@f07-h27-000-r640 ~]$ velero restore describe restore-restic-datagen-single-ns-100pods-cephrbd
Name:         restore-restic-datagen-single-ns-100pods-cephrbd
Namespace:    openshift-adp
Labels:       <none>
Annotations:  <none>

Phase:  PartiallyFailed (run 'velero restore logs restore-restic-datagen-single-ns-100pods-cephrbd' for more information)

Total items to be restored:  521
Items restored:              521

Started:    2024-04-07 14:33:50 +0000 UTC
Completed:  2024-04-09 14:33:50 +0000 UTC

Warnings:
  Velero:   <none>
  Cluster:  could not restore, CustomResourceDefinition "clusterserviceversions.operators.coreos.com" already exists. Warning: the in-cluster version is different than the backed-up version
  Namespaces:
    datagen-single-ns-100pods-cephrbd:
      could not restore, RoleBinding "system:image-pullers" already exists. Warning: the in-cluster version is different than the backed-up version
      could not restore, ConfigMap "kube-root-ca.crt" already exists. Warning: the in-cluster version is different than the backed-up version
      could not restore, ConfigMap "openshift-service-ca.crt" already exists. Warning: the in-cluster version is different than the backed-up version
      could not restore, ClusterServiceVersion "volsync-product.v0.7.4-0.1698026108.p" already exists. Warning: the in-cluster version is different than the backed-up version
      could not restore, RoleBinding "system:deployers" already exists. Warning: the in-cluster version is different than the backed-up version
      could not restore, RoleBinding "system:image-builders" already exists. Warning: the in-cluster version is different than the backed-up version
      could not restore, RoleBinding "system:image-pullers" already exists. Warning: the in-cluster version is different than the backed-up version

Errors:
  Velero:
    pod volume restore failed: data path restore failed: chdir /host_pods/1a3eda7f-cc21-4aa1-ba7d-12b9f4102584/volumes/kubernetes.io~csi/pvc-fc80f14b-0236-44ac-88bc-4c3b4e14aed3/mount: no such file or directory
    pod volume restore failed: data path restore failed: chdir /host_pods/804a47c6-6eca-44c7-be3a-34a7c4ea749c/volumes/kubernetes.io~csi/pvc-4db942e2-4416-4965-be93-bf5ddb990a78/mount: no such file or directory
    pod volume restore failed: data path restore failed: chdir /host_pods/cb26a3f3-1bcd-421d-a5ff-77a0e48ca5c0/volumes/kubernetes.io~csi/pvc-e8fa3eb5-a89d-4e93-9b9a-9430915f7b87/mount: no such file or directory
    timed out waiting for all PodVolumeRestores to complete
    timed out waiting for all PodVolumeRestores to complete
    ... ("timed out waiting for all PodVolumeRestores to complete" repeats many more times) ...
  Cluster:    <none>
  Namespaces: <none>

Backup:  backup-restic-datagen-single-ns-100pods-cephrbd

Namespaces:
  Included:  all namespaces found in the backup
  Excluded:  <none>

Resources:
  Included:        *
  Excluded:        nodes, events, events.events.k8s.io, backups.velero.io, restores.velero.io, resticrepositories.velero.io, csinodes.storage.k8s.io, volumeattachments.storage.k8s.io, backuprepositories.velero.io
  Cluster-scoped:  auto

Namespace mappings:  <none>

Label selector:  <none>

Or label selector:  <none>

Restore PVs:  auto

restic Restores (specify --details for more information):
  Completed:  39
  Failed:     3
  New:        58

Existing Resource Policy:   <none>
ItemOperationTimeout:       4h0m0s

Preserve Service NodePorts:  auto
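The chdir failures suggest the node-agent is using a pod UID that no longer exists on the node. A small diagnostic sketch (assuming the PVR spec exposes the target pod reference, which may vary by Velero version) to compare the UID recorded in each failed PVR with the UID of the pod currently running under that name:

# For each failed PodVolumeRestore, print the pod UID recorded in the PVR next
# to the UID of the pod that currently exists with that name, to spot stale UIDs.
oc get podvolumerestore -n openshift-adp -o json \
  | jq -r '.items[] | select(.status.phase == "Failed")
      | [.metadata.name, .spec.pod.namespace, .spec.pod.name, .spec.pod.uid] | @tsv' \
  | while IFS=$'\t' read -r pvr ns pod uid; do
      live=$(oc get pod -n "$ns" "$pod" -o jsonpath='{.metadata.uid}' 2>/dev/null)
      echo "$pvr: PVR uid=$uid, live pod uid=${live:-<pod not found>}"
    done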
PV & Pods
[kni@f07-h27-000-r640 benchmark-runner-assistant]$ oc get pv -n datagen-single-ns-100pods-cephrbd | grep gen | wc -l
100
[kni@f07-h27-000-r640 benchmark-runner-assistant]$ oc get pods -ndatagen-single-ns-100pods-cephrbd | grep Running | wc -l
100
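To rule out a genuinely missing volume path, the directory from the chdir error can also be checked directly on the node; a sketch assuming the node-agent's /host_pods mount corresponds to /var/lib/kubelet/pods on the host (the node name is a placeholder; the pod UID and PV name below are taken from the first chdir error above):

# Check on the node whether the pod volume path from the chdir error exists.
NODE=worker-0                                   # node running the target pod (placeholder)
POD_UID=1a3eda7f-cc21-4aa1-ba7d-12b9f4102584    # pod UID from the chdir error
PV=pvc-fc80f14b-0236-44ac-88bc-4c3b4e14aed3     # PV name from the chdir error
oc debug "node/${NODE}" -- chroot /host \
  ls -ld "/var/lib/kubelet/pods/${POD_UID}/volumes/kubernetes.io~csi/${PV}/mount"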
From the velero log:
I0404 15:56:15.980518 1 request.go:690] Waited for 1.045157493s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/apis/flowcontrol.apiserver.k8s.io/v1beta3?timeout=32s
E0405 07:40:22.212924 1 reflector.go:140] pkg/mod/k8s.io/client-go@v0.25.6/tools/cache/reflector.go:169: Failed to watch *unstructured.Unstructured: the server does not allow this method on the requested resource
E0405 07:41:15.985431 1 reflector.go:140] pkg/mod/k8s.io/client-go@v0.25.6/tools/cache/reflector.go:169: Failed to watch *unstructured.Unstructured: the server does not allow this method on the requested resource
E0405 07:42:07.039601 1 reflector.go:140] pkg/mod/k8s.io/client-go@v0.25.6/tools/cache/reflector.go:169: Failed to watch *unstructured.Unstructured: the server does not allow this method on the requested resource
E0405 07:42:49.140141 1 reflector.go:140] pkg/mod/k8s.io/client-go@v0.25.6/tools/cache/reflector.go:169: Failed to watch *unstructured.Unstructured: the server does not allow this method on the requested resource
From the node-agent log (lines truncated by the console viewer):
time="2024-04-07T14:36:11Z" level=info msg="Restore starting" PodVolumeRestore=openshift-adp/restore-restic-datagen-single-ns-100pods-cephrbd-ldw7j control │ │ time="2024-04-07T14:36:11Z" level=info msg="Got volume dir" PodVolumeRestore=openshift-adp/restore-restic-datagen-single-ns-100pods-cephrbd-ldw7j controlle │ │ time="2024-04-07T14:36:11Z" level=info msg="Found path matching glob" PodVolumeRestore=openshift-adp/restore-restic-datagen-single-ns-100pods-cephrbd-ldw7j │ │ time="2024-04-07T14:36:11Z" level=info msg="Founding existing repo" backupLocation=bucket logSource="/remote-source/velero/app/pkg/repository/ensurer.go:86 │ │ time="2024-04-07T14:36:12Z" level=info msg="FileSystemBR is initialized" PodVolumeRestore=openshift-adp/restore-restic-datagen-single-ns-100pods-cephrbd-ld │ │ time="2024-04-07T14:36:12Z" level=info msg="Async fs restore data path started" PodVolumeRestore=openshift-adp/restore-restic-datagen-single-ns-100pods-cep │ │ time="2024-04-07T14:36:13Z" level=info msg="Error cannot be convert to ExitError format." PodVolumeRestore=openshift-adp/restore-restic-datagen-single-ns-1 │ │ time="2024-04-07T14:36:13Z" level=info msg="Run command=restore, stdout=, stderr=" PodVolumeRestore=openshift-adp/restore-restic-datagen-single-ns-100pods- │ │ time="2024-04-07T14:36:13Z" level=error msg="Async fs restore data path failed" controller=PodVolumeRestore error="chdir /host_pods/cb26a3f3-1bcd-421d-a5ff │ │ time="2024-04-07T14:36:13Z" level=info msg="FileSystemBR is closed" PodVolumeRestore=openshift-adp/restore-restic-datagen-single-ns-100pods-cephrbd-ldw7j c │
time="2024-04-07T14:36:09Z" level=info msg="Restore starting" PodVolumeRestore=openshift-adp/restore-restic-datagen-single-ns-100pods-cephrbd-gzf79 control │ │ time="2024-04-07T14:36:09Z" level=info msg="Got volume dir" PodVolumeRestore=openshift-adp/restore-restic-datagen-single-ns-100pods-cephrbd-gzf79 controlle │ │ time="2024-04-07T14:36:09Z" level=info msg="Found path matching glob" PodVolumeRestore=openshift-adp/restore-restic-datagen-single-ns-100pods-cephrbd-gzf79 │ │ time="2024-04-07T14:36:09Z" level=info msg="Founding existing repo" backupLocation=bucket logSource="/remote-source/velero/app/pkg/repository/ensurer.go:86 │ │ time="2024-04-07T14:36:10Z" level=info msg="FileSystemBR is initialized" PodVolumeRestore=openshift-adp/restore-restic-datagen-single-ns-100pods-cephrbd-gz │ │ time="2024-04-07T14:36:10Z" level=info msg="Async fs restore data path started" PodVolumeRestore=openshift-adp/restore-restic-datagen-single-ns-100pods-cep │ │ time="2024-04-07T14:36:10Z" level=info msg="Restore starting" PodVolumeRestore=openshift-adp/restore-restic-datagen-single-ns-100pods-cephrbd-pw6kj control │ │ time="2024-04-07T14:36:10Z" level=info msg="Restore starting" PodVolumeRestore=openshift-adp/restore-restic-datagen-single-ns-100pods-cephrbd-52c5q control │ │ time="2024-04-07T14:36:10Z" level=info msg="Restore starting" PodVolumeRestore=openshift-adp/restore-restic-datagen-single-ns-100pods-cephrbd-w8h7f control │ │ time="2024-04-07T14:36:10Z" level=info msg="Restore starting" PodVolumeRestore=openshift-adp/restore-restic-datagen-single-ns-100pods-cephrbd-zhtth control │ │ time="2024-04-07T14:36:10Z" level=info msg="Restore starting" PodVolumeRestore=openshift-adp/restore-restic-datagen-single-ns-100pods-cephrbd-jlc8x control │ │ time="2024-04-07T14:36:10Z" level=info msg="Restore starting" PodVolumeRestore=openshift-adp/restore-restic-datagen-single-ns-100pods-cephrbd-mg6w2 control │ │ time="2024-04-07T14:36:11Z" level=info msg="Error cannot be convert to ExitError format." PodVolumeRestore=openshift-adp/restore-restic-datagen-single-ns-1 │ │ time="2024-04-07T14:36:11Z" level=info msg="Run command=restore, stdout=, stderr=" PodVolumeRestore=openshift-adp/restore-restic-datagen-single-ns-100pods- │ │ time="2024-04-07T14:36:11Z" level=error msg="Async fs restore data path failed" controller=PodVolumeRestore error="chdir /host_pods/804a47c6-6eca-44c7-be3a │ │ time="2024-04-07T14:36:11Z" level=info msg="FileSystemBR is closed" PodVolumeRestore=openshift-adp/restore-restic-datagen-single-ns-100pods-cephrbd-gzf79 c │
This issue was reproduced on 3 different clusters by the perf and scale team, and also on the latest OADP build, 1.3.1-57.
All the logs from the above cycle can be found here:
https://drive.google.com/drive/folders/1WKXxlNuugmg_Dc-9k89waM-uIuK0dwLr?usp=sharing
Version-Release number of selected component (if applicable):
OADP 1.3.1-54
ODF 4.14.6
OCP 4.14.13
How reproducible:
Reproduced on 3 different clusters in the perf and scale team, including on the latest OADP build 1.3.1-57.
Steps to Reproduce:
1. Clone the repo into your /tmp folder on a machine with access to the OpenShift cluster:
   git clone git@gitlab.cee.redhat.com:mlehrer/mpqe-scale-scripts.git
2. Execute this ansible-playbook command 10 times, once per -0 .. -9 suffix (a loop equivalent is sketched after these steps):
ansible-playbook /tmp/mpqe-scale-scripts/mtc-helpers/data-generator/playbooks/playbook_case3.yml --extra-vars 'dir_count=30 files_count=230 files_size=307200 dept_count=1 pvc_size=6Gi deployment_name=deploy-perf-datagen-0-0-7-6gi-10-rbd-0 dataset_path=/opt/mounts/mnt1/ namespace=perf-datagen-case-0-0-7-cephrbd pvc_name=pvc-perf-datagen-0-0-7-6gi-10-rbd-0 sc=ocs-storagecluster-ceph-rbd' -vvvv
ansible-playbook /tmp/mpqe-scale-scripts/mtc-helpers/data-generator/playbooks/playbook_case3.yml --extra-vars 'dir_count=30 files_count=230 files_size=307200 dept_count=1 pvc_size=6Gi deployment_name=deploy-perf-datagen-0-0-7-6gi-10-rbd-1 dataset_path=/opt/mounts/mnt1/ namespace=perf-datagen-case-0-0-7-cephrbd pvc_name=pvc-perf-datagen-0-0-7-6gi-10-rbd-1 sc=ocs-storagecluster-ceph-rbd' -vvvv
ansible-playbook /tmp/mpqe-scale-scripts/mtc-helpers/data-generator/playbooks/playbook_case3.yml --extra-vars 'dir_count=30 files_count=230 files_size=307200 dept_count=1 pvc_size=6Gi deployment_name=deploy-perf-datagen-0-0-7-6gi-10-rbd-2 dataset_path=/opt/mounts/mnt1/ namespace=perf-datagen-case-0-0-7-cephrbd pvc_name=pvc-perf-datagen-0-0-7-6gi-10-rbd-2 sc=ocs-storagecluster-ceph-rbd' -vvvv
ansible-playbook /tmp/mpqe-scale-scripts/mtc-helpers/data-generator/playbooks/playbook_case3.yml --extra-vars 'dir_count=30 files_count=230 files_size=307200 dept_count=1 pvc_size=6Gi deployment_name=deploy-perf-datagen-0-0-7-6gi-10-rbd-3 dataset_path=/opt/mounts/mnt1/ namespace=perf-datagen-case-0-0-7-cephrbd pvc_name=pvc-perf-datagen-0-0-7-6gi-10-rbd-3 sc=ocs-storagecluster-ceph-rbd' -vvvv
ansible-playbook /tmp/mpqe-scale-scripts/mtc-helpers/data-generator/playbooks/playbook_case3.yml --extra-vars 'dir_count=30 files_count=230 files_size=307200 dept_count=1 pvc_size=6Gi deployment_name=deploy-perf-datagen-0-0-7-6gi-10-rbd-4 dataset_path=/opt/mounts/mnt1/ namespace=perf-datagen-case-0-0-7-cephrbd pvc_name=pvc-perf-datagen-0-0-7-6gi-10-rbd-4 sc=ocs-storagecluster-ceph-rbd' -vvvv
ansible-playbook /tmp/mpqe-scale-scripts/mtc-helpers/data-generator/playbooks/playbook_case3.yml --extra-vars 'dir_count=30 files_count=230 files_size=307200 dept_count=1 pvc_size=6Gi deployment_name=deploy-perf-datagen-0-0-7-6gi-10-rbd-5 dataset_path=/opt/mounts/mnt1/ namespace=perf-datagen-case-0-0-7-cephrbd pvc_name=pvc-perf-datagen-0-0-7-6gi-10-rbd-5 sc=ocs-storagecluster-ceph-rbd' -vvvv
ansible-playbook /tmp/mpqe-scale-scripts/mtc-helpers/data-generator/playbooks/playbook_case3.yml --extra-vars 'dir_count=30 files_count=230 files_size=307200 dept_count=1 pvc_size=6Gi deployment_name=deploy-perf-datagen-0-0-7-6gi-10-rbd-6 dataset_path=/opt/mounts/mnt1/ namespace=perf-datagen-case-0-0-7-cephrbd pvc_name=pvc-perf-datagen-0-0-7-6gi-10-rbd-6 sc=ocs-storagecluster-ceph-rbd' -vvvv
ansible-playbook /tmp/mpqe-scale-scripts/mtc-helpers/data-generator/playbooks/playbook_case3.yml --extra-vars 'dir_count=30 files_count=230 files_size=307200 dept_count=1 pvc_size=6Gi deployment_name=deploy-perf-datagen-0-0-7-6gi-10-rbd-7 dataset_path=/opt/mounts/mnt1/ namespace=perf-datagen-case-0-0-7-cephrbd pvc_name=pvc-perf-datagen-0-0-7-6gi-10-rbd-7 sc=ocs-storagecluster-ceph-rbd' -vvvv
ansible-playbook /tmp/mpqe-scale-scripts/mtc-helpers/data-generator/playbooks/playbook_case3.yml --extra-vars 'dir_count=30 files_count=230 files_size=307200 dept_count=1 pvc_size=6Gi deployment_name=deploy-perf-datagen-0-0-7-6gi-10-rbd-8 dataset_path=/opt/mounts/mnt1/ namespace=perf-datagen-case-0-0-7-cephrbd pvc_name=pvc-perf-datagen-0-0-7-6gi-10-rbd-8 sc=ocs-storagecluster-ceph-rbd' -vvvv
ansible-playbook /tmp/mpqe-scale-scripts/mtc-helpers/data-generator/playbooks/playbook_case3.yml --extra-vars 'dir_count=30 files_count=230 files_size=307200 dept_count=1 pvc_size=6Gi deployment_name=deploy-perf-datagen-0-0-7-6gi-10-rbd-9 dataset_path=/opt/mounts/mnt1/ namespace=perf-datagen-case-0-0-7-cephrbd pvc_name=pvc-perf-datagen-0-0-7-6gi-10-rbd-9 sc=ocs-storagecluster-ceph-rbd' -vvvv
The above commands will deploy 10 pods with 2 GB of utilized data in namespace perf-datagen-case3-rbd.
3. Perform a backup of perf-datagen-case3-rbd; this will complete successfully.
4. Perform a restore of perf-datagen-case3-rbd; it will show 1 or 2 successful PodVolumeRestores, then fail 1 PVR and wait until the Velero timeout expires.
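For reference, the ten playbook invocations from step 2 can be generated with a loop; a minimal sketch using the same parameters as above:

# Run the data-generator playbook ten times, varying only the -0 .. -9 suffix
# on deployment_name and pvc_name.
for i in $(seq 0 9); do
  vars="dir_count=30 files_count=230 files_size=307200 dept_count=1 pvc_size=6Gi"
  vars+=" deployment_name=deploy-perf-datagen-0-0-7-6gi-10-rbd-${i}"
  vars+=" dataset_path=/opt/mounts/mnt1/ namespace=perf-datagen-case-0-0-7-cephrbd"
  vars+=" pvc_name=pvc-perf-datagen-0-0-7-6gi-10-rbd-${i} sc=ocs-storagecluster-ceph-rbd"
  ansible-playbook /tmp/mpqe-scale-scripts/mtc-helpers/data-generator/playbooks/playbook_case3.yml \
    --extra-vars "$vars" -vvvv
done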
Actual results:
1 or 2 PVRs succeed and 1 fails; the other 7 or 8 PVRs remain without any status until the Velero timeout expires, and the restore is unsuccessful.
Expected results:
A successful restore. Restore succeeds with OADP 1.3.1-27 (Velero 1.12.3) but fails with 1.3.1-54, -57, and -59.
Additional info:
- mentioned on