OpenShift API for Data Protection / OADP-3105

Only the 1st PVR is updated with status when an app deployed over NFS is restored with filesystem backup (Kopia or Restic)


      Description of problem:

      When restoring an application with more than one NFS PVC,

      only the 1st PVR is updated with status (Completed).
      The next PVR remains without any status, which eventually causes the restore to remain InProgress until it fails.
      It is observed that both PVs and the PV data were restored, even though the second PVR has no status (see the data validation in step 8).
      Please go through the reproduction steps carefully.

       

      Version-Release number of selected component (if applicable):

      How reproducible:
      100% 

      Steps to Reproduce:

      1. Deploy external NFS with the following provisioner:

      kind: StorageClass
      apiVersion: storage.k8s.io/v1
      metadata:
        name: managed-nfs-storage
        uid: f3a53889-050c-4179-962d-bee510b01b74
        resourceVersion: '226271'
        creationTimestamp: '2023-11-21T12:30:09Z'
        annotations:
          storageclass.kubernetes.io/is-default-class: 'true'
      provisioner: k8s-sigs.io/nfs-subdir-external-provisioner
      parameters:
        onDelete: delete
        pathPattern: '${.PVC.namespace}/${.PVC.annotations.nfs.io/storage-path}'
      reclaimPolicy: Delete
      volumeBindingMode: Immediate 
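
      A minimal sketch for creating and verifying the StorageClass, assuming the nfs-subdir-external-provisioner is already deployed in the cluster (the file name managed-nfs-storage-sc.yaml is illustrative):

      $ oc apply -f managed-nfs-storage-sc.yaml
      $ oc get storageclass managed-nfs-storage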

      2. Set the StorageClass as the default and deploy this busybox application:

      $ oc new-project busybox-2pvc-on-nfs
      apiVersion: v1
      kind: PersistentVolumeClaim
      metadata:
        name: busybox-pvc1
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 1Gi
      ---
      apiVersion: v1
      kind: PersistentVolumeClaim
      metadata:
        name: busybox-pvc2
      spec:
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            storage: 1Gi
      ---
      apiVersion: v1
      kind: Pod
      metadata:
        name: busybox
      spec:
        containers:
        - name: busybox
          image: busybox
          command: ["/bin/sh", "-ec", "while [ true ]; do sleep 3600; done"]
          volumeMounts:
          - name: busybox-volume1
            mountPath: /mnt/volume1
          - name: busybox-volume2
            mountPath: /mnt/volume2
        volumes:
        - name: busybox-volume1
          persistentVolumeClaim:
            claimName: busybox-pvc1
        - name: busybox-volume2
          persistentVolumeClaim:
            claimName: busybox-pvc2 
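
      A minimal sketch for applying and verifying the application above, assuming the manifests are saved as busybox-2pvc.yaml (the file name is illustrative):

      $ oc apply -f busybox-2pvc.yaml -n busybox-2pvc-on-nfs
      $ oc wait --for=condition=Ready pod/busybox -n busybox-2pvc-on-nfs --timeout=120s
      $ oc get pvc -n busybox-2pvc-on-nfs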

      3. Touch a file on both volumes

      $ oc exec busybox -- touch /mnt/volume1/test-a
      Defaulted container "busybox" out of: busybox, restore-wait (init)
      $ oc exec busybox -- touch /mnt/volume2/test-b
      Defaulted container "busybox" out of: busybox, restore-wait (init)
      

      4. Run a filesystem backup with Kopia as the uploader type
      DPA:

      apiVersion: oadp.openshift.io/v1alpha1
      kind: DataProtectionApplication
      metadata:
        creationTimestamp: '2023-11-21T10:33:22Z'
        generation: 1
       ...
        name: ts-dpa
        namespace: openshift-adp
        resourceVersion: '96934'
        uid: 7e2991fb-9f4c-4321-8a51-3966cfe3cd2a
      spec:
        backupLocations:
          - velero:
              config:
                resourceGroup: oadp-62200-2v2qg-rg
                storageAccount: oadpbucket247692
                storageAccountKeyEnvVar: AZURE_STORAGE_ACCOUNT_ACCESS_KEY
                subscriptionId: 53b8f551-f0fc-4bea-8cba-6d1fefd54c8a
              credential:
                key: cloud
                name: cloud-credentials
              default: true
              objectStorage:
                bucket: oadpbucket247692
                prefix: velero-e2e-64c4680c-8859-11ee-89b6-f6e178f429ce
              provider: azure
        configuration:
          nodeAgent:
            enable: true
            podConfig:
              resourceAllocations:
                requests:
                  cpu: 100m
                  memory: 64Mi
            uploaderType: kopia
          velero:
            defaultPlugins:
              - openshift
              - azure
              - kubevirt
              - csi
            podConfig:
              resourceAllocations:
                requests:
                  cpu: 100m
                  memory: 64Mi
        podDnsConfig: {}
        snapshotLocations: []
      status:
        conditions:
          - lastTransitionTime: '2023-11-21T10:33:22Z'
            message: Reconcile complete
            reason: Complete
            status: 'True'
            type: Reconciled
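
      To confirm the DPA reconciled and the node-agent is running before taking the backup, a sketch (the node-agent DaemonSet name is assumed for this OADP release):

      $ oc -n openshift-adp get dataprotectionapplication ts-dpa
      $ oc -n openshift-adp get backupstoragelocations
      $ oc -n openshift-adp get daemonset node-agent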
       

      Backup:

      apiVersion: velero.io/v1
      kind: Backup
      metadata:
        ...
        name: 2pvconnfstest
        uid: 823135fd-932a-4bbb-88ca-0cec546c4042
        creationTimestamp: '2023-11-21T14:28:46Z'
        generation: 6
        ...
        namespace: openshift-adp
        labels:
          velero.io/storage-location: ts-dpa-1
      spec:
        csiSnapshotTimeout: 10m0s
        defaultVolumesToFsBackup: true
        includedNamespaces:
          -  busybox-2pvc-on-nfs
        itemOperationTimeout: 4h0m0s
        snapshotMoveData: false
        storageLocation: ts-dpa-1
        ttl: 720h0m0s
      status:
        completionTimestamp: '2023-11-21T14:29:58Z'
        expiration: '2023-12-21T14:28:46Z'
        formatVersion: 1.1.0
        phase: Completed
        progress:
          itemsBackedUp: 41
          totalItems: 41
        startTimestamp: '2023-11-21T14:28:46Z'
        version: 1
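
      A sketch of creating an equivalent filesystem backup with the velero CLI instead of applying the Backup CR, assuming the velero client is installed and these flags are supported by the deployed Velero version:

      $ velero backup create 2pvconnfstest \
          --include-namespaces busybox-2pvc-on-nfs \
          --default-volumes-to-fs-backup \
          -n openshift-adp
      $ velero backup get 2pvconnfstest -n openshift-adp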
       

      5. Remove the application

      $ oc delete ns/busybox-2pvc-on-nfs 
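
      Optionally wait for the namespace to terminate fully before restoring; a sketch:

      $ oc wait --for=delete ns/busybox-2pvc-on-nfs --timeout=300s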

       

      6. Restore

      apiVersion: velero.io/v1
      kind: Restore
      metadata:
        creationTimestamp: '2023-11-21T14:32:42Z'
        finalizers:
          - restores.velero.io/external-resources-finalizer
        generation: 3
        managedFields:
          - apiVersion: velero.io/v1
            fieldsType: FieldsV1
            fieldsV1:
              'f:spec':
                .: {}
                'f:backupName': {}
            manager: Mozilla
            operation: Update
            time: '2023-11-21T14:32:42Z'
          - apiVersion: velero.io/v1
            fieldsType: FieldsV1
            fieldsV1:
              'f:metadata':
                'f:finalizers':
                  .: {}
                  'v:"restores.velero.io/external-resources-finalizer"': {}
              'f:spec':
                'f:excludedResources': {}
                'f:itemOperationTimeout': {}
              'f:status':
                .: {}
                'f:phase': {}
                'f:progress':
                  .: {}
                  'f:itemsRestored': {}
                  'f:totalItems': {}
                'f:startTimestamp': {}
            manager: velero-server
            operation: Update
            time: '2023-11-21T14:33:50Z'
        name: 2pvconnfstest
        namespace: openshift-adp
        resourceVersion: '230628'
        uid: 6c21ed76-7db5-4198-a33b-82d07b479eb8
      spec:
        backupName: 2pvconnfstest
        excludedResources:
          - nodes
          - events
          - events.events.k8s.io
          - backups.velero.io
          - restores.velero.io
          - resticrepositories.velero.io
          - csinodes.storage.k8s.io
          - volumeattachments.storage.k8s.io
          - backuprepositories.velero.io
        itemOperationTimeout: 4h0m0s
      status:
        phase: InProgress
        progress:
          itemsRestored: 24
          totalItems: 24
        startTimestamp: '2023-11-21T14:33:48Z' 

      * As can be seen here, the restore is still InProgress.
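
      A sketch for listing the PodVolumeRestores created for this restore and checking the restore phase (the velero.io/restore-name label value is taken from the PVRs shown below):

      $ oc -n openshift-adp get podvolumerestores -l velero.io/restore-name=2pvconnfstest
      $ oc -n openshift-adp get restore 2pvconnfstest -o jsonpath='{.status.phase}'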

      7. Allow the 1st PVR to complete

      apiVersion: velero.io/v1
      kind: PodVolumeRestore
      metadata:
        generateName: 2pvconnfstest-
        resourceVersion: '230699'
        name: 2pvconnfstest-8d9sz
        uid: 6507ef99-5f2f-4f9f-ba2f-09a64d4f1591
        creationTimestamp: '2023-11-21T14:33:50Z'
        generation: 3
        managedFields:
          - apiVersion: velero.io/v1
            fieldsType: FieldsV1
            fieldsV1:
              'f:metadata':
                'f:generateName': {}
                'f:labels':
                  .: {}
                  'f:velero.io/pod-uid': {}
                  'f:velero.io/pvc-uid': {}
                  'f:velero.io/restore-name': {}
                  'f:velero.io/restore-uid': {}
                'f:ownerReferences':
                  .: {}
                  'k:{"uid":"6c21ed76-7db5-4198-a33b-82d07b479eb8"}': {}
              'f:spec':
                .: {}
                'f:backupStorageLocation': {}
                'f:pod': {}
                'f:repoIdentifier': {}
                'f:snapshotID': {}
                'f:sourceNamespace': {}
                'f:uploaderType': {}
                'f:volume': {}
              'f:status':
                .: {}
                'f:progress': {}
            manager: velero-server
            operation: Update
            time: '2023-11-21T14:33:50Z'
          - apiVersion: velero.io/v1
            fieldsType: FieldsV1
            fieldsV1:
              'f:status':
                'f:completionTimestamp': {}
                'f:phase': {}
                'f:startTimestamp': {}
            manager: node-agent-server
            operation: Update
            time: '2023-11-21T14:33:58Z'
        namespace: openshift-adp
        ownerReferences:
          - apiVersion: velero.io/v1
            controller: true
            kind: Restore
            name: 2pvconnfstest
            uid: 6c21ed76-7db5-4198-a33b-82d07b479eb8
        labels:
          velero.io/pod-uid: e4580b3d-26bf-4593-988e-01e7f952af3a
          velero.io/pvc-uid: d384defc-dc5b-4724-b9b4-b9a1c2f6f28f
          velero.io/restore-name: 2pvconnfstest
          velero.io/restore-uid: 6c21ed76-7db5-4198-a33b-82d07b479eb8
      spec:
        backupStorageLocation: ts-dpa-1
        pod:
          kind: Pod
          name: busybox
          namespace: 2pvconnfstest
          uid: e4580b3d-26bf-4593-988e-01e7f952af3a
        repoIdentifier: >-
          azure:oadpbucket247692:/velero-e2e-64c4680c-8859-11ee-89b6-f6e178f429ce/restic/2pvconnfstest
        snapshotID: 16d062ae6323fcecf89230f49ea12ee0
        sourceNamespace: 2pvconnfstest
        uploaderType: kopia
        volume: busybox-volume2
      status:
        completionTimestamp: '2023-11-21T14:33:58Z'
        phase: Completed
        progress: {}
        startTimestamp: '2023-11-21T14:33:56Z' 

      8. Validate the data on the PVs. Note that the data **was restored** in both PVs

      $ oc exec busybox -- stat /mnt/volume1/test-a
      Defaulted container "busybox" out of: busybox, restore-wait (init)
        File: /mnt/volume1/test-a
        Size: 0               Blocks: 0          IO Block: 1048576 regular empty file
      Device: 1000d0h/1048784d        Inode: 859508      Links: 1
      Access: (0644/-rw-r--r--)  Uid: (1000780000/1000780000)   Gid: (    0/    root)
      Access: 2023-11-21 14:51:21.436942041 +0000
      Modify: 2023-11-21 14:51:21.436942041 +0000
      Change: 2023-11-21 14:51:21.436942041 +0000 
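
      The same check can be run against the second volume; output was not captured in this report, but the file was present, as the data was restored in both PVs:

      $ oc exec busybox -- stat /mnt/volume2/test-b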

      Actual results:

      The 2nd PVR remains without status:

      apiVersion: velero.io/v1
      kind: PodVolumeRestore
      metadata:
        generateName: 2pvconnfstest-
        resourceVersion: '230636'
        name: 2pvconnfstest-t9sbw
        uid: e3bc882e-2f35-4830-81de-538895814925
        creationTimestamp: '2023-11-21T14:33:50Z'
        generation: 1
       ...
        ownerReferences:
          - apiVersion: velero.io/v1
            controller: true
            kind: Restore
            name: 2pvconnfstest
            uid: 6c21ed76-7db5-4198-a33b-82d07b479eb8
        labels:
          velero.io/pod-uid: e4580b3d-26bf-4593-988e-01e7f952af3a
          velero.io/pvc-uid: e00bbcfd-845c-4aa5-8233-3e7e015c1fa6
          velero.io/restore-name: 2pvconnfstest
          velero.io/restore-uid: 6c21ed76-7db5-4198-a33b-82d07b479eb8
      spec:
        backupStorageLocation: ts-dpa-1
        pod:
          kind: Pod
          name: busybox
          namespace: 2pvconnfstest
          uid: e4580b3d-26bf-4593-988e-01e7f952af3a
        repoIdentifier: >-
          azure:oadpbucket247692:/velero-e2e-64c4680c-8859-11ee-89b6-f6e178f429ce/restic/2pvconnfstest
        snapshotID: 1238f75cb4200312b7e545d716ba56b9
        sourceNamespace: 2pvconnfstest
        uploaderType: kopia
        volume: busybox-volume1
      status:
        progress: {} 
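
      A sketch for inspecting the stuck PVR and searching the node-agent logs for it, assuming the node-agent pods carry the name=node-agent label in this OADP build:

      $ oc -n openshift-adp get podvolumerestore 2pvconnfstest-t9sbw -o jsonpath='{.status}'
      $ for p in $(oc -n openshift-adp get pods -l name=node-agent -o name); do
          oc -n openshift-adp logs "$p" | grep 2pvconnfstest-t9sbw
        done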

       

      Expected results:

      All PVRs should be updated with the proper status, and the restore should complete successfully.

      Additional info:

      Node agent logs: the only indication is for the 1st PVR; nothing appears after that.

      adp/nfs-sql volume=mysql-data1
      time="2023-11-21T14:13:06Z" level=info msg="Founding existing repo" backupLocation=ts-dpa-1 logSource="/remote-source/velero/app/pkg/repository/ensurer.go:86" repositoryType=kopia volumeNamespace=nfs-sql
      time="2023-11-21T14:13:07Z" level=warning msg="active indexes [xn0_08cfddd7ec4c011b0f75697d7aba07a0-sa3db3ee5395a61df122-c1 xn0_363df94abc57574078e268d17f842400-s7f0fe5a47db017dd122-c1 xn0_72f43384ee4cbb6a04eb5a7ede99b3e2-s4abb5bdc13e6e275122-c1 xn0_9585bd9d2b89f6b4b7fe1a83e757fe5a-s65e1cd915b3daf0d122-c1 xn0_e83e9b8845574cfde74f0c2cbbc491fe-s69cac49b849a8d33122-c1] deletion watermark 0001-01-01 00:00:00 +0000 UTC" PodVolumeRestore=openshift-adp/nfs-sql-d2x4g controller=PodVolumeRestore logModule=kopia/kopia/format logSource="/remote-source/velero/app/pkg/kopia/kopia_log.go:101" pod=nfs-sql/mysql-68d84d7c89-gmh88 restore=openshift-adp/nfs-sql sublevel=error
      time="2023-11-21T14:13:07Z" level=info msg="Opening backup repo" PodVolumeRestore=openshift-adp/nfs-sql-d2x4g controller=PodVolumeRestore logSource="/remote-source/velero/app/pkg/uploader/provider/kopia.go:78" pod=nfs-sql/mysql-68d84d7c89-gmh88 repoUID=9547a4aa-8482-4bf5-ad2a-26b3c98c4ce5 restore=openshift-adp/nfs-sql
      time="2023-11-21T14:13:07Z" level=warning msg="active indexes [xn0_08cfddd7ec4c011b0f75697d7aba07a0-sa3db3ee5395a61df122-c1 xn0_363df94abc57574078e268d17f842400-s7f0fe5a47db017dd122-c1 xn0_72f43384ee4cbb6a04eb5a7ede99b3e2-s4abb5bdc13e6e275122-c1 xn0_9585bd9d2b89f6b4b7fe1a83e757fe5a-s65e1cd915b3daf0d122-c1 xn0_e83e9b8845574cfde74f0c2cbbc491fe-s69cac49b849a8d33122-c1] deletion watermark 0001-01-01 00:00:00 +0000 UTC" PodVolumeRestore=openshift-adp/nfs-sql-d2x4g controller=PodVolumeRestore logModule=kopia/kopia/format logSource="/remote-source/velero/app/pkg/kopia/kopia_log.go:101" pod=nfs-sql/mysql-68d84d7c89-gmh88 restore=openshift-adp/nfs-sql sublevel=error
      time="2023-11-21T14:13:07Z" level=info msg="FileSystemBR is initialized" PodVolumeRestore=openshift-adp/nfs-sql-d2x4g bsl=ts-dpa-1 controller=PodVolumeRestore jobName=nfs-sql-d2x4g logSource="/remote-source/velero/app/pkg/datapath/file_system.go:110" pod=nfs-sql/mysql-68d84d7c89-gmh88 repository=kopia restore=openshift-adp/nfs-sql source namespace=nfs-sql uploader=kopia
      time="2023-11-21T14:13:07Z" level=info msg="Async fs restore data path started" PodVolumeRestore=openshift-adp/nfs-sql-d2x4g controller=PodVolumeRestore logSource="/remote-source/velero/app/pkg/controller/pod_volume_restore_controller.go:154" path="/host_pods/f9e882ba-1064-457d-b71a-466637d31561/volumes/kubernetes.io~nfs/pvc-66fefb47-811d-49cf-a9f5-11d205c99c08" pod=nfs-sql/mysql-68d84d7c89-gmh88 restore=openshift-adp/nfs-sql
      time="2023-11-21T14:13:07Z" level=info msg="Starting restore" PodVolumeRestore=openshift-adp/nfs-sql-d2x4g controller=PodVolumeRestore logSource="/remote-source/velero/app/pkg/uploader/provider/kopia.go:219" pod=nfs-sql/mysql-68d84d7c89-gmh88 restore=openshift-adp/nfs-sql snapshotID=0aeb9080fbe7e3b6b5de76f795179d1b volumePath="/host_pods/f9e882ba-1064-457d-b71a-466637d31561/volumes/kubernetes.io~nfs/pvc-66fefb47-811d-49cf-a9f5-11d205c99c08"
      time="2023-11-21T14:13:07Z" level=info msg="Restore starting" PodVolumeRestore=openshift-adp/nfs-sql-4pljn controller=PodVolumeRestore logSource="/remote-source/velero/app/pkg/controller/pod_volume_restore_controller.go:113" pod=nfs-sql/mysql-68d84d7c89-gmh88 restore=openshift-adp/nfs-sql
      time="2023-11-21T14:13:07Z" level=info msg="Start to restore..." PodVolumeRestore=openshift-adp/nfs-sql-d2x4g controller=PodVolumeRestore logSource="/remote-source/velero/app/pkg/uploader/kopia/snapshot.go:359" pod=nfs-sql/mysql-68d84d7c89-gmh88 restore=openshift-adp/nfs-sql snapshotID=0aeb9080fbe7e3b6b5de76f795179d1b volumePath="/host_pods/f9e882ba-1064-457d-b71a-466637d31561/volumes/kubernetes.io~nfs/pvc-66fefb47-811d-49cf-a9f5-11d205c99c08"
      time="2023-11-21T14:13:07Z" level=info msg="Restore from snapshot 0aeb9080fbe7e3b6b5de76f795179d1b, description Kopia Uploader, created time 2023-11-21 14:07:16.903349898 +0000 UTC, tags map[backup:nfs-sql backup-uid:79a0ec27-9761-4ebd-9eca-893cf6fe9fa0 ns:nfs-sql pod:mysql-68d84d7c89-gmh88 pod-uid:980365d1-3ed9-4e78-8993-9c47b9dd81d2 pvc-uid:a840f233-2662-4384-abe3-0bd614cef0c1 snapshot-requester:pod-volume-backup-restore snapshot-uploader:kopia volume:mysql-data1]" PodVolumeRestore=openshift-adp/nfs-sql-d2x4g controller=PodVolumeRestore logSource="/remote-source/velero/app/pkg/uploader/kopia/snapshot.go:368" pod=nfs-sql/mysql-68d84d7c89-gmh88 restore=openshift-adp/nfs-sql snapshotID=0aeb9080fbe7e3b6b5de76f795179d1b volumePath="/host_pods/f9e882ba-1064-457d-b71a-466637d31561/volumes/kubernetes.io~nfs/pvc-66fefb47-811d-49cf-a9f5-11d205c99c08"
      time="2023-11-21T14:13:23Z" level=info msg="Kopia restore finished, restore size 107855068, file count 107" PodVolumeRestore=openshift-adp/nfs-sql-d2x4g controller=PodVolumeRestore logSource="/remote-source/velero/app/pkg/uploader/provider/kopia.go:248" pod=nfs-sql/mysql-68d84d7c89-gmh88 restore=openshift-adp/nfs-sql snapshotID=0aeb9080fbe7e3b6b5de76f795179d1b volumePath="/host_pods/f9e882ba-1064-457d-b71a-466637d31561/volumes/kubernetes.io~nfs/pvc-66fefb47-811d-49cf-a9f5-11d205c99c08"
      time="2023-11-21T14:13:23Z" level=info msg="Async fs restore data path completed" PVR=nfs-sql-d2x4g controller=PodVolumeRestore logSource="/remote-source/velero/app/pkg/controller/pod_volume_restore_controller.go:266" pvr=nfs-sql-d2x4g
      time="2023-11-21T14:13:23Z" level=info msg="Action finished" PodVolumeRestore=openshift-adp/nfs-sql-d2x4g controller=PodVolumeRestore logSource="/remote-source/velero/app/pkg/uploader/provider/kopia.go:91" pod=nfs-sql/mysql-68d84d7c89-gmh88 restore=openshift-adp/nfs-sql
      time="2023-11-21T14:13:23Z" level=info msg="Restore completed" controller=PodVolumeRestore logSource="/remote-source/velero/app/pkg/controller/pod_volume_restore_controller.go:318" pvr=nfs-sql-d2x4g
      time="2023-11-21T14:13:23Z" level=info msg="FileSystemBR is closed" PodVolumeRestore=openshift-adp/nfs-sql-d2x4g controller=PodVolumeRestore logSource="/remote-source/velero/app/pkg/datapath/file_system.go:129" pod=nfs-sql/mysql-68d84d7c89-gmh88 restore=openshift-adp/nfs-sql user=nfs-sql-d2x4g 
